Deep Hand:

How to Train a CNN on 1 Million Hand Images When Your Data Is Continuous and Weakly Labelled

CVPR 2016 Oral

Oscar Koller
Human Language Technology & Pattern Recognition Group
RWTH Aachen University, Germany


In Short:

This work presents a new approach to learning a frame-based classifier on weakly labelled sequence data by embedding a CNN within an iterative EM algorithm. Although we demonstrate this in the context of hand shape recognition, the approach applies more widely to any video recognition task where frame-level labelling is not available. We provide here the pretrained Caffe models, trained on 1 million articulated hand images originating from three real-life sign language data sets. The model distinguishes 60 fine-grained hand shape classes and a garbage class. We also share the 3359 challenging hand shape images that were used to evaluate the approach.
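To illustrate how frame-level labels can be derived from weak sequence-level labels, the sketch below shows one possible E-step: a forced alignment that assigns every frame to one entry of an ordered label sequence so that the total log-probability under the current classifier is maximal. It is a simplified illustration, not the implementation used in this work. The function name `force_align`, the toy probabilities and the strictly monotonic alignment without a garbage class are all assumptions made for the example. In a full EM loop, the resulting frame labels would be used to retrain the CNN (M-step) and both steps would be repeated.

```python
import math


def force_align(log_probs, labels):
    """Assign each frame to one entry of `labels`, keeping their order.

    log_probs[t][c] is the classifier's log-probability of class c at frame t.
    labels is the weak, ordered label sequence (every label is used at least once).
    Returns one class index per frame (hypothetical helper, for illustration only).
    """
    num_frames, num_labels = len(log_probs), len(labels)
    if num_labels == 0 or num_frames < num_labels:
        raise ValueError("need at least one frame per label")

    neg_inf = -math.inf
    # score[t][k]: best total log-probability with frame t assigned to labels[k]
    score = [[neg_inf] * num_labels for _ in range(num_frames)]
    back = [[0] * num_labels for _ in range(num_frames)]
    score[0][0] = log_probs[0][labels[0]]

    for t in range(1, num_frames):
        for k in range(min(t + 1, num_labels)):
            stay = score[t - 1][k]                              # same label as before
            move = score[t - 1][k - 1] if k > 0 else neg_inf    # advance to next label
            if move > stay:
                best, back[t][k] = move, k - 1
            else:
                best, back[t][k] = stay, k
            if best > neg_inf:
                score[t][k] = best + log_probs[t][labels[k]]

    # Trace back from the last frame, which must carry the last label.
    k = num_labels - 1
    path = [0] * num_frames
    for t in range(num_frames - 1, -1, -1):
        path[t] = labels[k]
        k = back[t][k]
    return path


# Toy example: 4 frames, 3 classes, weak label sequence "class 1, then class 2".
toy_probs = [
    [0.1, 0.8, 0.1],
    [0.1, 0.7, 0.2],
    [0.1, 0.2, 0.7],
    [0.1, 0.1, 0.8],
]
toy_log_probs = [[math.log(p) for p in frame] for frame in toy_probs]
frame_labels = force_align(toy_log_probs, [1, 2])
print(frame_labels)  # [1, 1, 2, 2]
```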

If you use our data or models, please cite our publication:


Detailed Model Description:

For our work on hand shape recognition we follow the hand shape taxonomy of the Danish sign language lexicon team (Jette H. Kristoffersen and Thomas Troelsgård, Center for Tegnsprog, Denmark, http://www.tegnsprog.dk), which comprises over 60 different hand shapes, often with very subtle differences such as a flexed versus a straight thumb. The employed classes are shown in Table 2.

Table 1: Summary of the models provided for download.
Name           Top-1 Acc [%]  Top-5 Acc [%]  WER [%] (dev)  WER [%] (test)
1-miohands-v1  62.8           85.6           51.6           50.2
1-miohands-v2  85.5           94.8           48.4           48.7
WER [%]: Continuous Sign Language Recognition on RWTH-PHOENIX-Weather 2014.
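The Top-1 and Top-5 accuracies can be read as the percentage of evaluation images whose reference class is among the 1 or 5 highest-scoring model outputs. The following minimal sketch shows that computation on made-up scores. The function name `top_k_accuracy` and all numbers in the example are illustrative assumptions and are not taken from the evaluation in Table 1.

```python
def top_k_accuracy(scores, targets, k):
    """Percentage of samples whose target class is among the k best-scoring classes.

    scores[i][c] is the model score of class c for sample i;
    targets[i] is the reference class index of sample i.
    """
    hits = 0
    for row, target in zip(scores, targets):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if target in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(targets)


# Toy data: 4 samples, 4 classes (values are invented for the example).
toy_scores = [
    [0.10, 0.60, 0.20, 0.10],
    [0.50, 0.30, 0.10, 0.10],
    [0.25, 0.15, 0.50, 0.10],
    [0.10, 0.10, 0.10, 0.70],
]
toy_targets = [1, 1, 3, 3]

top1 = top_k_accuracy(toy_scores, toy_targets, k=1)
top2 = top_k_accuracy(toy_scores, toy_targets, k=2)
print(top1, top2)  # 50.0 75.0
```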

All hand shape icons employed in this document originate from the online Dictionary of New Zealand Sign Language http://nzsl.vuw.ac.nz.


Table 2: Inventory of modelled hand shapes. The column 'Nr' corresponds to the model output index; the first output, Nr. 0, represents the garbage class (not shown in the table).
Nr  Shape                     Name         Nr  Shape                    Name
1   Handshape_3_1_1           1            31  Handshape_6_5_2          l_hook
2   Handshape_6_4_1           2            32  Handshape_5_4_1          middle
3   Handshape_7_1_1           3            33  Handshape_3_4_1          m
4   Handshape_7_1_3           3_hook       34  Handshape_6_2_4          n
5   Handshape_2_1_2           4            35  Handshape_4_2_1          o
6   Handshape_2_1_1           5            36  Handshape_5_1_1          index
7   Handshape_7_3_1           6            37  Handshape_5_1_1_1flex    index_flex
8   Handshape_5_3_1_7         7            38  Handshape_5_2_1          index_hook
9   Handshape_8_1_3           8            39  Handshape_6_3_2          pincet
10  Handshape_3_2_1           a            40  Handshape_4_2_2          ital
11  Handshape_1_1_3           b            41  Handshape_1_3_1          ital_thumb
12  Handshape_1_1_2           b_nothumb    42  Handshape_1_3_1_nothumb  ital_nothumb
13  Handshape_1_1_1           b_thumb      43  Handshape_1_2_2          ital_open
14  Handshape_6_5_1           cbaby        44  Handshape_6_2_2          r
15  Handshape_4_1_1           obaby        45  Handshape_3_3_1          s
16  Handshape_1_3_2           by           46  Handshape_3_5_2          write
17  Handshape_1_4_1           c            47  Handshape_1_3_2          spoon
18  Handshape_5_1_1_d         d            48  Handshape_6_1_2_t        t
19  Handshape_3_4_2           e            49  Handshape_6_1_1          v
20  Handshape_4_3_1           f            50  Handshape_6_1_3          v_flex
21  Handshape_4_3_1_open      f_open       51  Handshape_6_1_4          v_hook
22  Handshape_6_6_2_flythumb  fly          52  Handshape_7_1_4          v_thumb
23  Handshape_6_6_2           fly_nothumb  53  Handshape_7_3_2          w
24  Handshape_6_3_1           g            54  Handshape_6_6_1          y
25  Handshape_6_2_1           h            55  Handshape_2_3_1          ae
26  Handshape_6_2_3           h_hook       56  Handshape_2_3_2          ae_thumb
27  Handshape_7_2_1           h_thumb      57  Handshape_7_4_2          pincet_double
28  Handshape_5_3_1           i            58  Handshape_4_1_2          obaby_double
29  Handshape_8_1_1           jesus        59  Handshape_3_4_1          m2
30  Handshape_6_1_2           k            60  Handshape_8_1_2          jesus_thumb
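To make the index convention of Table 2 concrete, the following sketch maps a 61-dimensional model output to a class name. The names for indices 1 to 60 are copied from the 'Name' columns of Table 2; the string "garbage" for index 0, the list name `CLASS_NAMES` and the helper `decode_output` are assumptions made for this example. How the output vector is obtained from the Caffe model is not shown here.

```python
# Index 0 is the garbage class (label string assumed); 1..60 follow Table 2.
CLASS_NAMES = ["garbage"] + [
    "1", "2", "3", "3_hook", "4", "5", "6", "7", "8", "a",
    "b", "b_nothumb", "b_thumb", "cbaby", "obaby", "by", "c", "d", "e", "f",
    "f_open", "fly", "fly_nothumb", "g", "h", "h_hook", "h_thumb", "i", "jesus", "k",
    "l_hook", "middle", "m", "n", "o", "index", "index_flex", "index_hook", "pincet", "ital",
    "ital_thumb", "ital_nothumb", "ital_open", "r", "s", "write", "spoon", "t", "v", "v_flex",
    "v_hook", "v_thumb", "w", "y", "ae", "ae_thumb", "pincet_double", "obaby_double", "m2",
    "jesus_thumb",
]


def decode_output(scores):
    """Return (index, name) of the highest-scoring class (hypothetical helper)."""
    if len(scores) != len(CLASS_NAMES):
        raise ValueError("expected one score per class, including garbage")
    best = max(range(len(scores)), key=lambda c: scores[c])
    return best, CLASS_NAMES[best]


# Toy output vector that puts all mass on index 36.
toy_output = [0.0] * len(CLASS_NAMES)
toy_output[36] = 1.0
print(decode_output(toy_output))  # (36, 'index')
```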


Oscar Koller 2016-06-03
Creative Commons License