RWTH German Fingerspelling Database
Our database is freely available.
Use it as you wish, but please cite us if you do:
P. Dreuw, T. Deselaers, D. Keysers, and H. Ney. Modeling Image Variability in Appearance-Based Gesture Recognition. In ECCV Workshop on Statistical Methods in Multi-Image and Video Processing (ECCV-SMVP), pages 7-18, Graz, Austria, May 2006.
You can download our database and the scripts we used here.
Please send me an email if you have any questions.
We also have further databases that can be used for sign language recognition.
In the course of my diploma thesis, Appearance-Based Gesture Recognition, a new database of fingerspelling letters of German Sign Language (Deutsche Gebärdensprache, DGS) was created. The RWTH gesture database contains 35 gestures, with video sequences for the signs A to Z and SCH, the German umlauts Ä, Ö, and Ü, and the numbers 1 to 5. Five of the gestures contain inherent motion (J, Z, Ä, Ö, and Ü).
The recordings were made under non-uniform daylight lighting conditions, the background and the camera viewpoints were not constant, and the persons had no restrictions on their clothing while gesturing.
The database consists of 1400 image sequences containing gestures of 20 different persons. Each person had to sign each gesture twice, on two different days. The gestures were recorded by two different cameras, one webcam and one camcorder, from two different points of view. Figure 6.4 shows the recording setup. The webcam recorded the sequences at a resolution of 320x240 pixels at 25 frames per second, the camcorder at a resolution of 352x288 pixels at 25 frames per second. The persons were not trained to perform the signs, so the gestures may deviate from the standard. For recording the gestures, we wrote a shell script that allowed us to record and convert gestures for any number of persons in a flexible and easy way. All videos were recorded in MPEG-4 DivX format using the freely available software MPlayer. The script makes it easy to integrate new recording devices or to change the recording resolution and frame rate.
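The original capture script is shell-based and built around MPlayer; purely as an illustration of the camera settings described above, here is a minimal Python/OpenCV sketch (not the original script; the device index, codec, and file name are assumptions):

```python
import cv2

# Webcam settings as described above: 320x240 at 25 frames per second
# (device index 0 is an assumption).
cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 320)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 240)

# MPEG-4 output, comparable to the DivX recordings of the database.
fourcc = cv2.VideoWriter_fourcc(*"XVID")
out = cv2.VideoWriter("gesture.avi", fourcc, 25.0, (320, 240))

while True:
    ok, frame = cap.read()
    if not ok:
        break
    out.write(frame)
    cv2.imshow("recording", frame)
    # Stop on RETURN (key code 13), as in the recording protocol below.
    if (cv2.waitKey(1) & 0xFF) == 13:
        break

cap.release()
out.release()
cv2.destroyAllWindows()
```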
We also wrote another shell script to convert the recorded videos into single image files. For each person, session, and camera, a sequence file was generated that lists all images belonging to that sequence. We chose the PNG image format with a high compression factor, but this may be changed to any other setting. These two scripts are also available online.
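For illustration, a minimal Python/OpenCV sketch of this conversion step (an illustrative re-implementation, not the original shell script; file and directory names are assumptions):

```python
import os
import cv2

def video_to_frames(video_path, out_dir):
    # Split a recorded video into single PNG images and write a sequence
    # file listing all images belonging to this recording.
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    names = []
    i = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        name = os.path.join(out_dir, "frame_%04d.png" % i)
        # PNG compression level 9 = highest compression factor, as chosen above.
        cv2.imwrite(name, frame, [cv2.IMWRITE_PNG_COMPRESSION, 9])
        names.append(name)
        i += 1
    cap.release()
    with open(os.path.join(out_dir, "sequence.txt"), "w") as f:
        f.write("\n".join(names))
```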
Before recording, each proband was asked whether he agreed to make his sequences publicly available. It was clearly stated that he could abandon the recording session at any time. After a short explanation of the procedure, he had to sign a letter of agreement. This is an important step when recording probands with cameras: on the one hand, the proband knows exactly what will happen with his recordings; on the other hand, he cannot object in hindsight to the publication of the complete database. A more detailed overview of usability evaluation and of working with probands can be found in [Nielsen 00] and [Schweibenz & Thissen 02].
For each gesture, an example video was shown before recording; the proband could view this video as often as he wanted. He then started the recording by hitting the RETURN key and stopped it by hitting it again. His recording was then played back so that it could be compared with the reference example, and he could re-record each gesture as often as he wanted; a sketch of this loop is given below. One recording session took between 10 and 20 minutes. The varying lighting conditions, and the fact that the hand is sometimes located in front of the face, make the hand difficult to track and extract. No instructions concerning clothing or jewellery such as rings, bracelets, or watches were given. We deliberately recorded such a difficult database so that we can later build an online recognition system that works without constraints.
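A minimal Python sketch of this session loop, assuming MPlayer for playback (the player actually used for the recordings) and taking the capture command as a parameter; all invocation details are assumptions, not the original script:

```python
import subprocess

def play(path):
    # Play a video with MPlayer; "-really-quiet" suppresses console output.
    subprocess.run(["mplayer", "-really-quiet", path])

def record_session(example_video, record_cmd, out_path):
    # Show the reference example, record on RETURN, replay the recording,
    # and let the proband re-record as often as he wants.
    while True:
        play(example_video)                      # the example may be watched again
        input("Press RETURN to start recording...")
        subprocess.run(record_cmd + [out_path])  # capture command ending in the output file
        play(out_path)                           # compare with the reference example
        if input("Keep this recording? [y/n] ").strip() == "y":
            return out_path
```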
Results
Using a CamShift tracker on the RWTH gesture database to extract the original images thresholded by their skin probability, we reduced the error rate from 87.1% to 44.0%. With the first time derivative of the skin-probability-thresholded images as feature, in combination with tracking, the error rate improved from 72.1% to 46.2%. This shows the need for a tracking system, or for a feature extraction method that is more position- and scale-independent.
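For illustration, a minimal Python/OpenCV sketch of this preprocessing (not the thesis implementation; the hue/saturation skin histogram and the threshold value are assumptions):

```python
import cv2
import numpy as np

def skin_threshold(frame, skin_hist):
    # Back-project a hue/saturation histogram trained on skin samples to get
    # a per-pixel skin probability map, then keep only high-probability pixels.
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    prob = cv2.calcBackProject([hsv], [0, 1], skin_hist, [0, 180, 0, 256], 1)
    mask = (prob > 32).astype(frame.dtype)       # threshold value is an assumption
    return frame * mask[..., None], prob

def camshift_step(prob, window):
    # One CamShift iteration on the skin probability map; returns the
    # updated search window around the hand.
    criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1.0)
    _, window = cv2.CamShift(prob, window, criteria)
    return window

def appearance_feature(frame, window, size=32):
    # Crop the tracked hand region and scale it to the feature size.
    x, y, w, h = window
    return cv2.resize(frame[y:y+h, x:x+w], (size, size))
```

The first time derivative feature is then simply the difference of consecutive such feature frames.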
Using a two-sided tangent distance, we improved the error rate once again, to the currently best result of 35.7%. This shows the advantage of distance measures that are invariant against affine transformations, and the possibility of recognizing sign language with simple appearance-based features.
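As a worked illustration, here is a minimal NumPy sketch of tangent distance restricted to translation tangents (the thesis considers the full set of affine transformations, which adds tangent vectors for rotation, scaling, and shearing):

```python
import numpy as np

def translation_tangents(img):
    # Tangent vectors of the image manifold with respect to horizontal and
    # vertical shifts, approximated by the spatial image gradients.
    gy, gx = np.gradient(img.astype(np.float64))
    return np.stack([gx.ravel(), gy.ravel()], axis=1)   # shape: (pixels, 2)

def one_sided_tangent_distance(x, mu):
    # min over a of ||x - (mu + T a)||^2, with T the tangents at the reference mu.
    T = translation_tangents(mu)
    r = (x.astype(np.float64) - mu.astype(np.float64)).ravel()
    a, *_ = np.linalg.lstsq(T, r, rcond=None)
    return float(np.sum((r - T @ a) ** 2))

def two_sided_tangent_distance(x, mu):
    # The two-sided variant additionally minimizes over shifts of the
    # observation x, so both tangent sets enter the least-squares problem.
    T = np.concatenate([translation_tangents(mu), -translation_tangents(x)], axis=1)
    r = (x.astype(np.float64) - mu.astype(np.float64)).ravel()
    a, *_ = np.linalg.lstsq(T, r, rcond=None)
    return float(np.sum((r - T @ a) ** 2))
```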
With the same features scaled to 16x16, we achieved an error rate of 46.0% with one-sided tangent distance and 42.5% with two-sided tangent distance; the latter is even better than using 32x32 original image features without tangent distance.
We could also improve the error rate obtained with the first time derivative of the skin-probability-thresholded images: with two-sided tangent distance, it dropped from 46.2% to 44.1%.
The confusion matrix was obtained using two-sided tangent distance on the RWTH gesture database, with the skin-probability-thresholded original images as features. The error rate table below lists all results achieved on this database so far.
[Confusion matrix, two-sided tangent distance: per-class results, given as correct/incorrect (C/I) out of 20 test sequences per sign and the resulting error rate.]

Sign | C/I | ER[%] | Sign | C/I | ER[%] | Sign | C/I | ER[%]
A | 8/12 | 60.0 | M | 10/10 | 50.0 | Y | 15/5 | 25.0
B | 11/9 | 45.0 | N | 11/9 | 45.0 | Z | 16/4 | 20.0
C | 12/8 | 40.0 | O | 9/11 | 55.0 | AE | 14/6 | 30.0
D | 5/15 | 75.0 | P | 16/4 | 20.0 | OE | 14/6 | 30.0
E | 17/3 | 15.0 | Q | 16/4 | 20.0 | UE | 10/10 | 50.0
F | 12/8 | 40.0 | R | 5/15 | 75.0 | SCH | 18/2 | 10.0
G | 17/3 | 15.0 | S | 3/17 | 85.0 | Eins | 15/5 | 25.0
H | 16/4 | 20.0 | T | 14/6 | 30.0 | Zwei | 14/6 | 30.0
I | 8/12 | 60.0 | U | 5/15 | 75.0 | Drei | 14/6 | 30.0
J | 13/7 | 35.0 | V | 17/3 | 15.0 | Vier | 15/5 | 25.0
K | 17/3 | 15.0 | W | 16/4 | 20.0 | Fuenf | 18/2 | 10.0
L | 14/6 | 30.0 | X | 15/5 | 25.0 | | |
Feature | Feature Size | Distance | ER[%]
Original thresholded by skin color prob. | 32x32 | Euclidean | 44.0
Original thresholded by skin color prob. | 32x32 | One-sided tangent | 39.4
Original thresholded by skin color prob. | 32x32 | Two-sided tangent | 35.7
Original thresholded by skin color prob. | 16x16 | One-sided tangent | 46.0
Original thresholded by skin color prob. | 16x16 | Two-sided tangent | 42.5
First time derivative of orig. thresholded by skin color prob. | 32x32 | Euclidean | 46.2
First time derivative of orig. thresholded by skin color prob. | 32x32 | Two-sided tangent | 44.1