Next: Conclusion
Up: Experiments with an Extended
Previous: Algorithms
Experimental results
In our experiments we used three different databases. We performed a number
of experiments on the widely used USPS database of handwritten digits
with 7291 training and 2007 test images and
validated the results on the (modified) database of the National Institute of
Standards and Technology (MNIST) with 60,000 resp. 10,000 images.
For comparison of performance in a different domain we also tested
the algorithms on medical image data from the IRMA project
(Image Retrieval in Medical Applications [10]),
where the objective is to assign a given radiograph to one of the six classes
abdomen, breast, chest, limbs, skull, spine (see Fig. 4). The images were scaled to a common size
of
for classification without significant decrease in
recognition rate.
Figure 4:
Example images, IRMA database
![\includegraphics [width=1.3cm,angle=0]{/u/keysers/Paper/ICPR2000/repr6.ps}](img61.png) |
Because there were
only 1617 images available, we adopted a leaving-one-out approach
for cross validation, classifying
each image while using the remaining 1616 as training set.
After parameter adjustment the classifier was evaluated on a
new set of 332 additional radiographs.
Table 1:
Results for USPS
| Distance Measure |
Error Rate [%] |
| |
1-NN |
KD |
KD,9-1 |
KD,9-9 |
| Euclidean |
5.6 |
5.5 |
4.5 |
4.2 |
| TD, SS |
3.4 |
3.3 |
3.0 |
2.8 |
| TD, DS |
3.3 |
3.0 |
2.5 |
2.4 |
Table 1 summarizes the main results of experiments with the USPS database
(the notation `a-b' indicates increased number of training samples
by factor a and increased number of test samples by factor b).
Regardless of the chosen distance measure
multiplying training and test data consistently improved
classification results.
The experiments showed that it is advisable to compute the tangents for
the test data
when computing the SSTD (1-NN performance: 3.4% vs. 3.8% for training data tangents).
We chose 1-NN here as according to [3],
was the best choice for k-NN on USPS.
Furthermore the proposed Euler-Cauchy method did not improve the overall results on this database. It produced
less errors on the data misclassified by 1-NN, but more on the
remaining data. This is due to the
fact, that it allows larger variations in the alignment of patterns, both towards correct
and incorrect reference images.
Fig. 6 visualizes this
effect showing the tolerance of the different distance measures with respect to
a horizontal displacement. One image from each class was randomly selected and
the distances to one displaced image were calculated.
Multiplying the training data with tangent
approximations or thinned versions of the images did not achieve lower error rates, while
the usage of different norms
enhanced results for the basic KD classifier,
but not for the best.
The variety of resulting classifiers invited the usage of classifier combination [7], and
the best result could thus be improved to 2.2% error rate.
Fig. 5 shows the remaining errors with their class labels.
Some of the errors appear to be label mistakes, some
are hard tasks even for a human, and some shapes do not appear in the training data, as is the
case with the three consecutive `1's in the lower row, which were classified as `7's.
Errors like the first image illustrate the limitations of distance based classifiers (there
exists a training image of class five which is almost identical).
Figure 5:
USPS errors (best result, 2.2% error)
![\includegraphics [width=8.3cm,angle=0]{/u/keysers/Paper/ICPR2000/majv15.ps}](img64.png) |
Figure 6:
Behaviour of different distance measures with respect to image shifts on USPS database
![\includegraphics [width=3.6cm,height=5.7cm,angle=270]{/u/keysers/Folien/Proposal/diag.n.ps}](img67.png) |
Table 2:
Comparison of results for OCR
| Method |
Error Rate [%] |
| |
|
USPS |
MNIST |
| Human Performance [1,3] |
2.5 |
0.2 |
| Neural Net (LeNet1/LeNet4) [3] |
4.2 |
1.1 |
| Support Vectors [11] |
3.0 |
0.8 |
| Tangent Distance [3] |
2.5 |
1.1 |
| Boosting [3] |
2.6 |
0.7 |
| This work |
TD, KD, 9/9 |
2.4 |
1.0 |
| |
TD, bagging |
2.2 |
- |
obtained with extended training set |
For comparison we repeated the experiment which obtained the best result on USPS
on the MNIST database. Table 2 shows the results in comparison to those obtained
by other groups. The asterisk indicates that the corresponding experiments were
performed using a training set extended with about 2,400 machine printed digits.
Note that in contrast to this approach we increase the effective size of the training
set but do not actually take new data.
Considering that all optimizations were done on USPS, the MNIST error rate of 1.0% is
surprisingly low, which
shows that the approach generalizes well and
the parameters were not overfitted. Since we did not repeat all experiments for MNIST,
bagging was not applied.
TD is usually applied using the seven transformations
proposed in [1] (translations (2), scaling, rotation, axis-deformations (2),
and line-thickness) where the first six account for affine variations.
We tried a number of different tangents including projective
transformations, brightness, contrast and different versions of the
thickness tangent, but were not able to improve the results of the original
tangents with USPS. On the other hand in a domain like
medical imaging, the thickness tangent loses its
a priori nature and can be replaced by a brightness tangent
(here defined as a constant function over
),
modeling different doses in x-ray imaging.
This is reflected in the corresponding recognition rates and
shows that the selection of tangents is domain dependent.
Table 3 shows the results of different distance measures for the IRMA database
using a region
of size
pixels,
`brightness'
indicating that the tangent for line-thickness was replaced by the brightness tangent (for computational reasons we
used SSTD for these experiments).
Note that we were
not able to obtain error rates lower than 29% using cooccurrence matrix based features [12].
The results show,
that in this domain thresholding is appropriate and the improvements of TD and IDM are nearly
additive. Combination of the two approaches was achieved here by replacing Euclidean
distance with distortion distance (4) in the last step of distance computation, when the optimum
is already known [12].
On the other hand IDM does not improve classification for OCR (e.g. when a line
distinguishing one numeral from the other is only one or two pixels
thick, it is
easy to erase the line completely using the IDM),
which shows the domain-dependency of the distance measures involved.
Table 3:
Results for IRMA database
| Distance Measure |
Error Rate [%] |
| |
1-NN |
KD |
KD, threshold |
| Euclidean |
18.0 |
16.4 |
14.2 |
| TD |
15.3 |
14.8 |
13.4 |
| IDM |
16.5 |
14.7 |
13.2 |
| TD, IDM |
14.7 |
13.2 |
11.7 |
| TD (brightness), IDM |
13.5 |
12.9 |
10.3 |
Using the aspect ratio
of the radiographs as additional feature, the error rate dropped further from 10.3% to 8.2%.
This could be improved marginally by using the
Euler-Cauchy distance measure not justifying the additional amount of computation
involved.
Using the best parameters for the database determined by the leaving-one-out
strategy, the algorithm misclassified 30 out of 332 (9.0%)
new radiographs (with the
training set now consisting of 1617 images) which means
that an adequate generalization was achieved.
Next: Conclusion
Up: Experiments with an Extended
Previous: Algorithms
Daniel Keysers
2000-11-16