Next: Conclusion Up: Comparison of Log-Linear Models Previous: Connection between the two

Databases and Results


Table 1: Corpus statistics for the three databases used in the experiments, taken from the UCI and STATLOG repositories.
corpus name               MONK     DNA  LETTER
# classes                    2       3      26
# features                  17     180      16
# training samples         124   2 000  15 000
# test samples             432   1 186   5 000

The experiments were performed on three corpora from the UCI and STATLOG repositories [5,6]. The corpora were chosen to cover different properties with respect to the number of classes, the number of features, and the corpus size. Their statistics are summarized in Table 1. MONK is an artificial decision task with categorical features, also known as the monk's problem. For the experiments, the categorical features were transformed into binary features. For the DNA task, the goal is to detect intron/exon and exon/intron boundaries in genes, given part of a DNA sequence. For this task, too, the categorical features were transformed into binary features. Finally, the LETTER corpus consists of printed characters that were preprocessed and from which a variety of features was extracted.
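The transformation of categorical into binary features can be illustrated with a one-hot encoding. The following Python snippet is only a minimal sketch and not the preprocessing used for the experiments; the function name `one_hot` and the attribute cardinalities (3, 3, 2, 3, 4, 2 for the six MONK attributes) are assumptions, chosen because they add up to the 17 binary features listed in Table 1.

```python
from typing import List, Sequence


def one_hot(sample: Sequence[int], cardinalities: Sequence[int]) -> List[int]:
    """Encode 1-based categorical values as concatenated indicator vectors."""
    if len(sample) != len(cardinalities):
        raise ValueError("expected one value per attribute")
    encoded: List[int] = []
    for value, n_values in zip(sample, cardinalities):
        if not 1 <= value <= n_values:
            raise ValueError(f"value {value} outside 1..{n_values}")
        indicator = [0] * n_values
        indicator[value - 1] = 1  # exactly one bit is set per attribute
        encoded.extend(indicator)
    return encoded


# Assumed numbers of values of the six MONK attributes (they sum to 17).
MONK_CARDINALITIES = [3, 3, 2, 3, 4, 2]

x = one_hot([1, 2, 1, 3, 4, 2], MONK_CARDINALITIES)
print(len(x), x)
```

Each categorical attribute contributes as many binary features as it has values, so the dimension of the encoded vector is the sum of the cardinalities.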


Table 2: Experimental results for the three databases with different settings of the algorithms, given as error rate (er) in %. The number of parameters (#param.) refers to the total number of parameters needed to completely define the classifier.
                                        MONK              DNA           LETTER
method                                 er[%]  #param.   er[%]  #param.   er[%]  #param.
single Gaussian                         28.5       51     9.5      720    41.6      432
log-linear, first-order                 28.9       36     5.6      543    22.5      442
log-linear, second-order                 0.2      308     5.1   48 873    13.5    3 562
weighted dissimil., one prot.           16.7       68     6.7    1 080    24.1      832
weighted dissimil., multiple prot.       0.0    2 142     4.7  360 540     3.3  240 416
best other [5,6]                         0.0        -     4.1        -     3.4        -
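The parameter counts in Table 2 can be reproduced with simple counting formulas in the number of classes K, the number of features D, and the number of training samples N. The Python sketch below uses formulas that are inferred from the numbers in the table and are not stated in the text, so they should be read as assumptions; all function names are hypothetical.

```python
def single_gaussian(k: int, d: int) -> int:
    # Assumed: one mean vector per class plus one pooled diagonal variance vector.
    return k * d + d


def log_linear_first_order(k: int, d: int) -> int:
    # Assumed: one bias and D feature weights per class.
    return k * (1 + d)


def log_linear_second_order(k: int, d: int) -> int:
    # Assumed: bias, D first-order terms, and D(D-1)/2 products of distinct features.
    return k * (1 + d + d * (d - 1) // 2)


def weighted_dissimilarity_one_prototype(k: int, d: int) -> int:
    # Assumed: one prototype vector and one weight vector per class.
    return k * 2 * d


def weighted_dissimilarity_multiple_prototypes(k: int, d: int, n: int) -> int:
    # Assumed: every training sample is a prototype, plus one weight vector per class.
    return n * d + k * d


# (classes, features, training samples) as given in Table 1.
CORPORA = {"MONK": (2, 17, 124), "DNA": (3, 180, 2000), "LETTER": (26, 16, 15000)}

counts = {
    name: (
        single_gaussian(k, d),
        log_linear_first_order(k, d),
        log_linear_second_order(k, d),
        weighted_dissimilarity_one_prototype(k, d),
        weighted_dissimilarity_multiple_prototypes(k, d, n),
    )
    for name, (k, d, n) in CORPORA.items()
}
for name, row in counts.items():
    print(name, row)
```

That these formulas match the table is consistent with, but does not prove, the corresponding model structure.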

Table 2 summarizes the results obtained with the two methods. Two observations stand out. Second-order features perform better than first-order features here, although the estimation of full, class-specific covariance matrices is problematic for many tasks. This indicates a high robustness of the maximum entropy log-linear approach. Note further that both the one-prototype weighted dissimilarity classifier and the log-linear model with second-order features lead to quadratic decision boundaries, but the former does not take into account bilinear terms of the features, whereas the second-order features do.

The high error rate of the log-linear model with first-order features on the MONK corpus was analyzed in more detail. As this task contains only binary features, the one-prototype weighted dissimilarity classifier also leads to linear decision boundaries here ($x^2 = x \Leftrightarrow x \in \{0,1\}$). It is therefore possible to infer parameters for the log-linear model from the training result of the weighted dissimilarity classifier. With these inferred parameters, the log-likelihood of the posterior (2) on the training data is lower than that resulting from maximum entropy training, which is not surprising, as exactly this quantity is the training criterion for the log-linear model. Interestingly, the same holds for the test data as well. That is, maximum entropy training predicts the class posterior more accurately on average, but this does not translate into better classification accuracy. This may indicate that on this corpus with very few samples, the weighted dissimilarity technique is able to adapt the decision boundary better, as it uses a criterion derived from the minimum classification error criterion.
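The step from a one-prototype weighted dissimilarity classifier to log-linear parameters can be sketched numerically. The Python snippet below assumes that the dissimilarity is a weighted squared distance, d_k(x) = sum_d w_kd (x_d - mu_kd)^2, and that the posterior has the usual normalized exponential form with score -d_k(x); both are assumptions about definitions given elsewhere in the paper, and the prototypes and weights are made-up values. For binary x, x_d^2 = x_d, so -d_k(x) = alpha_k + sum_d lambda_kd x_d with lambda_kd = w_kd (2 mu_kd - 1) and alpha_k = -sum_d w_kd mu_kd^2.

```python
import math
from itertools import product
from typing import List, Sequence, Tuple

Params = Tuple[float, List[float]]  # (alpha_k, lambda_k) for one class


def weighted_dissimilarity(x: Sequence[float], mu: Sequence[float],
                           w: Sequence[float]) -> float:
    """Assumed form: weighted squared distance to the class prototype."""
    return sum(w_d * (x_d - mu_d) ** 2 for x_d, mu_d, w_d in zip(x, mu, w))


def to_log_linear(mu: Sequence[float], w: Sequence[float]) -> Params:
    """Bias and first-order weights equivalent to -d_k(x) on binary inputs."""
    alpha = -sum(w_d * mu_d ** 2 for mu_d, w_d in zip(mu, w))
    lam = [w_d * (2.0 * mu_d - 1.0) for mu_d, w_d in zip(mu, w)]
    return alpha, lam


def linear_score(x: Sequence[float], params: Params) -> float:
    alpha, lam = params
    return alpha + sum(l_d * x_d for l_d, x_d in zip(lam, x))


def log_posteriors(x: Sequence[float], all_params: Sequence[Params]) -> List[float]:
    """Log of the normalized exponential of the class scores (log-sum-exp trick)."""
    scores = [linear_score(x, p) for p in all_params]
    top = max(scores)
    log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
    return [s - log_norm for s in scores]


def mean_log_likelihood(samples, labels, all_params) -> float:
    """Average log posterior of the correct class over a labeled sample set."""
    total = sum(log_posteriors(x, all_params)[k] for x, k in zip(samples, labels))
    return total / len(samples)


# Made-up prototypes and weights: two classes, three binary features.
prototypes = [[0.9, 0.2, 0.5], [0.1, 0.7, 0.4]]
weights = [[1.5, 0.5, 2.0], [1.0, 2.5, 0.3]]
params = [to_log_linear(mu, w) for mu, w in zip(prototypes, weights)]

# On every binary input, the linear score must equal the negated dissimilarity.
max_gap = max(
    abs(-weighted_dissimilarity(x, mu, w) - linear_score(x, p))
    for x in product([0, 1], repeat=3)
    for mu, w, p in zip(prototypes, weights, params)
)
print(max_gap)
print(mean_log_likelihood([(1, 0, 1), (0, 1, 0)], [0, 1], params))
```

With parameters obtained this way, the mean log-likelihood can be compared directly with that of a log-linear model trained by maximum entropy, which is the kind of comparison described above.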


Daniel Keysers 2004-03-10