In this paper, we compare two classification algorithms that are both
discriminative. Algorithms that classify observations into one of several classes
usually estimate some of their parameters in the training phase from a set of
labeled training data. The training
procedure can take into account only the data from one class at a time,
or it can consider all of the competing classes at the same time. In
the latter case the process is called discriminative. As
discriminative training puts more emphasis on the decision boundaries,
it often leads to better classification accuracy.
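As an illustration, the two kinds of training can be contrasted through their criteria. The notation here is assumed for this sketch and is not taken from the text: $x_n$ is the $n$-th training observation, $k_n$ its class label, $N$ the number of training samples, and $\theta$ the model parameters. The second line shows only one common discriminative criterion, the log-probability of the class posterior; other discriminative criteria exist.

```latex
\begin{align}
\hat{\theta}_k &= \operatorname*{argmax}_{\theta_k}
  \sum_{n:\,k_n = k} \log p(x_n \mid k; \theta_k)
  && \text{(one class at a time)} \\
\hat{\theta} &= \operatorname*{argmax}_{\theta}
  \sum_{n=1}^{N} \log
  \frac{p(k_n)\, p(x_n \mid k_n; \theta_{k_n})}
       {\sum_{k'} p(k')\, p(x_n \mid k'; \theta_{k'})}
  && \text{(all classes at the same time)}
\end{align}
```

In the first criterion each class model sees only its own data. In the second, the denominator couples the parameters of all competing classes, which is why the decision boundaries receive more emphasis.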
We examine the connection between two discriminative classification algorithms and compare their performance on three databases from the UCI and STATLOG repositories [5,6].
The principle of maximum entropy is a powerful framework for estimating class posterior probabilities in pattern recognition tasks. It leads to log-linear models for the class posterior and uses the log-probability of the class posterior on the training data as the training criterion. Combined with first-order feature functions, it can be shown to be equivalent to the discriminative training of single Gaussian densities with pooled covariance matrices [4].
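One direction of this connection can be checked numerically: the class posterior of Gaussian densities with a shared (pooled) covariance matrix is log-linear in the observation, because the quadratic term in x is the same for every class and cancels. The following is a minimal sketch under stated assumptions, not the procedure of [4]; the function names and the parameters `lam` and `alpha` are hypothetical, and the sketch covers only the form of the model, not its training.

```python
import numpy as np

def gaussian_posterior(x, means, cov, priors):
    """Class posteriors from Gaussian densities sharing one covariance matrix."""
    cov_inv = np.linalg.inv(cov)
    diffs = x - means  # shape (K, D): one row per class
    # log p(k) + log N(x | mu_k, cov), dropping the normalization shared by all classes
    scores = np.log(priors) - 0.5 * np.einsum("kd,de,ke->k", diffs, cov_inv, diffs)
    scores -= scores.max()  # numerical stability before exponentiation
    p = np.exp(scores)
    return p / p.sum()

def log_linear_posterior(x, means, cov, priors):
    """The same posterior written as a softmax over functions linear in x."""
    cov_inv = np.linalg.inv(cov)
    lam = means @ cov_inv  # (K, D) first-order weights (hypothetical name)
    # class-dependent offset: log prior minus half the quadratic form of the mean
    alpha = np.log(priors) - 0.5 * np.einsum("kd,kd->k", lam, means)
    scores = alpha + lam @ x
    scores -= scores.max()
    p = np.exp(scores)
    return p / p.sum()

if __name__ == "__main__":
    means = np.array([[0.0, 0.0], [2.0, 1.0], [-1.0, 3.0]])
    cov = np.array([[2.0, 0.3], [0.3, 1.0]])
    priors = np.array([0.5, 0.3, 0.2])
    x = np.array([0.7, 1.2])
    print(gaussian_posterior(x, means, cov, priors))
    print(log_linear_posterior(x, means, cov, priors))
```

Both functions return the same probabilities for any input, which illustrates why a log-linear model with first-order features has the same functional form as this Gaussian classifier.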
Weighted dissimilarity measures, in which the weights may depend on the dimension and the class and are trained according to a discriminative criterion, have shown high performance on various classification tasks [9]. This method, too, is strongly connected to Gaussian densities when one prototype per class is used. With more than one prototype per class, the analogy leads to a mixture density approach. We use these connections to the Gaussian classifier to compare the two discriminative criteria.
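A minimal sketch of a nearest-prototype rule with class- and dimension-dependent weights may make the connection to Gaussian densities more concrete. It assumes a weighted squared Euclidean dissimilarity, which may differ from the exact measure used in [9]; the function names and the array layout are hypothetical, and the discriminative training of the weights is not shown.

```python
import numpy as np

def weighted_dissimilarity(x, prototype, weights):
    """Weighted squared Euclidean distance; weights holds one entry per dimension."""
    return float(np.sum(weights * (x - prototype) ** 2))

def classify(x, prototypes, labels, class_weights):
    """Nearest-prototype rule; each prototype uses the weights of its own class.

    prototypes:    array of shape (P, D)
    labels:        list of P class indices, one per prototype
    class_weights: array of shape (K, D), one weight vector per class
    """
    dists = [
        weighted_dissimilarity(x, p, class_weights[c])
        for p, c in zip(prototypes, labels)
    ]
    return labels[int(np.argmin(dists))]

if __name__ == "__main__":
    prototypes = np.array([[0.0, 0.0], [4.0, 0.0]])
    labels = [0, 1]
    x = np.array([1.5, 0.0])
    uniform = np.ones((2, 2))
    print(classify(x, prototypes, labels, uniform))
    # Down-weighting the first dimension for class 1 enlarges its decision region.
    adapted = np.array([[1.0, 1.0], [0.1, 1.0]])
    print(classify(x, prototypes, labels, adapted))
```

If each class has a single prototype equal to its mean and the weights are set to the inverse variances per dimension, this dissimilarity equals the exponent of a Gaussian density with diagonal covariance up to a factor of one half, ignoring the normalization term and the class prior. With several prototypes per class, each prototype plays the role of one mixture component.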