Next: Experiments and Results
Up: Maximum Entropy and Gaussian
Previous: Maximum Entropy Modeling
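Consider first-order feature functions for maximum entropy classification
$$ f_{k,i}(x, k') = \delta(k, k') \, x_i, \qquad f_k(x, k') = \delta(k, k'), $$
where $\delta(k, k') = 1$ if $k = k'$, and 0 otherwise, denotes the
Kronecker delta function. In the context of image recognition, we may
call the functions $f_{k,i}$ appearance-based image features, as
they represent the image pixel values $x_i$. The duplication of the features
for each class is necessary to distinguish the hypothesized
classes. The functions $f_k$ allow for a log-linear offset in the
posterior probabilities. Now, using the properties of the Kronecker
delta, the structure of the posterior
probabilities becomes
$$ p(k|x) = \frac{\exp\left[\alpha_k + \sum_i \lambda_{ki} x_i\right]}{\sum_{k'} \exp\left[\alpha_{k'} + \sum_i \lambda_{k'i} x_i\right]} = \frac{\exp\left[\alpha_k + \lambda_k^T x\right]}{\sum_{k'} \exp\left[\alpha_{k'} + \lambda_{k'}^T x\right]} \qquad (5) $$
where $\lambda_{ki}$ denotes the coefficient for the feature function
$f_{k,i}$, $\alpha_k$ the coefficient for $f_k$, and $\lambda_k$ the vector of the coefficients $\lambda_{ki}$.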
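The collapse of the feature sum under the Kronecker delta can be checked numerically. The following is a minimal sketch, not the authors' implementation: the names `alpha`, `lam`, `posterior_from_features`, and `posterior_log_linear` are hypothetical, and the toy pixel values and coefficients are made up for illustration. It evaluates the posterior once by summing over all class-duplicated feature functions and once in the collapsed log-linear form, and the two should agree.

```python
import math

def kronecker(k, k_prime):
    """Kronecker delta: 1 if the two class indices match, else 0."""
    return 1.0 if k == k_prime else 0.0

def softmax(scores):
    """Normalize exponentiated scores; subtract the max for stability."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def posterior_from_features(x, alpha, lam):
    """Posterior built explicitly from all class-duplicated features."""
    num_classes = len(alpha)
    scores = []
    for k_hyp in range(num_classes):  # hypothesized class k'
        s = 0.0
        for k in range(num_classes):
            # offset feature: delta(k, k')
            s += alpha[k] * kronecker(k, k_hyp)
            # first-order features: delta(k, k') * x_i
            for i, x_i in enumerate(x):
                s += lam[k][i] * kronecker(k, k_hyp) * x_i
        scores.append(s)
    return softmax(scores)

def posterior_log_linear(x, alpha, lam):
    """Collapsed form: softmax over alpha_k + lambda_k^T x."""
    scores = [a + sum(l_i * x_i for l_i, x_i in zip(l, x))
              for a, l in zip(alpha, lam)]
    return softmax(scores)

# Toy data (assumed): three "pixel" values, two classes.
x = [0.2, 0.7, 0.1]
alpha = [0.0, -0.5]
lam = [[1.0, -2.0, 0.5], [0.3, 0.8, -1.0]]

p_features = posterior_from_features(x, alpha, lam)
p_collapsed = posterior_log_linear(x, alpha, lam)
```

Only the terms with matching class index survive the delta, which is why the double loop reduces to one offset and one inner product per class.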
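Now, consider a Gaussian model (3)
for $p(x|k) = \mathcal{N}(x|\mu_k, \Sigma)$ with pooled covariance
matrix $\Sigma$.
Using Bayes' rule, and the relation
$$ \log \mathcal{N}(x|\mu_k, \Sigma) = -\tfrac{1}{2} \log |2\pi\Sigma| - \tfrac{1}{2} x^T \Sigma^{-1} x + \mu_k^T \Sigma^{-1} x - \tfrac{1}{2} \mu_k^T \Sigma^{-1} \mu_k $$
we can rewrite the class posterior probability (note that the terms
that do not depend on the class $k$
cancel in the fraction):
$$ p(k|x) = \frac{p(k) \, \mathcal{N}(x|\mu_k, \Sigma)}{\sum_{k'} p(k') \, \mathcal{N}(x|\mu_{k'}, \Sigma)} = \frac{\exp\left[\alpha_k + \lambda_k^T x\right]}{\sum_{k'} \exp\left[\alpha_{k'} + \lambda_{k'}^T x\right]} \qquad (6) $$
with $\alpha_k = \log p(k) - \tfrac{1}{2} \mu_k^T \Sigma^{-1} \mu_k$ and $\lambda_k = \Sigma^{-1} \mu_k$.
As a result, we see that for unknown class priors $p(k)$
the resulting
model (6) is identical to the maximum entropy model
(5). We can conclude that the discriminative training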
criterion (2) for the Gaussian model (3)
with pooled covariance matrices results in exactly the same functional
form as the maximum entropy model for first-order features. This
allows us to use the well-understood algorithms for maximum entropy
estimation to estimate the parameters of a Gaussian model
discriminatively.
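The equivalence can be illustrated with a small numerical check. This is a sketch under stated assumptions rather than the method used in the experiments: it uses two-dimensional toy data, and the helper names (`inv2`, `gaussian_posterior`, `log_linear_params`) as well as the symbols `alpha` and `lam` for the offset and linear coefficients are hypothetical. The Gaussian posterior is computed via Bayes' rule with a shared covariance matrix and compared to a log-linear model with offsets log p(k) - 1/2 mu_k^T Sigma^{-1} mu_k and linear coefficients Sigma^{-1} mu_k.

```python
import math

def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec(m, v):
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def gaussian_posterior(x, priors, means, cov):
    """Bayes' rule with Gaussians that share one (pooled) covariance."""
    prec = inv2(cov)
    log_joint = []
    for p, mu in zip(priors, means):
        d = [x_i - m_i for x_i, m_i in zip(x, mu)]
        # The Gaussian normalization constant is the same for every
        # class here, so it cancels in the fraction and is omitted.
        log_joint.append(math.log(p) - 0.5 * dot(d, matvec(prec, d)))
    return softmax(log_joint)

def log_linear_params(priors, means, cov):
    """Map Gaussian parameters to offsets and linear coefficients."""
    prec = inv2(cov)
    lam = [matvec(prec, mu) for mu in means]
    alpha = [math.log(p) - 0.5 * dot(mu, l)
             for p, mu, l in zip(priors, means, lam)]
    return alpha, lam

def log_linear_posterior(x, alpha, lam):
    return softmax([a + dot(l, x) for a, l in zip(alpha, lam)])

# Toy parameters (assumed values, for illustration only).
priors = [0.3, 0.7]
means = [[0.0, 1.0], [2.0, -1.0]]
cov = [[2.0, 0.5], [0.5, 1.0]]
x = [0.8, 0.1]

p_gauss = gaussian_posterior(x, priors, means, cov)
alpha, lam = log_linear_params(priors, means, cov)
p_loglin = log_linear_posterior(x, alpha, lam)
```

The quadratic term in x is identical for all classes and drops out, so only the offset and the term linear in x remain.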
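If we repeat the same argument as above for the case of Gaussian
densities without pooling of the covariance matrices, we find that we
can again establish a correspondence to a maximum entropy model:
$$ p(k|x) = \frac{\exp\left[\alpha_k + \lambda_k^T x + x^T \Lambda_k x\right]}{\sum_{k'} \exp\left[\alpha_{k'} + \lambda_{k'}^T x + x^T \Lambda_{k'} x\right]} $$
Here, the square matrix $\Lambda_k$
corresponds to the negative of the
inverse of the covariance matrix
$\Sigma_k$ (up to the factor 1/2 from the Gaussian exponent). These parameters can be
estimated using a maximum entropy model with the second-order feature
functions
$$ f_{k,i,j}(x, k') = \delta(k, k') \, x_i x_j, \qquad i \ge j. $$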
One interesting consequence of using the corresponding maximum entropy
model and estimation is that we implicitly relax the constraints on
the covariance matrices to be positive (semi-)definite. Therefore,
the resulting model is not exactly equivalent to a Gaussian model.
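A small sketch may make the relaxed constraint concrete. It assumes that the quadratic term of the log-linear model is written as x^T Lambda_k x, so that a Gaussian would imply Sigma_k = -1/2 Lambda_k^{-1}; this convention, the helper names, and the toy matrices are assumptions made for illustration. An unconstrained estimate of the quadratic coefficients still defines normalized class posteriors, but the implied "covariance" need not be positive definite.

```python
import math

def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def implied_covariance(quad):
    """Covariance a Gaussian would need: -1/2 times the inverse."""
    return [[-0.5 * v for v in row] for row in inv2(quad)]

def is_positive_definite_2x2(m):
    """Sylvester's criterion for a symmetric 2x2 matrix."""
    (a, b), (c, d) = m
    return a > 0 and (a * d - b * c) > 0

def quad_form(m, x):
    return sum(x[i] * m[i][j] * x[j] for i in range(2) for j in range(2))

def posterior(x, alpha, lam, quads):
    """Softmax over alpha_k + lambda_k^T x + x^T Lambda_k x."""
    scores = [a + sum(l_i * x_i for l_i, x_i in zip(l, x)) + quad_form(q, x)
              for a, l, q in zip(alpha, lam, quads)]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

# Class 0: coefficients that do correspond to a Gaussian.
# Class 1: coefficients with one positive eigenvalue, which an
# unconstrained maximum entropy estimate is free to produce.
quads = [[[-1.0, 0.0], [0.0, -0.5]],
         [[-1.0, 0.0], [0.0, 0.25]]]
alpha = [0.0, 0.1]
lam = [[0.5, 0.0], [0.0, 0.5]]

valid = [is_positive_definite_2x2(implied_covariance(q)) for q in quads]
p = posterior([0.3, -0.4], alpha, lam, quads)
```

In this toy case the second class has no Gaussian counterpart, yet the class posterior remains a proper distribution for the given observation.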
This result is in contrast to the approach taken in [5], where
the authors derive discriminative models for Gaussian densities based
on priors of the parameters and the minimum relative entropy
principle. Their solution results in discriminatively trained weights
for the training data and therefore preserves the mentioned constraints.
Daniel Keysers
2002-10-15