README-File of YASMET 1.0

----------------------------------------
Yet Another Small MaxEnt Toolkit: YASMET
----------------------------------------
(written by Franz Josef Och; June 1st 2001)


(1) GENERATE EVENT FILE

An example event file is:

==================================================
3
0 # a x s v # b s q s x x # c s f w #
0 # a s s w q # b e f s 1 a # 2 c c f s a #
1 # a x s v # b s q s s x x # c s f w #
1 # a s s w q # b e s f s a # c c f m n t z s a #
2 # a x s v # b s q s x x # c s f w #
2 # a s s w q # b e f e q s a # c c f s a #
0 # a s s s w q # b e f f h s a # c c f s a #
1 # a x s s v # b s q s i k x x # c s f w #
2 # a s s z w q # b e f s a # c l m c f s a #
==================================================

The first number in the first line (3) is the number of classes.
Every following line stands for one event. The first number in a
line gives the class of this event. The '#' symbol separates the
feature sets that fire if a specific class occurs. A feature can be
any string. Please note that every event line must contain exactly
N+1 '#' symbols if there are N classes; otherwise the file is not
valid.


(2) GIS-TRAINING

SHELL> cat EventFile | ME.out > MuFile

This optimizes the parameters of the maximum-entropy model with
Generalized Iterative Scaling (GIS) and writes the result to the
file 'MuFile'.


(3) TEST-CORPUS

SHELL> cat EventFile | ME.out MuFile

This computes the probability of every class for every event and
the test-corpus perplexity:

====================================================
0.279895 0.420937 0.299168
1 0 0
0.248529 0.485829 0.265642
0.456161 0.543839 0
0.279895 0.420937 0.299168
0.289253 0.200896 0.509851
0.424436 0 0.575564
0.112457 0.795069 0.0924743
0 0.128044 0.871956
pp: 1.88557
====================================================


(4) COUNT-BASED FEATURE SELECTION

You might want to reduce the number of different features:

SHELL> cat EventFile | ME.out -red 3 > EventFile.red3

This removes from the EventFile every feature that occurs 3 times
or less.
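The event-file format above can be read with a short Python sketch. The function name read_event_file is illustrative and not part of YASMET; the parsing rules (first line N, then one event per line with N+1 '#' symbols, plus the optional "$ count" after the class number introduced in section (8)) follow the description in this README:

```python
def read_event_file(lines):
    """Parse a YASMET-style event file.

    First line: number of classes N.
    Each following line: 'class [$ count] # set_0 # ... # set_N-1 #'.
    Returns (N, [(class, count, [feature_set, ...]), ...]).
    """
    it = iter(lines)
    n_classes = int(next(it).strip())
    events = []
    for line in it:
        if not line.strip():
            continue
        head, *sets = line.split('#')
        head = head.split()
        cls = int(head[0])
        # Optional observation count, written as '$ count' (section 8).
        count = float(head[2]) if len(head) > 2 and head[1] == '$' else 1.0
        feature_sets = [s.split() for s in sets[:-1]]  # final '#' ends the line
        if len(feature_sets) != n_classes:
            # A valid line carries exactly N+1 '#' symbols.
            raise ValueError("event line needs N+1 '#' symbols: " + line)
        events.append((cls, count, feature_sets))
    return n_classes, events

example = """3
0 # a x s v # b s q s x x # c s f w #
0 # a s s w q # b e f s 1 a # 2 c c f s a #
"""
n, evs = read_event_file(example.splitlines())
print(n, len(evs), evs[0])
```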
Then you can continue with steps (2) and (3).


(5) SMOOTHING

SHELL> cat EventFile | ME.out -dN 0.2 > MuFile.dN0.2

This smooths the observation counts. The smoothing method is
absolute discounting.


(6) LENGTH NORMALIZATION

SHELL> cat EventFile | ME.out -lNorm > MuFile.lNorm

This performs a length normalization by dividing each feature value
by the number of features occurring for a certain event. Thereby
the effective feature sum per event is 1.0.


(7) ADJUSTING THE NUMBER OF ITERATIONS

SHELL> cat EventFile | ME.out -iter 50 -deltaPP 0.001

These two parameters adjust the number of GIS iterations. In this
example GIS stops as soon as fifty iterations have been performed
OR the improvement in training-corpus perplexity is smaller than
0.001.


(8) SETTING OBSERVATION COUNTS

You can change the observation count of an observation (default: 1)
by writing "$ count" after the correct class number. For example:

===========================================================
0 $ 3 # a x s v # b s q s x x # c s f w #
0 $ 4 # a s s w q # b e f s 1 a # 2 c c f s a #
===========================================================

Here, the first observation has count 3 and the second observation
has count 4.


(9) SIMPLE CLASSIFICATION PROBLEM EXAMPLE

If you have a classification problem with N classes, the first line
of the event file contains the number N. Every following line then
contains
- at the first position the correct class of the event, encoded as
  a number 0 ... N-1,
- then a space-separated "#",
- then N sequences of space-separated strings specifying the
  features, each sequence terminated by a "#".
A feature can be encoded as an arbitrary string.

Assume you want to classify documents into N different classes.
Every document consists of words. As features you would like to use
features that state that a certain word W occurs in a certain class
C.
Then an ideal feature representation is, for example, simply the
concatenation of W and C (W_C). For example, the feature that the
word "money" occurs in class "2" could be denoted "money_2". Hence,
if you have a document that contains only the word "money" and the
document is in class 2, then the line would look like:

2 # money_0 # money_1 # money_2 # money_3 #

Don't be surprised by the redundancy of this representation. It has
the advantage that it is very general. If the document contains an
additional word "fire", then the line would look like:

2 # money_0 fire_0 # money_1 fire_1 # money_2 fire_2 # money_3 fire_3 #

All the events are then described in the lines of the event file.
Hence, if you have 1000 documents, your event file will contain
1001 lines (remember: the first line contains the number of
classes).


(10) SMOOTHING WITH GAUSSIAN PRIORS

You can smooth the feature values using the method of Gaussian
priors. You simply have to specify a parameter

-smooth sigma

where sigma is the standard deviation of the Gaussian prior. The
theory is described in the paper by S. Chen and R. Rosenfeld:
"A Gaussian Prior for Smoothing Maximum Entropy Models" (1999).


(11) OTHER THINGS

The toolkit has some other options which can be used and might help
to reduce the error rate.
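The W_C encoding of section (9) is mechanical, so generating event lines can be automated. This is a sketch; document_event_line is an illustrative helper name, not part of YASMET:

```python
def document_event_line(label, words, n_classes):
    """Build one YASMET event line for a document using the W_C
    encoding: every word W appears as feature W_c in the feature
    set of every class c."""
    sets = [" ".join(f"{w}_{c}" for w in words) for c in range(n_classes)]
    return f"{label} # " + " # ".join(sets) + " #"

# A document in class 2 containing the words "money" and "fire",
# with N = 4 classes, reproduces the line shown in section (9):
line = document_event_line(2, ["money", "fire"], 4)
print(line)
# -> 2 # money_0 fire_0 # money_1 fire_1 # money_2 fire_2 # money_3 fire_3 #
```

Writing the number of classes on the first line, followed by one such line per document, yields a complete event file for step (2).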