EvalTrans - Fast Evaluation for MT Research

Introduction

In machine translation, evaluation is necessary to compare system performance and rate progress. Unfortunately, automatically computable criteria like word error rate (WER) etc. depend fundamentally on the choice of sample translations.
Subjective, manually given criterions like subjective sentence error rate are very useful for this task, but require labourous evalation by human experts.
To faciliate evaluation as much as possible, and to give easy acess to collected evaluation data, we developed a tool: EvalTrans.
It has been successfully applied in various national and international projects. Recently, it has been used for the final comparative assessment of translation quality in the EuTrans project.

Features

Evaluation

Manually assigned quality criteria: score (0-10), information item errors (ok, absence, syntax, semantic, ...)
Automatically computable quality criterion: word error rate (multi-reference)
Whole-testcorpus quality criteria: subjective sentence error rate, information item error rate, average word error rate (multi-reference)
Quick extrapolation of score and information item errors from most similar database sentences possible: only "interesting" cases have to be evaluated manually

Database

Database contains complete test corpus including source sentences reference translation and all evaluated target hypotheses
Database stored in XML format for flexible re-use of stored data
Prepared for revision control systems like RCS

GUI

Graphical user interface gives access to all functions via pull-down menus and dialogs
Consistent manual evaluation is faciliated by presenting the most similar target sentence of each quality level during an evaluation session
A "visual diff" widget points up differences between sentences
Context dependend online help guides inexperienced users
Source and target sentences can be sorted, listed, deleted, re-evaluated, ...

Structure

EvalTrans is written in Tcl/Tk: Easy extension and on-the-fly debugging possible
Time-critical routines written in C: low answer time
Command line tool EvalTransBatchEval enables quick evaluation in batch jobs

Screenshots

Main menu and the most important submenus. The most important pieces of database statistics are shown in the main window.

List of target sentences assigned to a source sentence in database. These sentences are sorted by their score here; some sentences have been selected for further operations.

Dialog for the manual evaluation of a source/target hypothesis sentence pair. At the top you will find the sentence pair; below there is a list of the most similar target sentences in database. Similarity is indicated by the left-hand exclamation marks; the exact differences can be shown in the white box in the lower half. At the bottom there is the information item error indicator and some control buttons.

During an extrapolation session: Out of the 147 hypothesis sentences, 42 have not been found in database and have to be evaluated. As a quick check, scores for these sentences have been extrapolated; in addition, sentence 4 has been re-evaluated manually. Score, WER, information item error etc. are listed for each hypothesis sentence and for the whole test/hypothesis corpus.

Most dialogs have a "help" button which links directly to the corresponding section of the hypertext online help.

System/Software requirements

To use EvalTrans, you need a system supported by the following software:

Tcl/Tk 8.0 or higher.
TclExpat 1.1 or higher. Source code necessary, as it has to be patched: There are problems with the lack of Unicode support in Tcl 8.0; higher versions have not been tested yet. TclXML should work (slowly), too, but has not been tested yet.
BWidget ToolKit 1.2.1 or higher.
A compiler that supports building shared libraries for Tcl/Tk (e.g. GCC)

Obtaining EvalTrans / Contact

EvalTrans can be obtained from the Natural Language Software Registry (an initiative of the ACL). Click here to get to the EvalTrans page there.

Publications / Ressources

Publications

Sonja Nießen, Franz Josef Och, Gregor Leusch, Hermann Ney. "An Evaluation Tool for Machine Translation: Fast Evaluation for MT Research". In Proc. 2nd International Conference on Language Resources and Evaluation, pp. 39-45, Athens, Greece, May-June 2000. Corrected version:
Stephan Vogel, Sonja Nießen, Hermann Ney. "Automatic Extrapolation of Human Assessment of Translation Quality". In 2nd International Conference on Language Resources and Evaluation: Proceedings of the Workshop on Evaluation of Machine Translation, pp. 35-39, Athens, Greece, May-June 2000.

Ressources

EvalTrans Online documentation: README, EvalTrans online help file

Last modified: Fri Jul 5 17:30:12 CEST 2002