ImageCLEF 2005 - Automatic Annotation Task

The ImageCLEF 2005 Automatic Annotation Task is part of the Cross Language Evaluation Forum (CLEF), a benchmarking event for multilingual information retrieval that has been held annually since 2000. CLEF began as a track in the Text Retrieval Conference (TREC, trec.nist.gov).

Retrieval tasks

In ImageCLEFmed 2005, there are two medical image retrieval tasks. Both tasks will likely require image retrieval techniques for best results. The automatic image annotation task provides no text as input and is aimed at image analysis research groups. An image retrieval system (GIFT) is available for participants who do not have access to one themselves. This page is concerned with the Automatic Annotation Task.

Automatic image annotation

Automatic image annotation, or image classification, can be an important step when searching for images in a database. Based on the IRMA project, a database of 9,000 fully classified radiographs, taken randomly from medical routine, is made available and can be used to train a classification system. 1,000 radiographs, for which classification labels are not available to the participants, have to be classified. The aim is to find out how well current techniques can identify image modality, body orientation, body region, and the biological system examined, based on the images alone. The results of the classification step can be used for multilingual image annotation as well as for DICOM header correction.

Although only 57 simple class numbers will be provided for ImageCLEFmed 2005, the images are annotated with the complete IRMA code, a multi-axial code for image annotation. The code is currently available in English and German. The results of such automatic image annotation tasks are planned to be used for further textual image retrieval tasks in the future.

Database & Download

Using the access code provided by CLEF, three files can be downloaded from the IRMA server.

Unzipping the database results in a set of 57 directories (Train01, Train02, ..., Train57). As pointed out in the CLEF copyright agreement, images downloaded through the CLEF access may be used only for the CLEF competition. If you would like to use the images for any other purposes, you have to fill out the IRMA transfer agreement.
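The directory layout suggests a simple way to assemble labelled training data. The sketch below assumes that the number in each directory name (Train01 to Train57) is the class number of every image inside it, and that every file in such a directory is an image; the helper names are hypothetical and not part of any official tooling.

```python
import os
import re

# Assumption: each training directory is named TrainNN, where NN (01-57)
# is the class number of every image stored inside it.
TRAIN_DIR_PATTERN = re.compile(r"^Train(\d{2})$")
NUM_CLASSES = 57


def class_from_dirname(name):
    """Return the class number encoded in a TrainNN directory name, or None."""
    match = TRAIN_DIR_PATTERN.match(name)
    if match is None:
        return None
    label = int(match.group(1))
    return label if 1 <= label <= NUM_CLASSES else None


def collect_training_images(root):
    """Walk the unzipped database and return (path, class_number) pairs."""
    samples = []
    for name in sorted(os.listdir(root)):
        label = class_from_dirname(name)
        directory = os.path.join(root, name)
        if label is None or not os.path.isdir(directory):
            continue  # ignore anything that is not a TrainNN directory
        for filename in sorted(os.listdir(directory)):
            samples.append((os.path.join(directory, filename), label))
    return samples
```

Sorting the directory listings keeps the order of samples reproducible across file systems.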

Submission of Results

For submission of results, the following file format has to be used:

Comment lines start with #; be sure to put your contact information into a comment.

All other lines have the format

<imageno> <confidence for class 1> <confidence for class 2> ... <confidence for class 56> <confidence for class 57>

The class with the highest confidence is considered to be the class of the image.
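A minimal Python sketch for producing a file in this format. The function names, and passing the contact details as a single string, are assumptions for illustration; only the line layout itself comes from the format description.

```python
NUM_CLASSES = 57


def format_submission_line(image_no, confidences):
    """Format one result line: the image number followed by 57 confidences."""
    if len(confidences) != NUM_CLASSES:
        raise ValueError(
            f"expected {NUM_CLASSES} confidences, got {len(confidences)}"
        )
    # The :g format writes 1.0 as "1" and 0.9 as "0.9", keeping lines short.
    return " ".join([str(image_no)] + [f"{c:g}" for c in confidences])


def write_submission(path, results, contact):
    """Write a submission file.

    results maps each image number to its list of 57 confidences;
    contact is written as a comment line, as the format asks for.
    """
    with open(path, "w") as out:
        out.write(f"# {contact}\n")
        for image_no, confidences in results.items():
            out.write(format_submission_line(image_no, confidences) + "\n")
```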

For example, an extract from a submission file might look like this:

   1876 1.0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
   1895 0 0.1 0.9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
   1919 0 0.1 0.2 0.3 0.4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

These three lines would classify image no. 1876 into class 1, image 1895 into class 3, and image 1919 into class 5.
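The decision rule can be written down directly. This sketch, which is not the official readconfidencefile.py, parses one line and returns the 1-based position of the highest confidence as the class. How the official script breaks ties is not stated on this page, so the tie rule below (lowest class number wins) is an assumption.

```python
def classify_line(line):
    """Return (image_no, class_no) for a result line, or None for comments/blanks.

    class_no is 1-based: the position of the highest confidence is the class.
    Ties go to the lowest class number, because max() keeps the first maximum
    (an assumption about how ties should be handled).
    """
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    fields = stripped.split()
    image_no = int(fields[0])
    confidences = [float(value) for value in fields[1:]]
    best_index = max(range(len(confidences)), key=confidences.__getitem__)
    return image_no, best_index + 1
```

Applied to the three example lines, this returns classes 1, 3, and 5 for images 1876, 1895, and 1919.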

There has to be one line for each of the images to be classified.
If you have several submissions, you can submit several files.

A Python script that validates these files and prints out the classification results is available for download.
Example usage:

./readconfidencefile.py -c imageclef05-testname-necessary-autoannot confidences.txt

This prints out the classification result for each classified file and complains about files that are not expected to be classified. The most important part is the last line: it gives the error rate (when the correct classes are known) and also reports whether files are missing or whether the list contains unknown files.

Example (not desirable): only 11 files could be classified, one of these is unknown to the system, and 990 are missing:

ER: 0.0 classified: 11 wrong: 0 correct: 0 illegal: 1 missing: 990

Example (desirable): all files are known to the system and all 1,000 files were classified:

ER: 0.0 classified: 1000 wrong: 0 correct: 0 illegal: 0 missing: 0

Explanation:

ER	error rate, if the correct classes are known (irrelevant here)
classified	how many lines could be used for classification (should be 1000)
wrong	how many of the valid classifications were wrong (irrelevant for the moment)
correct	how many of the valid classifications were correct (irrelevant for the moment)
illegal	how many lines contained classifications of files that are not expected (mainly: wrong number in the first column)
missing	how many of the expected files were missing from the submission file
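To make these counts concrete, here is a hypothetical re-implementation of the summary line. It is not the code of readconfidencefile.py; in particular, dividing ER by the number of valid classifications with known labels is an assumption.

```python
def summarize(predictions, expected, truth=None):
    """Compute checker-style counts for one submission.

    predictions: list of (image_no, class_no) pairs, one per result line
    expected:    set of image numbers that have to be classified
    truth:       optional dict image_no -> correct class_no
    """
    classified = len(predictions)
    seen = {image_no for image_no, _ in predictions}
    illegal = sum(1 for image_no, _ in predictions if image_no not in expected)
    missing = len(expected - seen)
    wrong = correct = 0
    if truth is not None:
        for image_no, class_no in predictions:
            if image_no in truth:
                if truth[image_no] == class_no:
                    correct += 1
                else:
                    wrong += 1
    # Assumption: ER is taken over the classifications whose labels are known.
    judged = wrong + correct
    error_rate = wrong / judged if judged else 0.0
    return {"ER": error_rate, "classified": classified, "wrong": wrong,
            "correct": correct, "illegal": illegal, "missing": missing}
```

With 1,000 expected images, ten valid lines, and one line for an unknown image number, this reproduces the counts of the undesirable example: classified 11, illegal 1, missing 990.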

As you can easily check, the order of lines does not matter for the classification result, so feel free to use any order.
Please check your files with this program to make sure there are no major problems with them.

A list containing the numbers of the files to be classified is available. It can be used together with the output checker to see whether a result file is complete and does not contain any unwanted lines.
Submissions have to be sent to deselaers@i6.informatik.rwth-aachen.de.
Submissions should consist of:
- a set of output files which are processed correctly by the python script readconfidencefile.py
- a specification of the experiments reported i.e. the meaning of each output file
- a brief description of the method used in each experiment.
Submissions are ranked by error rate; that is, the system that misclassifies the fewest images has the best result.
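As a small worked example of the ranking rule, assuming complete submissions covering all 1,000 test images: a run that misclassifies 125 images has an error rate of 125/1000 = 12.5% and ranks above a run with 200 errors (20%). The function names and submission names in this sketch are made up for illustration.

```python
NUM_TEST_IMAGES = 1000


def error_rate(num_wrong, num_images=NUM_TEST_IMAGES):
    """Fraction of test images that were classified incorrectly."""
    return num_wrong / num_images


def rank_submissions(wrong_counts):
    """Rank submissions by error rate, best (lowest) first.

    wrong_counts maps a submission name to its number of misclassified images.
    """
    rates = [(name, error_rate(wrong)) for name, wrong in wrong_counts.items()]
    return sorted(rates, key=lambda item: item[1])
```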

Questions & Comments

If you have any questions or comments on this information, feel free to contact us:

Thomas Deselaers for technical questions concerning the data transfer and evaluation.
Thomas Lehmann for general questions concerning the IRMA code and/or IRMA database.

Thomas Deselaers

Last modified: Fri May 20 18:13:10 CEST 2005
