Information Retrieval for Text and Audio Documents
Accessing information in audio databases encompasses a wide range of applications, among which spoken document retrieval (SDR) holds an important place. A collection of spoken documents constitutes the database to be searched; to it the user addresses a request that expresses an information need in natural language.
The system transforms this original sequence of words into a set of query terms, which are then used to retrieve documents that may or may not satisfy the user's information need. A good spoken document retrieval system retrieves as many relevant documents as possible while keeping the number of non-relevant retrieved documents to a minimum.
Audio documents containing speech are transcribed using automatic speech recognition. The transcriptions are preprocessed in the same manner as the queries. Given a query, a retrieval metric rates each document in the database according to how well it matches the query. The result is a ranked list of all documents presumed to be relevant.
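As a minimal sketch of the retrieval step described above, the following assumes a simple TF-IDF weighting as the retrieval metric; the actual metric used in the project is not specified here, and the preprocessing (lowercasing and whitespace tokenization) is a deliberately crude stand-in for the real pipeline:

```python
import math
from collections import Counter

def preprocess(text):
    # Hypothetical preprocessing: lowercase and split on whitespace.
    # A real SDR system would also apply stopword removal and stemming.
    return text.lower().split()

def rank_documents(query, documents):
    """Rate every document against the query with a TF-IDF metric
    and return a ranked list of (document index, score) pairs."""
    doc_terms = [Counter(preprocess(d)) for d in documents]
    n = len(documents)
    # Inverse document frequency: rare query terms weigh more.
    idf = {}
    for term in set(preprocess(query)):
        df = sum(1 for counts in doc_terms if term in counts)
        idf[term] = math.log((n + 1) / (df + 1)) + 1  # smoothed
    scores = []
    for i, counts in enumerate(doc_terms):
        score = sum(counts[t] * idf[t] for t in idf)
        scores.append((i, score))
    # Highest-scoring (presumably most relevant) documents first.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

For example, ranking three invented news snippets against the query "election results" places the document that mentions both terms twice at the top and the unrelated one last.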
In this project, a system for the automatic transcription and retrieval of English and German broadcast news was developed. Since automatically generated transcriptions are often erroneous, suitable retrieval metrics were developed that proved very robust to recognition errors.
To mitigate the effect of transcription errors in the training data, the training procedure was extended to reject wrong transcriptions automatically. The method is based on an algorithm that aligns a sequence of acoustic observations to a network of HMMs. Besides the word automaton for a given transcription, the HMM network also contains Markov chains for pronunciation variants, garbage models, and phoneme transitions. All acoustic observations that are aligned to garbage models or phoneme transitions are regarded as wrongly transcribed training data and are therefore excluded from the estimation of the acoustic models.
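The rejection idea can be illustrated in miniature. The sketch below assumes a toy network of three labeled states (two word states and one garbage state), uniform transition probabilities, and hypothetical discrete emission log-probabilities over a tiny observation alphabet; a real system would use Gaussian mixture emissions over acoustic feature vectors and a transition structure derived from the transcription's word automaton. A Viterbi alignment finds the best state sequence, and every frame aligned to the garbage state is discarded:

```python
import math

# Toy network: each state is labeled as belonging to the word models
# of the transcription or to a garbage model (hypothetical labels).
STATES = ["w1", "w2", "garbage"]
LABEL = {"w1": "word", "w2": "word", "garbage": "garbage"}

# Hypothetical discrete emission log-probabilities; stand-ins for
# Gaussian mixture scores over real acoustic feature vectors.
LOG_EMIT = {
    "w1":      {"a": math.log(0.8), "b": math.log(0.1), "x": math.log(0.1)},
    "w2":      {"a": math.log(0.1), "b": math.log(0.8), "x": math.log(0.1)},
    "garbage": {"a": math.log(0.2), "b": math.log(0.2), "x": math.log(0.6)},
}

def viterbi(obs):
    """Best state path through the toy network (uniform transitions)."""
    log_u = math.log(1.0 / len(STATES))  # uniform initial/transition prob
    delta = {s: log_u + LOG_EMIT[s][obs[0]] for s in STATES}
    back = []
    for o in obs[1:]:
        new_delta, pointers = {}, {}
        for s in STATES:
            # With uniform transitions, the best predecessor is the
            # state with the highest accumulated score.
            prev = max(STATES, key=lambda p: delta[p])
            new_delta[s] = delta[prev] + log_u + LOG_EMIT[s][o]
            pointers[s] = prev
        delta = new_delta
        back.append(pointers)
    # Backtrack from the best final state to recover the alignment.
    state = max(STATES, key=lambda s: delta[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return list(reversed(path))

def reject_garbage_frames(obs):
    """Keep only observations aligned to word models; frames aligned
    to the garbage model are rejected from acoustic-model training."""
    path = viterbi(obs)
    return [o for o, s in zip(obs, path) if LABEL[s] == "word"]
```

With the observation sequence `["a", "x", "b"]`, the middle frame aligns best to the garbage state and is rejected, so only `["a", "b"]` would be passed on to acoustic-model estimation.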