- SEQCLAS A Sequence Classification Framework for Human Language Technology
The ERC Advanced Grant project SEQCLAS started just at the end of the
reporting period. This project will develop a unifying framework of novel
methods for sequence classification and thus advance the areas of automatic
speech recognition and machine translation beyond the state of the art.
Despite the huge progress made in the field, the specific aspect of sequence
classification has not been addressed adequately in past research in these
disciplines and remains a major challenge. SEQCLAS aims to provide a novel
framework that consistently takes the sequence-classification perspective
into account. The leading research objectives are:
1. A novel theoretical framework for sequence classification.
2. Consistent sequence modeling across training and testing, which is
specifically lacking in machine translation.
3. Adequate sequence-level, performance-aware training criteria to learn the
free parameters of the models.
4. Investigation of (true) unsupervised training for HLT sequence
classification: its principles, its prerequisites, its limitations, and its
practical usage.
The study of these four problems will provide key enabling techniques for
sequence classification in human language technology in general, which will
carry over to and create high impact on the areas of speech recognition,
machine translation, and handwritten text recognition. Using our research
prototype systems, we will verify the validity and effectiveness of our
research on public international benchmarks.
- LISTEN Hands-free voice-enabled interface to web applications for smart home environment
Within the H2020 Marie Skłodowska-Curie action LISTEN, we develop a large-vocabulary automatic speech recognition system optimised for accessing web applications and controlling web-enabled smart home automation functionalities. LISTEN pushes the boundaries of the current state of the art by bridging the gap between the acoustic front-end and automatic speech recognition research communities, with the common goal of developing a smart-home-specific natural voice interface to web services.
- QT21 Quality Translation 21
QT21 is a machine translation project which receives funding from the European Union's Horizon 2020 research and innovation programme. QT21 addresses in particular morphologically complex languages with diverse word order and often scarce training resources, by means of substantially improved statistical and machine-learning-based translation models, guided by a systematic analysis of quality barriers, informed by human translators, and with a strong focus on scalability.
- Babel Rapid development of speech recognition systems for new languages
The US-IARPA-funded Babel Program is developing the ability to quickly build a speech recognition system for any new language. The new methods are developed in particular for low-resourced languages and challenging acoustic conditions.
- Graduiertenkolleg Software for Communication Systems
In many practical applications, speech recognition systems have to work
in adverse environmental conditions. Frequency distortions and noise
caused by the transmission channel are typical for telephone applications.
Considerable amounts of varying background noise are a problem for all
mobile applications such as cellular phones or speech-controlled systems
in cars. Under these conditions, the error rates of speech recognition
systems using standard methods usually rise considerably.
The goal of the project is to improve the robustness of speech recognition
systems by using acoustic features based on articulation. State-of-the-art
automatic speech recognition systems use spectral representations of the
speech signal such as Mel-frequency cepstral coefficients (MFCCs) or
linear predictive coefficients. These representations are not robust to
acoustic variation such as background noise or speaker-dependent
variation. There have been some promising attempts at using articulatory
information to improve the robustness of speech recognition systems.
Articulatory features describe speech sounds in terms of the state and the
movement of the human organs essential for speech production: for example,
the state of the vocal cords, the shape of constrictions in the vocal
tract, the rounding of the lips, and the position of the tongue. The
motivation for using articulatory features in speech recognition is that
the overall patterns in the speech signal caused by articulatory gestures
are more robust to noise and speaker-dependent acoustic variation than
spectral parameters. Additionally, articulatory and spectral
representations of speech can supply mutually complementary information to
a speech recognizer, in which case a combination of these representations
might be beneficial.
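As a point of reference for the spectral baseline mentioned above, the mel scale underlying MFCCs warps frequency to approximate human pitch perception. A minimal sketch of the standard hertz-to-mel conversion (the 2595·log10 formula; the constants are a widespread convention, not taken from this project):

```python
import math

def hz_to_mel(f_hz):
    # Standard mel-scale warping: roughly linear below 1 kHz,
    # logarithmically compressed above it.
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    # Inverse mapping, used e.g. when placing triangular mel filter
    # banks back on the hertz axis.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
```

Filter-bank center frequencies for MFCC extraction are typically spaced uniformly on the mel axis and mapped back to hertz with the inverse function.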
The project is supported by
Deutsche Forschungsgemeinschaft (DFG).
- EU-BRIDGE Automatic transcription and translation technology
The EU-BRIDGE project, funded by the European Union, aims at
developing automatic transcription and translation technology that will
permit the development of innovative multimedia captioning and translation
services for audiovisual documents between European and non-European
languages. The project will provide streaming technology that can convert
speech from lectures, meetings, and telephone conversations into text
in another language.
- Quaero Automatic multimedia content processing
Quaero is a collaborative research and development program centered on developing multimedia and
multilingual indexing and management tools for professional and general-public applications,
such as the automatic analysis, classification, extraction, and exploitation of information.
The research aims to facilitate the extraction of information from unlimited quantities of multimedia
and multilingual documents, including written texts, speech and music audio files, and images and videos.
Quaero was created to respond to the new needs of general-public and professional users, and to new challenges
in multimedia content analysis resulting from the explosion of information types and sources in digital form,
available to everyone via personal computers, television and handheld terminals.
- BOLT Broad Operational Language Translation
The DARPA BOLT program seeks to accurately translate Mandarin Chinese and
multiple dialects of Arabic into English from all types of media,
specifically focused on the challenging task of informal conversational
speech, email text and instant messaging.
- SIGNSPEAK Scientific understanding and vision-based technological development for continuous sign language recognition and translation
Deaf communities revolve around sign languages as they are their natural means of communication.
Although deaf, hard of hearing and hearing signers can communicate without boundaries amongst themselves,
there is a serious challenge for the deaf community in trying to integrate into educational,
social and work environments, as the vast majority of Europeans do not have signing skills.
The overall goal of SIGNSPEAK is to develop a new vision-based technology for translating continuous sign language to text,
in order to improve the communication between deaf and hearing communities.
- GALE Global Autonomous Language Exploitation
The goal of the DARPA GALE program is to produce a system that is able to automatically take multilingual newscasts,
text documents, and other forms of communication and make their information available to human queries.
GALE has three major technical challenges: automatic speech recognition, to process audio data;
machine translation, to translate non-English data; and distillation, to extract the most useful pieces
of information related to a given query.
- LUNA Spoken language understanding in multilingual communication systems
The LUNA project is focused on the problem of real-time understanding of spontaneous speech in the context of advanced telecom services.
The main objective of LUNA is the creation of a robust natural spoken language understanding toolkit for multilingual dialogue services,
able to carry out human-computer communication with a good degree of user satisfaction.
The vision of LUNA is to improve current automated telephone systems by allowing easy human-machine
interaction through spontaneous and unconstrained speech, replacing menu-driven voice recognition.
The project aims to enhance the user experience, helping callers to use vocal services quickly and accurately.
- TRAMES Traduction Automatique par Méthodes Statistiques (statistical machine translation)
The aim of TRAMES is to develop an automatic translation system capable of
processing documents from various domains, such as written text or radio/television
and conversational speech transcripts, and producing corresponding French translations.
A data-driven machine-translation system developed at RWTH/i6 is interfaced with a graphical frontend
(developed by Bertin Technologies) that displays the MT system output, such as N-best translation hypotheses,
word alignments, and confidence measures. A large-scale translation engine is designed to cover
Arabic-French as the primary language pair.
- IRMA Image Retrieval in Medical Applications
The RWTH IRMA project is a joint project of the Institute of Medical
Informatics, the Department of Diagnostic Radiology,
and Lehrstuhl für Informatik 6. The goal of this project is the
realization of a content-based image retrieval system suited for use
in daily medical routine.
- DFG Project Statistical Methods for Written Language Translation
This project aims at the development and improvement of statistical
machine translation. The following problems are tackled: large
vocabulary translation, improvement of statistical alignment and
lexicon models, integration of mono- and bilingual grammars and
morphological analysis, and adaptation and improvement of training and
search algorithms for statistical machine translation. The
project is supported by Deutsche Forschungsgemeinschaft (DFG).
- DFG Project Statistical Modeling for Image Object Recognition
The aim of the project is to investigate suitable statistical models for
image object recognition on three levels: (1) modeling of object
appearance using maximum entropy models; (2) modeling of the
variability of image objects using hidden Markov models; (3) modeling
of complex scenes using holistic approaches.
The project is supported by Deutsche Forschungsgemeinschaft (DFG).
- TC-STAR Technology and Corpora for Speech to Speech Translation
TC-STAR was a concentrated six-year effort for advanced research in all
core technologies for speech-to-speech translation: speech
recognition, translation, and synthesis. The project targeted a
selection of unconstrained conversational speech domains,
i.e. broadcast news and speeches, and a few languages relevant
for Europe's economy and society: Chinese, European English, and
European Spanish. The technical challenges and objectives
of the project focused on the development of new algorithms
and methods that integrate relevant human knowledge available
at translation time into a data-driven
framework. Examples of such new approaches are the integration of
linguistic knowledge into the statistical approach to spoken language
translation, the statistical modelling of pronunciation in
unconstrained conversational speech for automatic speech recognition,
and new acoustic and prosodic models for generating expressive speech
in speech synthesis.
TC-STAR was supported by the European Union.
- DFG Project Structured Acoustic Models for Speech Recognition
Within this project, a better structuring of the acoustic models for
automatic speech recognition systems was investigated. Speech
signals are affected by many variable factors such as background
noise, distortions in the transmission channel, and speaker
characteristics. The goal of the project was to improve
recognition by investigating and optimizing
methods that allow a better adaptation to, or suppression of, these
undesired variabilities. These methods include: vocal tract length
normalization, which reduces the speaker-dependent variability of the
spectrum by applying a spectral warping function; histogram-based
transformations applied during feature extraction to increase the noise
robustness; and adaptation of the acoustic model to different speakers and
transmission channels based on maximum likelihood linear
regression. The project was supported by Deutsche Forschungsgemeinschaft (DFG).
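The histogram-based transformations mentioned above can be illustrated by histogram equalization of a feature stream: each feature value is mapped through its empirical CDF and then through the inverse CDF of a reference distribution. This is a minimal sketch of the general technique, assuming a standard normal reference; it is not the project's exact formulation:

```python
from statistics import NormalDist

def histogram_normalize(values):
    # Map each value through the empirical CDF of the batch, then through
    # the inverse CDF of a standard normal reference distribution.
    n = len(values)
    ref = NormalDist()  # assumed reference: standard normal
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    for rank, i in enumerate(order):
        p = (rank + 0.5) / n          # empirical CDF estimate in (0, 1)
        out[i] = ref.inv_cdf(p)
    return out
```

The mapping is monotonic, so the ordering of feature values is preserved while their distribution is forced towards the reference, which is what makes the features less sensitive to channel and noise mismatches.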
- TransType2 Computer-Assisted Translation
The aim of TransType2 is to develop a Computer-Assisted Translation (CAT)
system which will help to meet the growing demand for high-quality
translation. The innovative solution proposed by TransType2 is to embed a
data-driven machine translation engine within an interactive translation
environment. In this way, the system combines the best of two
paradigms: the CAT paradigm, in which the human translator ensures
high-quality output, and the machine translation paradigm, in which the
machine ensures significant productivity gains. Another innovative feature
of TransType2 is that it will have two input modalities: text and speech.
Six different versions of the system will be developed for English,
French, Spanish, and German. To ensure that TransType2 corresponds to the
translators' needs, two professional translation agencies will
evaluate successive prototypes. TransType2 is supported by the
European Union.
- LC-STAR Lexica and Corpora for Speech-to-Speech Translation Technologies
The objective of LC-STAR is to improve human-to-human and human-machine communication in multilingual environments. The project aims to create the lexica
and corpora needed for speech-to-speech translation. Within LC-STAR, quasi-industrial standards for these language resources will be established,
lexica for 12 languages and text corpora for 3 languages will be created, and a speech-to-speech translation demonstrator for the three languages English,
Spanish, and Catalan will be developed. The Lehrstuhl für Informatik 6 will investigate speech-centered translation technologies,
focusing on the requirements concerning language resources and on the creation of lexica for speech recognition in German.
LC-STAR is supported by the European Union.
- PF-STAR Preparing future multisensorial interaction research
The PF-STAR project intends to contribute to establishing future
activities in the field of multisensorial and multilingual
communication (interface technologies) on a firmer basis by providing
technological baselines, comparative evaluations, and assessments of
the prospects of core technologies on which future research and
development efforts can build.
To this end, the project will address three crucial areas:
technologies for speech-to-speech translation, the detection and
expression of emotional states, and core speech technologies for children.
For each of them, promising technologies and approaches will be selected,
further developed, and aligned towards common baselines. The results
will be assessed and evaluated with respect
to both their performance and their future prospects.
To maximise the impact, the duration of the project is limited to 24
months, and the workplan has been designed to deliver results in two
stages: at mid-project term (month 14) and at the end of the
project. This will make relevant results available as soon
as possible, and in particular in time for them to be used during the
preparatory phase of the first call of FP6.
The Lehrstuhl für Informatik 6 is involved in the comparative
evaluation and further
development of speech translation technologies. The statistical
approach is to be compared to an interlingua-based approach. After the
evaluation phase, the two approaches are to be further developed and
aligned towards common baselines. PF-STAR is supported by the
European Union.
- TC-STAR_P Preparatory action for the project "Technology and Corpora for Speech to Speech Translation"
The objective of the TC-STAR_P project was to prepare a future integrated
project named "Technology and Corpora for Speech to Speech
Translation" (TC-STAR), to be proposed under the 6th Framework
Programme with the aim of making speech-to-speech translation
real. TC-STAR_P was driven by industrial requirements and
involved industrial key actors active in the development of
speech-to-speech translation systems and components, academic research
institutions active in research on such systems and components,
infrastructure centres active in the development of language resources
for speech-to-speech translation components, and small and medium
enterprises using the provided technologies. Roadmaps for the
development of speech-to-speech translation were prepared, and further
key actors from the industrial, research, and infrastructure groups, as
well as small and medium enterprises working with speech-to-speech
translation applications, were involved. A new organisational model was
developed. TC-STAR_P was supported by the European Union.
- CORETEX Improving core speech recognition technology
Commercial speech recognition systems nowadays work well for a very
specific task and language. However, they are not able to adapt to new
domains, acoustic environments, and languages. The objectives of the
CORETEX project were to develop generic speech recognition technology
that works well for a wide range of tasks with essentially no exposure to
task-specific data, and to develop methods for rapid porting to new domains
and languages with limited, inaccurately transcribed, or untranscribed
training data. Another objective was to investigate techniques to produce
an enriched symbolic speech transcription with extra information for
higher-level (symbolic) processing, to explore methods to use contemporary
and/or topic-related texts to improve language models, and to generate
pronunciations automatically for vocabulary extension.
We began with first investigations into unsupervised training,
i.e. training a speech recognition system for a new task without
dedicated transcribed training data for this specific task. One
problem with genericity and portability was the recognition
vocabulary: when shifting to a new task, a lot of work has to be
done to manually build phonetic transcriptions for new words. We
developed a method for automatically determining the phonetic
transcription of new words. Furthermore, we built a
system to segment recorded broadcast shows into parts which can be
handled by the speech recognition system. CORETEX was supported by the
European Union.
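A common scheme for such unsupervised training (a generic sketch, not necessarily the method used in CORETEX) is to recognize untranscribed audio with an existing system and keep only the segments whose recognizer confidence is high enough to serve as pseudo-transcriptions for retraining:

```python
def select_for_training(segments, threshold=0.8):
    """Keep automatically transcribed segments with high confidence.

    `segments` is a list of (audio_id, hypothesis, confidence) triples;
    the surviving (audio_id, hypothesis) pairs are then used as
    pseudo-labeled training data. All names here are illustrative,
    not taken from the project.
    """
    return [(audio, hyp)
            for audio, hyp, conf in segments
            if conf >= threshold]
```

The threshold trades off training-set size against label noise and would typically be tuned on a small held-out set.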
- VERBMOBIL II
VERBMOBIL II is a speaker-independent and bidirectional
speech-to-speech translation system for spontaneous dialogues in mobile
situations. It recognizes spoken input, analyses and translates it,
and finally utters the translation. The multilingual system handles
dialogues in three business-oriented domains, with context-sensitive
translation between three languages (German, English, and Japanese).
Within this project, the Lehrstuhl für Informatik 6
performed research on both speech recognition and
translation. For both tasks, statistical methods were used, and
self-contained software modules were developed and integrated into the
final prototype system. For the speech recognition part, we developed
efficient search algorithms that operate in real time. In
the end-to-end evaluation, the statistical machine translation
significantly outperformed the competing translation approaches such
as classical transfer-based translation and example-based translation.
Verbmobil II was funded by the Bundesministerium für
Bildung und Forschung (BMBF).
- EuTrans Example-based translation of text and speech
Machine translation has been receiving considerable attention for a long time
because of its great industrial and social interest. The focus of the
EuTrans project was the development and evaluation of example-based translation
techniques for text and speech input. Our institute contributed acoustic models
for the recognition of Italian telephone speech and analyzed different
statistical translation techniques.
EuTrans was supported by the European Union.
- ADVISOR Content analysis and rapid retrieval of videos
ADVISOR is a digital tool box for content analysis and rapid retrieval of
videos. It aims at pushing the
state of the art in video annotation and retrieval technologies to the
extent that formal information from videos is extracted
(semi-)automatically. The tool box brings together audio and video
indexing, manual annotation, and a search engine.
In this project, our institute provided an automatic segmentation algorithm
for MPEG audio streams and a large vocabulary continuous real-time speech
recognition system for transcribing the speech segments. The recognizer output
is enriched with confidence measures for information retrieval purposes.
ADVISOR was supported by the European Union, with
additional funding from the companies and organizations undertaking the work.
- Statistical Machine Translation Toolkit
The aim of this project was the development of a
publicly available baseline toolkit for statistical
machine translation. We extended an existing simple
toolkit (GIZA) by implementing training algorithms
for statistical translation models.
This project was supported by the
National Science Foundation (NSF).