- RESCALE Resource-Efficient Speech and Language Processing
The general goal of the RESCALE project is to significantly reduce the
computational resources and power consumption of performant human
language technology systems: automatic speech recognition and
machine translation of text and speech. Natural language
processing technologies such as automatic speech recognition
and machine translation have reached a high level of maturity
thanks to advanced machine learning concepts. The added value
for the end users as well as the efficiency gains for globally
operating companies will lead to pervasive application of such
human language technologies. Today, these technologies
are essentially based on machine learning methods and, in
particular, on deep neural networks of considerable size. Their
high performance enables the widespread use of automatic speech
recognition and machine translation in the first
place. However, this usually requires cloud-based computing,
with a correspondingly high energy consumption. Completely
neural architectures also require extremely resource-intensive
training. Due to the expected worldwide usage of human language
technology, it is therefore essential to promote
resource-efficient approaches. The ubiquitous use of
resource-efficient approaches in human language technology can
lead to a reduction in energy requirements, as other, more
energy-consuming procedures are eliminated.
- HYKIST Usage of hybrid AI language technologies for quality enhancement in health care
Within the HYKIST project, a real-time interpreting and translation system to support language mediators is being developed using artificial intelligence technologies.
The aim of the project is to achieve optimized medical care for non-German-speaking patients by improving partially automated communication. For this purpose, automatic speech recognition and translation technologies are combined with a dialogue system for the initial anamnesis and integrated into an existing telecommunications platform. First, the project collects extensive dialogues in Arabic and Vietnamese, which form the basis for the development of algorithms and applications. Initial technical tests of the accuracy and quality of the automatic translations are already carried out during the project. Afterwards, the overall system is tested in a pilot together with clinical application partners in the area of emergency admissions and initial anamnesis in acute situations, and finally evaluated in a clinical study with regard to user acceptance.
- CEASELESS Chunk Learning and the Development of Speaking and Listening Fluency: Integrating Experimental and Computational Approaches
Effective communication skills are key to personal contentment, academic achievement and
professional career success. Being able to communicate effectively is even found to facilitate social
relationships. Competent language users experience more success in conveying their knowledge and
views. However, producing and comprehending fluent and informationally dense speech are highly
demanding communication skills that call upon many language-related and general cognitive
abilities. The development of these skills in a non-native (second) language is even more challenging
due to a lack of automatized knowledge. The difficulty of mastering speaking and listening skills is
exacerbated by the fact that they both are subject to real-time constraints. Recent theoretical
approaches to the understanding of human language processing posit that to ameliorate the effects
of these constraints, humans learn to rapidly and efficiently recode and compress the linguistic input
into larger units ('chunks') and rely on such chunks to facilitate language production and
comprehension.
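A toy illustration of this chunking idea (not part of the CEASELESS project itself): greedily fusing the most frequent adjacent word pair into a single multi-word unit, in the spirit of byte-pair encoding. The corpus and merge count below are invented for illustration.

```python
from collections import Counter

def learn_chunks(utterances, n_merges=2):
    """Greedily merge the most frequent adjacent word pair into a
    multi-word chunk, BPE-style. A toy sketch of usage-based chunk
    formation, not a model from the project."""
    seqs = [u.split() for u in utterances]
    for _ in range(n_merges):
        pairs = Counter()
        for s in seqs:
            pairs.update(zip(s, s[1:]))  # count adjacent word pairs
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]
        merged = []
        for s in seqs:
            out, i = [], 0
            while i < len(s):
                if i + 1 < len(s) and (s[i], s[i + 1]) == (a, b):
                    out.append(a + "_" + b)  # fuse into one chunk token
                    i += 2
                else:
                    out.append(s[i])
                    i += 1
            merged.append(out)
        seqs = merged
    return seqs

corpus = ["i would like to", "i would rather not", "would like to go"]
# frequent sequences such as "i would" end up as single chunk tokens
print(learn_chunks(corpus, n_merges=2))
```

Each merge shortens the token sequences, mirroring the compression that chunking is hypothesized to provide for real-time processing.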
The present research project (CEASELESS) is aimed at [1] advancing our understanding of the role of
chunking mechanisms in non-native speech production and comprehension under real time
constraints and [2] paving the way for the development of an automatic scoring system geared
towards assessing speaking and listening competencies that provides individualized feedback based
on reliable performance metrics that go beyond the coarse-grained categories typically provided by
human ratings. The key to the success of the CEASELESS project is a transdisciplinary and multi-
methodological approach, along with strong theory-driven investigation. This innovative
research draws on cutting-edge methods from experimental psychology, psycholinguistics, natural
language processing techniques, automatic speech recognition and machine learning.
- SEQCLAS A Sequence Classification Framework for Human Language Technology
The ERC Advanced Grant project SEQCLAS started just at the end of the
reporting period. This project will develop a unifying framework of novel
methods for sequence classification and thus advance the areas of automatic
speech recognition and machine translation beyond the state of the art. Despite
the huge progress made in the field, the specific aspect of sequence
classification has not been addressed adequately in past research in
these disciplines and remains a major challenge. SEQCLAS aims to
provide a novel framework built on the consistent treatment of
sequence classification as the central aspect. The main research objectives are: 1. A
novel theoretical framework for sequence classification. 2. Consistent
sequence modeling across training and testing, which is specifically lacking
in machine translation. 3. Adequate sequence-level performance-aware
training criteria to learn the free parameters of the models. 4.
Investigation of (true) unsupervised training for HLT sequence
classification: its principles, its prerequisites, its limitations and its
practical usage. The study of these four problems will provide key enabling
techniques for human language technology sequence classification in general
that will carry over to and create high impact on the areas of speech
recognition, machine translation and handwritten text recognition. Using our
research prototype systems, we will verify the validity and effectiveness of
our research on public international benchmarks.
- DFG CoreTech Core Technologies for statistical machine translation
By 2016, a lot of progress had been made in statistical machine translation (SMT). Nevertheless, the existing statistical
methods did not yet capture all the relevant interdependencies of the words in source and target language.
Our project CoreTech focused on three problems in order to improve the state of the art in statistical
machine translation (SMT) of written and spoken language:
1) Artificial neural networks (NN): extending the existing NN approaches to better model the
dependencies between source sentence and target sentence, with special attention to recurrent neural
networks and the word re-ordering problem. Word re-ordering is particularly serious for German because its
word order differs considerably from that of other European languages.
2) Extended translation models and improved/consistent training: the then-existing phrase-based approaches
lacked a sound statistical basis; in particular, there was no consistent training procedure for the phrases in the phrase-based approach.
3) Interface for spoken language input: In addition to the machine translation of text, CoreTech also
addressed speech translation. The output of the ASR (automatic speech recognition) engine is the input to the
SMT engine, and the goal was to improve the interface between ASR and SMT by various methods like punctuation
prediction, enriched word hypothesis lattices, and joint optimization of the ASR-SMT pipeline.
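The phrase training mentioned under 2) rests on extracting phrase pairs that are consistent with a word alignment. A minimal sketch of the standard extraction heuristic from phrase-based SMT (the sentence pair and alignment links are invented for illustration, and this simplified version omits the usual expansion over unaligned boundary words; it is not CoreTech's actual procedure):

```python
def extract_phrases(src, tgt, alignment, max_len=3):
    """Extract all phrase pairs consistent with a word alignment:
    every link from inside the source span must land inside the
    target span, and vice versa."""
    phrases = set()
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # target positions linked to the source span [i1, i2]
            tps = [j for (i, j) in alignment if i1 <= i <= i2]
            if not tps:
                continue
            j1, j2 = min(tps), max(tps)
            # consistency: no link from inside the target span
            # back to a source word outside [i1, i2]
            if any(j1 <= j <= j2 and not (i1 <= i <= i2)
                   for (i, j) in alignment):
                continue
            if j2 - j1 < max_len:  # limit target phrase length too
                phrases.add((" ".join(src[i1:i2 + 1]),
                             " ".join(tgt[j1:j2 + 1])))
    return phrases

src = ["morgen", "fliege", "ich", "nach", "kanada"]
tgt = ["tomorrow", "i", "will", "fly", "to", "canada"]
# (source index, target index) alignment links
links = {(0, 0), (1, 3), (2, 1), (3, 4), (4, 5)}
for pair in sorted(extract_phrases(src, tgt, links)):
    print(pair)
```

Note how the reordered pair ("fliege ich", "i will fly") is extracted as a unit, which is one way phrase-based systems capture local German-English word-order differences.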
- LISTEN Hands-free voice-enabled interface to web applications for smart home environment
Within the H2020 Marie Skłodowska-Curie action LISTEN, we develop a large-vocabulary automatic speech recognition system optimised for accessing web applications and controlling web-enabled smart home automation functionalities. LISTEN pushes the boundaries of the current state of the art by bridging the gap between the acoustic front-end and automatic speech recognition research communities, with the common goal of developing a smart-home-specific natural voice interface to web services.
- QT21 Quality Translation 21
QT21 is a machine translation project which receives funding from the European Union's Horizon 2020 research and innovation programme. QT21 addresses in particular morphologically complex languages, with diverse word order and often with little training resources, by substantially improved statistical and machine-learning based translation models, guided by a systematic analysis of quality barriers, informed by human translators, and all with a strong focus on scalability.
- Babel
The US IARPA-funded Babel program develops methods for the rapid development of a speech recognition system for any new language. These methods are developed in particular for low-resource languages and challenging acoustic conditions.
- EU-BRIDGE
The EU-BRIDGE project, funded by the European Union, aims at
developing automatic transcription and translation technology that will
permit innovative multimedia captioning and translation
services for audiovisual documents between European and non-European
languages. The project will provide streaming technology that can convert
speech from lectures, meetings, and telephone conversations into text
in another language.
- Quaero Automatic multimedia content processing
Quaero is a collaborative research and development program, centered on developing multimedia and
multilingual indexing and management tools for professional and general public applications
such as the automatic analysis, classification, extraction and exploitation of information.
The research aims to facilitate the extraction of information in unlimited quantities of multimedia
and multilingual documents, including written texts, speech and music audio files, and images and videos.
Quaero was created to respond to new needs for the general public and professional use, and new challenges
in multimedia content analysis resulting from the explosion of various information types and sources in digital form,
available to everyone via personal computers, television and handheld terminals.
- BOLT Broad Operational Language Translation
The DARPA BOLT program seeks to accurately translate Mandarin Chinese and
multiple dialects of Arabic into English from all types of media,
specifically focused on the challenging task of informal conversational
speech, email text and instant messaging.
- SIGNSPEAK Scientific understanding and vision-based technological development for continuous sign language recognition and translation
Deaf communities revolve around sign languages as they are their natural means of communication.
Although deaf, hard of hearing and hearing signers can communicate without boundaries amongst themselves,
there is a serious challenge for the deaf community in trying to integrate into educational,
social and work environments, as the vast majority of Europeans do not have signing skills.
The overall goal of SIGNSPEAK is to develop a new vision-based technology for translating continuous sign language to text,
in order to improve the communication between deaf and hearing communities.
- GALE Global Autonomous Language Exploitation
The goal of the DARPA GALE program is to produce a system that automatically takes multilingual newscasts,
text documents, and other forms of communication, and makes their information available in response to human queries.
GALE has three major technical challenges: automatic speech recognition, to process audio data;
machine translation, to translate non-English data; and distillation, to extract the most useful pieces
of information related to a given query.
- LUNA Spoken language understanding in multilingual communication systems
The LUNA project is focused on the problem of real-time understanding of spontaneous speech in the context of advanced telecom services.
The main objective of LUNA is the creation of a robust natural spoken language understanding toolkit for multilingual dialogue services,
able to carry out human-computer communication with a good degree of user satisfaction.
The vision of LUNA is to improve current automated telephone systems allowing easy human-machine interactions through spontaneous and
unconstrained speech, replacing menu-driven voice recognition. The project aims to enhance the users' experience,
helping callers in using vocal services quickly and accurately.
- TRAMES Traduction Automatique par Méthodes Statistiques (statistical machine translation)
The aim of TRAMES is to develop an automatic translation system capable of
processing documents from various domains, such as written text or radio/television
and conversational speech transcripts, and producing corresponding French translations.
A data-driven machine-translation system developed at RWTH/i6 is interfaced with a graphical frontend
(developed by Bertin Technologies) that displays the MT system output, such as N-best translation hypotheses,
word alignments, confidence measures, etc. A large-scale translation engine is designed to cover
Arabic-French as the primary language pair.
- IRMA Image Retrieval in Medical Applications
The RWTH IRMA project is a joint project of the Institute of Medical
Informatics, the Department of Diagnostic Radiology,
and Lehrstuhl für Informatik 6. The goal of this project is the
realization of a content-based image retrieval system suited for use
in daily medical routine.
- DFG Project Statistical Methods for Written Language Translation
This project aims at the development and improvement of statistical
machine translation. The following problems are tackled: large
vocabulary translation, improvement of statistical alignment and
lexicon models, integration of mono- and bilingual grammars and
morphological analysis, and adaptation and improvement of training and
search algorithms for statistical machine translation. The
project is supported by Deutsche Forschungsgemeinschaft (DFG).
- DFG Project Statistical Modeling for Image Object Recognition
The aim of the project is to investigate suitable statistical models for
image object recognition on three levels: (1) modeling of object
appearance using maximum entropy models; (2) modeling of the
variability of image objects using hidden Markov models; (3) modeling
of complex scenes using holistic approaches.
The project is supported by Deutsche Forschungsgemeinschaft (DFG).
- TC-STAR Technology and Corpora for Speech to Speech Translation
TC-STAR was a concentrated six-year effort for advanced research in all
core technologies for speech to speech translation: speech
recognition, translation, and synthesis. The project targeted a
selection of unconstrained conversational speech domains,
i.e. broadcast news and speeches, and a few languages relevant
for Europe's economy and society: Chinese, European English, and
European Spanish. The technical challenges and objectives
of the project focused on the development of new algorithms
and methods, integrating relevant human knowledge which is
available at translation time into a data-driven
framework. Examples of such new approaches are the integration of
linguistic knowledge in the statistical approach of spoken language
translation, the statistical modelling of pronunciation of
unconstrained conversational speech in automatic speech recognition,
and new acoustic and prosodic models for generating expressive speech
in speech synthesis.
TC-STAR was supported by the European Union.
- DFG Project Structured Acoustic Models for Speech Recognition
Within this project a better structuring of the acoustic models for
automatic speech recognition systems was investigated. Speech
signals are affected by many variable factors like background
noises, distortions in the transmission channel and speaker
characteristics. The goal of the project was to improve the
recognition by investigating and optimizing
methods that allow a better adaptation to, or suppression of, these
undesired variabilities. These methods include vocal tract length
normalization, which reduces the speaker-dependent variability of the
spectrum by applying a spectral warping function; histogram-based
transformations applied during feature extraction to increase the noise
robustness; and adaptation of the acoustic model to different speakers and
transmission channels based on maximum likelihood linear
regression. The project was supported by Deutsche
Forschungsgemeinschaft (DFG).
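The histogram-based transformations mentioned above can be illustrated by quantile matching: each test feature value is mapped to the value at the same quantile of a reference (training) distribution. A simplified one-dimensional sketch with synthetic data, not the project's actual implementation:

```python
import numpy as np

def histogram_normalize(feats, ref_sorted):
    """Map each feature value onto a reference distribution by
    matching empirical quantiles (simplified 1-D histogram
    normalization; real systems apply this per feature dimension)."""
    n = len(feats)
    # rank of each frame's value -> empirical quantile in (0, 1)
    ranks = np.argsort(np.argsort(feats))
    quantiles = (ranks + 0.5) / n
    # look up the same quantile in the sorted reference values
    idx = np.clip((quantiles * len(ref_sorted)).astype(int),
                  0, len(ref_sorted) - 1)
    return ref_sorted[idx]

rng = np.random.default_rng(0)
clean = np.sort(rng.normal(0.0, 1.0, 1000))  # reference (training) features
noisy = rng.normal(5.0, 3.0, 200)            # shifted and scaled test features
norm = histogram_normalize(noisy, clean)
print(norm.mean(), norm.std())  # roughly restored to the reference statistics
```

Because only ranks of the test features matter, the mapping removes the shift and scaling introduced by the noise while preserving the ordering of the feature values.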
- TransType2 Computer-Assisted Translation
The aim of TransType2 is to develop a Computer-Assisted Translation (CAT)
system, which will help to meet the growing demand for high-quality
translation. The innovative solution proposed by TransType2 is to embed a
data-driven machine translation engine with an interactive translation
environment. In this way, the system combines the best of two
paradigms: the CAT paradigm, in which the human translator ensures
high-quality output; and the machine translation paradigm, in which the machine ensures
significant productivity gains. Another innovative feature of TransType2 is
that it will have two input modalities: text and speech. Six
different versions of the system will be developed for English,
French, Spanish and German. To ensure that TransType2 corresponds to the
translators' needs, two professional translation agencies will
evaluate successive prototypes. TransType2 is supported by the
European Union.
- LC-STAR Lexica and Corpora for Speech-to-Speech Translation Technologies
The objective of LC-STAR is to improve human-to-human and human-machine communication in multilingual environments. The project aims to create lexica
and corpora needed for speech-to-speech translation. Within LC-STAR, quasi industrial standards for those language resources will be established,
lexica for 12 languages and text corpora for 3 languages will be created. A speech to speech translation demonstrator for the three languages English,
Spanish and Catalan will be developed. The Lehrstuhl für Informatik 6 will investigate speech-centered translation technologies,
focusing on requirements concerning language resources and the creation of lexica for speech recognition in German.
LC-STAR is supported by the European Union.
- PF-STAR Preparing future multisensorial interaction research
The PF-STAR project intends to help establish future
activities in the field of multisensorial and multilingual
communication (interface technologies) on a firmer basis by providing
technological baselines, comparative evaluations, and assessments of the
prospects of core technologies that future research and development
efforts can build on.
To this end, the project will address three crucial areas:
technologies for speech-to-speech translation, the detection and
expressions of emotional states, and core speech technologies for
children.
For each of them, promising technologies/approaches will be selected,
further developed and aligned towards common baselines. The results
will be assessed and evaluated with respect
to both their performances and future prospects.
To maximise the impact, the duration of the project is limited to 24
months, and the workplan has been designed to deliver results in two
stages: at mid-project term (month 14), and at the end of the
project. This will make relevant results available as soon
as possible, and in particular in time for them to be used during the
preparatory phase of the first call of FP6.
The Lehrstuhl für Informatik 6 is involved in the comparative
evaluation and further
development of speech translation technologies. The statistical
approach is to be compared to an interlingua based approach. After the
evaluation phase, the two approaches are to be further developed and
aligned towards common baselines. PF-STAR is supported by the
European Union.
- TC-STAR_P Preparatory action for the project "Technology and Corpora for Speech to Speech Translation"
The objective of the TC-STAR_P project was to prepare a future integrated
project named "Technology and Corpora for Speech to Speech
Translation" (TC-STAR), which will be proposed under the 6th Framework
Programme and will aim at making speech to speech translation
real. TC-STAR_P was driven by industrial requirements and
involved industry key actors active in the development of speech
to speech translation systems
and components, academic research institutions active in speech to speech translation systems
and components research, infrastructure centres active in the
development of language resources for speech to speech
translation components, and small and medium enterprises using
the provided technologies. Roadmaps for the development of speech to speech translation were
prepared, and further key actors from the industrial, research and
infrastructure groups, as well as small and medium enterprises
working with speech to speech translation applications,
have been involved. A new organisational model was
developed. TC-STAR_P was supported by the
European Union.
- CORETEX
Nowadays commercial speech recognition systems
work well for a very specific
task and language. However, they are not able to adapt to new domains,
acoustic environments and languages. The objectives of the CORETEX
project were to develop generic speech recognition technology
that works well for a wide range of tasks with essentially no exposure to
task specific data and to develop methods for rapid porting to new domains
and languages with limited, inaccurately transcribed, or untranscribed training data.
Another objective was to investigate techniques to produce an enriched
symbolic speech transcription with extra information for higher level
(symbolic) processing and to explore methods to use contemporary and/or
topic-related texts to improve language models, and for automatic
pronunciation generation for vocabulary extension.
We began with first investigations into unsupervised training,
i.e. training a speech recognition system for a new task without
dedicated transcribed training data for that task. One
problem with genericity and portability was the recognition
vocabulary: when shifting to a new task, a lot of manual work is
required to build phonetic transcriptions for new words. We
developed a method for automatically determining the phonetic
transcription. Furthermore, we built a
system to segment recorded broadcast shows into parts which can be
handled by the speech recognition system. CORETEX was supported by the
European Union.
- VERBMOBIL II
VERBMOBIL II is a speaker-independent and bidirectional
speech-to-speech translation system for spontaneous dialogues in mobile
situations. It recognizes spoken input, analyses and translates it,
and finally utters the translation. The multi-lingual system handles
dialogues in three business-oriented domains with context-sensitive
translation between three languages (German, English, and
Japanese).
Within this project, the Lehrstuhl für Informatik 6
performed research on both speech recognition and
translation. For both tasks, statistical methods were used and
self-contained software modules were developed and integrated into the
final prototype system. For the speech recognition part, we developed
efficient search algorithms that operate in real time. In
the end-to-end evaluation, the statistical machine translation
significantly outperformed the competing translation approaches such
as classical transfer-based translation or example-based translation.
Verbmobil II was funded by the Bundesministerium für
Bildung und Forschung (BMBF).
- EuTrans
Machine translation has been receiving considerable attention for a long time,
because of its great industrial and social interest. The focus of the
EuTrans project was the development and evaluation of example-based translation
techniques for text and speech input. Our institute contributed acoustic models
for the recognition of Italian telephone speech and analyzed different
statistical translation techniques.
EuTrans was supported by the European Union.
- ADVISOR
ADVISOR is a digital tool box for content analysis and rapid retrieval of
videos. It aims at pushing the
state of the art in video annotation and retrieval technologies to the
extent that formal information from videos is extracted
(semi-)automatically. The tool box brings together audio and video
indexing, manual annotation, and a search engine.
In this project, our institute provided an automatic segmentation algorithm
for MPEG audio streams and a large vocabulary continuous real-time speech
recognition system for transcribing the speech segments. The recognizer output
is enriched with confidence measures for information retrieval purposes.
ADVISOR was supported by the European Union, with
additional funding from the companies and organizations undertaking the work.
- GIZA++
The aim of this project was the development of a
publicly available baseline toolkit for statistical
machine translation. We extended an existing simple
toolkit (GIZA) by implementing training algorithms
for statistical translation models.
This project was supported by the
National Science Foundation (NSF).
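The kind of training algorithm involved can be sketched with EM for IBM Model 1, the simplest of the alignment models such a toolkit implements (a toy version on an invented corpus, omitting the NULL source word for brevity; this is not the toolkit's actual code):

```python
from collections import defaultdict

def train_model1(pairs, iterations=10):
    """EM estimation of lexical translation probabilities t(f|e)
    from sentence pairs (e_words, f_words)."""
    t = defaultdict(lambda: 1.0)  # unnormalized start; EM renormalizes
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        for es, fs in pairs:
            for f in fs:
                z = sum(t[(f, e)] for e in es)  # normalize over source words
                for e in es:
                    c = t[(f, e)] / z           # expected alignment count
                    count[(f, e)] += c
                    total[e] += c
        for (f, e), c in count.items():         # M-step: re-estimate t(f|e)
            t[(f, e)] = c / total[e]
    return t

# tiny English-German parallel corpus
pairs = [(["the", "house"], ["das", "haus"]),
         (["the", "book"], ["das", "buch"]),
         (["a", "book"], ["ein", "buch"])]
t = train_model1(pairs, iterations=20)
print(t[("das", "the")])  # approaches 1: "the" comes to explain "das"
```

Even on this tiny corpus, EM disambiguates the co-occurrences: "house" explains "haus" and "book" explains "buch", so the probability mass for "das" concentrates on "the".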