- RESCALE Resource-Efficient Speech and Language Processing
The general goal of the RESCALE project is to significantly reduce the
computational resources and power consumption of performant human
language technology systems: automatic speech recognition and
machine translation of text and speech. Natural language
processing technologies such as automatic speech recognition
and machine translation have reached a high level of maturity
thanks to advanced machine learning concepts. The added value
for the end users as well as the efficiency gains for globally
operating companies will lead to pervasive application of such
human language technologies. Today, these technologies
are essentially based on machine learning methods and, in
particular, on deep neural networks of considerable size. Their
high performance enables the widespread use of automatic speech
recognition and machine translation in the first
place. However, this usually requires cloud-based computing,
with a correspondingly high energy consumption. Completely
neural architectures also require extremely resource-intensive
training. Due to the expected worldwide usage of human language
technology, it is therefore essential to promote
resource-efficient approaches. The ubiquitous use of
resource-efficient approaches in human language technology can
lead to a reduction in energy requirements, as other, more
energy-consuming procedures are eliminated.
- HYKIST Usage of hybrid AI language technologies for quality enhancement in health care
Within the HYKIST project, a real-time interpreting and translation system to support language mediators is being developed using artificial intelligence technologies.
The aim of the project is to achieve optimized medical care for non-German-speaking patients by improving partially automated communication. For this purpose, automatic speech recognition and translation technologies are combined with a dialogue system for the initial anamnesis and integrated into an existing telecommunications platform. First, the project collects extensive dialogues in Arabic and Vietnamese, which form the basis for the development of algorithms and applications. Initial technical tests of the accuracy and quality of the automatic translations are already carried out during the project. Afterwards, the overall system is tested in a pilot together with clinical application partners in the area of emergency admissions and initial anamnesis in acute situations, and finally evaluated in a clinical study with regard to user acceptance.
- CEASELESS Chunk Learning and the Development of Speaking and Listening Fluency: Integrating Experimental and Computational Approaches
Effective communication skills are key to personal contentment, academic achievement and
professional career success. Being able to communicate effectively is even found to facilitate social
relationships. Competent language users experience more success in conveying their knowledge and
views. However, producing and comprehending fluent and informationally dense speech are highly
demanding communication skills that call upon many language-related and general cognitive
abilities. The development of these skills in a non-native (second) language is even more challenging
due to a lack of automatized knowledge. The difficulty of mastering speaking and listening skills is
exacerbated by the fact that they both are subject to real-time constraints. Recent theoretical
approaches to the understanding of human language processing posit that to ameliorate the effects
of these constraints, humans learn to rapidly and efficiently recode and compress the linguistic input
into larger units ('chunks') and rely on such chunks to facilitate language production and
comprehension.
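A toy illustration of this chunking idea (not part of the CEASELESS project itself): greedily fusing the most frequent adjacent word pair into a single multi-word unit, in the spirit of byte-pair encoding. The corpus and merge count below are invented for illustration.

```python
from collections import Counter

def learn_chunks(utterances, n_merges=2):
    """Greedily merge the most frequent adjacent word pair into a
    multi-word chunk, BPE-style. A toy sketch of usage-based chunk
    formation, not a model from the project."""
    seqs = [u.split() for u in utterances]
    for _ in range(n_merges):
        pairs = Counter()
        for s in seqs:
            pairs.update(zip(s, s[1:]))  # count adjacent word pairs
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]
        merged = []
        for s in seqs:
            out, i = [], 0
            while i < len(s):
                if i + 1 < len(s) and (s[i], s[i + 1]) == (a, b):
                    out.append(a + "_" + b)  # fuse into one chunk token
                    i += 2
                else:
                    out.append(s[i])
                    i += 1
            merged.append(out)
        seqs = merged
    return seqs

corpus = ["i would like to", "i would rather not", "would like to go"]
# frequent sequences such as "i would" end up as single chunk tokens
print(learn_chunks(corpus, n_merges=2))
```

Each merge shortens the token sequences, mirroring the compression that chunking is hypothesized to provide for real-time processing.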
The present research project (CEASELESS) is aimed at [1] advancing our understanding of the role of
chunking mechanisms in non-native speech production and comprehension under real time
constraints and [2] paving the way for the development of an automatic scoring system geared
towards assessing speaking and listening competencies that provides individualized feedback based
on reliable performance metrics that go beyond the coarse-grained categories typically provided by
human ratings. The key to the success of the CEASELESS project is a transdisciplinary and multi-
methodological approach, along with strong theory-driven investigation. This innovative
research draws on cutting-edge methods from experimental psychology, psycholinguistics, natural
language processing techniques, automatic speech recognition and machine learning.
- SEQCLAS A Sequence Classification Framework for Human Language Technology
The ERC Advanced Grant project SEQCLAS started just at the end of the
reporting period. This project will develop a unifying framework of novel
methods for sequence classification and thus advance the areas of automatic
speech recognition and machine translation beyond the state of the art. Despite
the huge progress made in the field, the specific aspect of sequence
classification has not been addressed adequately in past research in
these disciplines and remains a major challenge. SEQCLAS aims to
provide a novel framework built on the consistent treatment of
sequence classification as the central aspect. The main research objectives are: 1. A
novel theoretical framework for sequence classification. 2. Consistent
sequence modeling across training and testing, which is specifically lacking
in machine translation. 3. Adequate sequence-level performance-aware
training criteria to learn the free parameters of the models. 4.
Investigation of (true) unsupervised training for HLT sequence
classification: its principles, its prerequisites, its limitations and its
practical usage. The study of these four problems will provide key enabling
techniques for human language technology sequence classification in general
that will carry over to and create high impact on the areas of speech
recognition, machine translation and handwritten text recognition. Using our
research prototype systems, we will verify the validity and effectiveness of
our research on public international benchmarks.
- DFG CoreTech Core Technologies for statistical machine translation
By 2016, a lot of progress had been made in statistical machine translation (SMT). Nevertheless, the existing statistical
methods did not yet capture all the relevant interdependencies of the words in source and target language.
Our project CoreTech focused on three problems in order to improve the state of the art in statistical
machine translation (SMT) of written and spoken language:
1) Artificial neural networks (NN): extending the existing NN approaches to better model the
dependencies between source sentence and target sentence, with special attention to recurrent neural
networks and the word re-ordering problem. Word re-ordering is particularly serious for German because its
word order differs considerably from that of other European languages.
2) Extended translation models and improved/consistent training: the then-existing phrase-based approaches
lacked a sound statistical basis; in particular, there was no consistent training procedure for the phrases in the phrase-based approach.
3) Interface for spoken language input: In addition to the machine translation of text, CoreTech also
addressed speech translation. The output of the ASR (automatic speech recognition) engine is the input to the
SMT engine, and the goal was to improve the interface between ASR and SMT by various methods like punctuation
prediction, enriched word hypothesis lattices, and joint optimization of the ASR-SMT pipeline.
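The phrase training mentioned under 2) rests on extracting phrase pairs that are consistent with a word alignment. A minimal sketch of the standard extraction heuristic from phrase-based SMT (the sentence pair and alignment links are invented for illustration, and this simplified version omits the usual expansion over unaligned boundary words; it is not CoreTech's actual procedure):

```python
def extract_phrases(src, tgt, alignment, max_len=3):
    """Extract all phrase pairs consistent with a word alignment:
    every link from inside the source span must land inside the
    target span, and vice versa."""
    phrases = set()
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # target positions linked to the source span [i1, i2]
            tps = [j for (i, j) in alignment if i1 <= i <= i2]
            if not tps:
                continue
            j1, j2 = min(tps), max(tps)
            # consistency: no link from inside the target span
            # back to a source word outside [i1, i2]
            if any(j1 <= j <= j2 and not (i1 <= i <= i2)
                   for (i, j) in alignment):
                continue
            if j2 - j1 < max_len:  # limit target phrase length too
                phrases.add((" ".join(src[i1:i2 + 1]),
                             " ".join(tgt[j1:j2 + 1])))
    return phrases

src = ["morgen", "fliege", "ich", "nach", "kanada"]
tgt = ["tomorrow", "i", "will", "fly", "to", "canada"]
# (source index, target index) alignment links
links = {(0, 0), (1, 3), (2, 1), (3, 4), (4, 5)}
for pair in sorted(extract_phrases(src, tgt, links)):
    print(pair)
```

Note how the reordered pair ("fliege ich", "i will fly") is extracted as a unit, which is one way phrase-based systems capture local German-English word-order differences.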
- LISTEN Hands-free voice-enabled interface to web applications for smart home environment
Within the H2020 Marie Skłodowska-Curie action LISTEN, we develop a large-vocabulary automatic speech recognition system optimised for accessing web applications and controlling web-enabled smart home automation functionalities. LISTEN pushes the boundaries of the current state of the art by bridging the gap between the acoustic front-end and automatic speech recognition research communities, with the common goal of developing a smart-home-specific natural voice interface to web services.
- QT21 Quality Translation 21
QT21 is a machine translation project which receives funding from the European Union's Horizon 2020 research and innovation programme. QT21 addresses in particular morphologically complex languages, with diverse word order and often with little training resources, by substantially improved statistical and machine-learning based translation models, guided by a systematic analysis of quality barriers, informed by human translators, and all with a strong focus on scalability.
- Babel
The US IARPA-funded Babel program develops methods for the rapid development of a speech recognition system for any new language. These methods are developed in particular for low-resource languages and challenging acoustic conditions.
- EU-BRIDGE
The EU-BRIDGE project, funded by the European Union, aims at
developing automatic transcription and translation technology that will
permit innovative multimedia captioning and translation
services for audiovisual documents between European and non-European
languages. The project will provide streaming technology that can convert
speech from lectures, meetings, and telephone conversations into text
in another language.
- Quaero Automatic multimedia content processing
Quaero is a collaborative research and development program, centered on developing multimedia and
multilingual indexing and management tools for professional and general public applications
such as the automatic analysis, classification, extraction and exploitation of information.
The research aims to facilitate the extraction of information in unlimited quantities of multimedia
and multilingual documents, including written texts, speech and music audio files, and images and videos.
Quaero was created to respond to new needs for the general public and professional use, and new challenges
in multimedia content analysis resulting from the explosion of various information types and sources in digital form,
available to everyone via personal computers, television and handheld terminals.
- BOLT Broad Operational Language Translation
The DARPA BOLT program seeks to accurately translate Mandarin Chinese and
multiple dialects of Arabic into English from all types of media,
specifically focused on the challenging task of informal conversational
speech, email text and instant messaging.
- SIGNSPEAK Scientific understanding and vision-based technological development for continuous sign language recognition and translation
Deaf communities revolve around sign languages as they are their natural means of communication.
Although deaf, hard of hearing and hearing signers can communicate without boundaries amongst themselves,
there is a serious challenge for the deaf community in trying to integrate into educational,
social and work environments, as the vast majority of Europeans do not have signing skills.
The overall goal of SIGNSPEAK is to develop a new vision-based technology for translating continuous sign language to text,
in order to improve the communication between deaf and hearing communities.
- GALE Global Autonomous Language Exploitation
The goal of the DARPA GALE program is to produce a system that automatically takes multilingual newscasts,
text documents, and other forms of communication, and makes their information available in response to human queries.
GALE has three major technical challenges: automatic speech recognition, to process audio data;
machine translation, to translate non-English data; and distillation, to extract the most useful pieces
of information related to a given query.
- LUNA Spoken language understanding in multilingual communication systems
The LUNA project is focused on the problem of real-time understanding of spontaneous speech in the context of advanced telecom services.
The main objective of LUNA is the creation of a robust natural spoken language understanding toolkit for multilingual dialogue services,
able to carry out human-computer communication with a good degree of user satisfaction.
The vision of LUNA is to improve current automated telephone systems allowing easy human-machine interactions through spontaneous and
unconstrained speech, replacing menu-driven voice recognition. The project aims to enhance the users' experience,
helping callers in using vocal services quickly and accurately.
- TRAMES Traduction Automatique par Méthodes Statistiques (statistical machine translation)
The aim of TRAMES is to develop an automatic translation system capable of
processing documents from various domains, such as written text or radio/television
and conversational speech transcripts, and producing corresponding French translations.
A data-driven machine-translation system developed at RWTH/i6 is interfaced with a graphical frontend
(developed by Bertin Technologies) that displays the MT system output, such as N-best translation hypotheses,
word alignments, confidence measures, etc. A large-scale translation engine is designed to cover
Arabic-French as the primary language pair.
- IRMA Image Retrieval in Medical Applications
The RWTH IRMA project is a joint project of the Institute of Medical
Informatics, the Department of Diagnostic Radiology,
and Lehrstuhl für Informatik 6. The goal of this project is the
realization of a content-based image retrieval system suited for use
in daily medical routine.
- DFG Project Statistical Methods for Written Language Translation
This project aims at the development and improvement of statistical
machine translation. The following problems are tackled: large
vocabulary translation, improvement of statistical alignment and
lexicon models, integration of mono- and bilingual grammars and
morphological analysis, and adaptation and improvement of training and
search algorithms for statistical machine translation. The
project is supported by Deutsche Forschungsgemeinschaft (DFG).
- DFG Project Statistical Modeling for Image Object Recognition
The aim of the project is to investigate suitable statistical models for
image object recognition on three levels: (1) modeling of object
appearance using maximum entropy models; (2) modeling of the
variability of image objects using hidden Markov models; (3) modeling
of complex scenes using holistic approaches.
The project is supported by Deutsche Forschungsgemeinschaft (DFG).
- TC-STAR Technology and Corpora for Speech to Speech Translation
TC-STAR was a concentrated six-year effort for advanced research in all
core technologies for speech to speech translation: speech
recognition, translation, and synthesis. The project targeted a
selection of unconstrained conversational speech domains,
i.e. broadcast news and speeches, and a few languages relevant
for Europe's economy and society: Chinese, European English, and
European Spanish. The technical challenges and objectives
of the project focused on the development of new algorithms
and methods, integrating relevant human knowledge which is
available at translation time into a data-driven
framework. Examples of such new approaches are the integration of
linguistic knowledge in the statistical approach of spoken language
translation, the statistical modelling of pronunciation of
unconstrained conversational speech in automatic speech recognition,
and new acoustic and prosodic models for generating expressive speech
in speech synthesis.
TC-STAR was supported by the European Union.
- DFG Project Structured Acoustic Models for Speech Recognition
Within this project a better structuring of the acoustic models for
automatic speech recognition systems was investigated. Speech
signals are affected by many variable factors like background
noises, distortions in the transmission channel and speaker
characteristics. The goal of the project was to improve the
recognition by investigating and optimizing
methods that allow a better adaptation to, or suppression of, these
undesired variabilities. These methods include vocal tract length
normalization, which reduces the speaker-dependent variability of the
spectrum by applying a spectral warping function; histogram-based
transformations applied during feature extraction to increase the noise
robustness; and adaptation of the acoustic model to different speakers and
transmission channels based on maximum likelihood linear
regression. The project was supported by Deutsche
Forschungsgemeinschaft (DFG).
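The histogram-based transformations mentioned above can be illustrated by quantile matching: each test feature value is mapped to the value at the same quantile of a reference (training) distribution. A simplified one-dimensional sketch with synthetic data, not the project's actual implementation:

```python
import numpy as np

def histogram_normalize(feats, ref_sorted):
    """Map each feature value onto a reference distribution by
    matching empirical quantiles (simplified 1-D histogram
    normalization; real systems apply this per feature dimension)."""
    n = len(feats)
    # rank of each frame's value -> empirical quantile in (0, 1)
    ranks = np.argsort(np.argsort(feats))
    quantiles = (ranks + 0.5) / n
    # look up the same quantile in the sorted reference values
    idx = np.clip((quantiles * len(ref_sorted)).astype(int),
                  0, len(ref_sorted) - 1)
    return ref_sorted[idx]

rng = np.random.default_rng(0)
clean = np.sort(rng.normal(0.0, 1.0, 1000))  # reference (training) features
noisy = rng.normal(5.0, 3.0, 200)            # shifted and scaled test features
norm = histogram_normalize(noisy, clean)
print(norm.mean(), norm.std())  # roughly restored to the reference statistics
```

Because only ranks of the test features matter, the mapping removes the shift and scaling introduced by the noise while preserving the ordering of the feature values.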
- TransType2 Computer-Assisted Translation
The aim of TransType2 is to develop a Computer-Assisted Translation (CAT)
system, which will help to meet the growing demand for high-quality
translation. The innovative solution proposed by TransType2 is to embed a
data-driven machine translation engine with an interactive translation
environment. In this way, the system combines the best of two
paradigms: the CAT paradigm, in which the human translator ensures
high-quality output; and the machine translation paradigm, in which the machine ensures
significant productivity gains. Another innovative feature of TransType2 is
that it will have two input modalities: text and speech. Six
different versions of the system will be developed for English,
French, Spanish and German. To ensure that TransType2 corresponds to the
translators' needs, two professional translation agencies will
evaluate successive prototypes. TransType2 is supported by the
European Union.
- LC-STAR Lexica and Corpora for Speech-to-Speech Translation Technologies
The objective of LC-STAR is to improve human-to-human and human-machine communication in multilingual environments. The project aims to create lexica
and corpora needed for speech-to-speech translation. Within LC-STAR, quasi industrial standards for those language resources will be established,
lexica for 12 languages and text corpora for 3 languages will be created. A speech to speech translation demonstrator for the three languages English,
Spanish and Catalan will be developed. The Lehrstuhl für Informatik 6 will investigate speech-centered translation technologies,
focusing on requirements concerning language resources and the creation of lexica for speech recognition in German.
LC-STAR is supported by the European Union.
- PF-STAR Preparing future multisensorial interaction research
The PF-STAR project intends to help establish future
activities in the field of multisensorial and multilingual
communication (interface technologies) on a firmer basis by providing
technological baselines, comparative evaluations, and assessments of the
prospects of core technologies that future research and development
efforts can build on.
To this end, the project will address three crucial areas:
technologies for speech-to-speech translation, the detection and
expressions of emotional states, and core speech technologies for
children.
For each of them, promising technologies/approaches will be selected,
further developed and aligned towards common baselines. The results
will be assessed and evaluated with respect
to both their performances and future prospects.
To maximise the impact, the duration of the project is limited to 24
months, and the workplan has been designed to deliver results in two
stages: at mid-project term (month 14), and at the end of the
project. This will make relevant results available as soon
as possible, and in particular in time for them to be used during the
preparatory phase of the first call of FP6.
The Lehrstuhl für Informatik 6 is involved in the comparative
evaluation and further
development of speech translation technologies. The statistical
approach is to be compared to an interlingua based approach. After the
evaluation phase, the two approaches are to be further developed and
aligned towards common baselines. PF-STAR is supported by the
European Union.
- TC-STAR_P Preparatory action for the project "Technology and Corpora for Speech to Speech Translation"
The objective of the TC-STAR_P project was to prepare a future integrated
project named "Technology and Corpora for Speech to Speech
Translation" (TC-STAR), which will be proposed under the 6th Framework
Programme and will aim at making speech to speech translation
real. TC-STAR_P was driven by industrial requirements and
involved industry key actors active in the development of speech
to speech translation systems
and components, academic research institutions active in speech to speech translation systems
and components research, infrastructure centres active in the
development of language resources for speech to speech
translation components, and small and medium enterprises using
the provided technologies. Roadmaps for the development of speech to speech translation were
prepared, and further key actors from the industrial, research and
infrastructure groups, as well as small and medium enterprises
working with speech to speech translation applications,
have been involved. A new organisational model was
developed. TC-STAR_P was supported by the
European Union.
- CORETEX
Nowadays commercial speech recognition systems
work well for a very specific
task and language. However, they are not able to adapt to new domains,
acoustic environments and languages. The objectives of the CORETEX
project were to develop generic speech recognition technology
that works well for a wide range of tasks with essentially no exposure to
task specific data and to develop methods for rapid porting to new domains
and languages with limited, inaccurately transcribed, or untranscribed training data.
Another objective was to investigate techniques to produce an enriched
symbolic speech transcription with extra information for higher level
(symbolic) processing and to explore methods to use contemporary and/or
topic-related texts to improve language models, and for automatic
pronunciation generation for vocabulary extension.
We began with first investigations into unsupervised training,
i.e. training a speech recognition system for a new task without
dedicated transcribed training data for that task. One
problem with genericity and portability was the recognition
vocabulary: when shifting to a new task, a lot of manual work is
required to build phonetic transcriptions for new words. We
developed a method for automatically determining the phonetic
transcription. Furthermore, we built a
system to segment recorded broadcast shows into parts which can be
handled by the speech recognition system. CORETEX was supported by the
European Union.
- VERBMOBIL II
VERBMOBIL II is a speaker-independent and bidirectional
speech-to-speech translation system for spontaneous dialogues in mobile
situations. It recognizes spoken input, analyses and translates it,
and finally utters the translation. The multi-lingual system handles
dialogues in three business-oriented domains with context-sensitive
translation between three languages (German, English, and
Japanese).
Within this project, the Lehrstuhl für Informatik 6
performed research on both speech recognition and
translation. For both tasks, statistical methods were used and
self-contained software modules were developed and integrated into the
final prototype system. For the speech recognition part, we developed
efficient search algorithms that operate in real time. In
the end-to-end evaluation, the statistical machine translation
significantly outperformed the competing translation approaches such
as classical transfer-based translation or example-based translation.
Verbmobil II was funded by the Bundesministerium für
Bildung und Forschung (BMBF).
- EuTrans
Machine translation has been receiving considerable attention for a long time,
because of its great industrial and social interest. The focus of the
EuTrans project was the development and evaluation of example-based translation
techniques for text and speech input. Our institute contributed acoustic models
for the recognition of Italian telephone speech and analyzed different
statistical translation techniques.
EuTrans was supported by the European Union.
- ADVISOR
ADVISOR is a digital tool box for content analysis and rapid retrieval of
videos. It aims at pushing the
state of the art in video annotation and retrieval technologies to the
extent that formal information from videos is extracted
(semi-)automatically. The tool box brings together audio and video
indexing, manual annotation, and a search engine.
In this project, our institute provided an automatic segmentation algorithm
for MPEG audio streams and a large vocabulary continuous real-time speech
recognition system for transcribing the speech segments. The recognizer output
is enriched with confidence measures for information retrieval purposes.
ADVISOR was supported by the European Union, with
additional funding from the companies and organizations undertaking the work.
- GIZA++
The aim of this project was the development of a
publicly available baseline toolkit for statistical
machine translation. We extended an existing simple
toolkit (GIZA) by implementing training algorithms
for statistical translation models.
This project was supported by the
National Science Foundation (NSF).
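The kind of training algorithm involved can be sketched with EM for IBM Model 1, the simplest of the alignment models such a toolkit implements (a toy version on an invented corpus, omitting the NULL source word for brevity; this is not the toolkit's actual code):

```python
from collections import defaultdict

def train_model1(pairs, iterations=10):
    """EM estimation of lexical translation probabilities t(f|e)
    from sentence pairs (e_words, f_words)."""
    t = defaultdict(lambda: 1.0)  # unnormalized start; EM renormalizes
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        for es, fs in pairs:
            for f in fs:
                z = sum(t[(f, e)] for e in es)  # normalize over source words
                for e in es:
                    c = t[(f, e)] / z           # expected alignment count
                    count[(f, e)] += c
                    total[e] += c
        for (f, e), c in count.items():         # M-step: re-estimate t(f|e)
            t[(f, e)] = c / total[e]
    return t

# tiny English-German parallel corpus
pairs = [(["the", "house"], ["das", "haus"]),
         (["the", "book"], ["das", "buch"]),
         (["a", "book"], ["ein", "buch"])]
t = train_model1(pairs, iterations=20)
print(t[("das", "the")])  # approaches 1: "the" comes to explain "das"
```

Even on this tiny corpus, EM disambiguates the co-occurrences: "house" explains "haus" and "book" explains "buch", so the probability mass for "das" concentrates on "the".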