RWTH ASR - The RWTH Aachen University Speech Recognition System
RWTH ASR ("RASR" for short) is a software package containing a speech recognition decoder together with tools for developing acoustic models for use in speech recognition systems. It has been developed by the Human Language Technology and Pattern Recognition Group at RWTH Aachen University since 2001. Speech recognition systems developed using this framework have been applied successfully in several international research projects and corresponding evaluations.
RASR consists of several libraries and tools written in C++. Currently, Linux (x86 and x86-64) and Mac OS X (Intel) platforms are supported.
- decoder for large vocabulary continuous speech recognition
  - word conditioned tree search (supporting across-word models)
  - optimized HMM emission probability calculation using SIMD instructions
  - refined acoustic pruning using language model lookahead
  - word lattice generation
- feature extraction
  - a flexible framework for data processing:
  - MFCC features
  - PLP features
  - Gammatone features
  - voicedness feature
  - vocal tract length normalization (VTLN)
  - support for several feature dimension reduction methods (e.g. LDA, PCA)
  - easy implementation of new features as well as easy integration of external features using Flow networks
- acoustic modeling
  - Gaussian mixture distributions for HMM emission probabilities
  - phoneme in triphone context (or shorter context)
  - across-word context dependency of phonemes
  - allophone parameter tying using phonetic decision trees (classification and regression trees, CART)
  - globally pooled diagonal covariance matrix (other types of covariance modelling are possible, but not fully tested)
  - maximum likelihood training
  - discriminative training (minimum phone error (MPE) criterion)
  - linear algebra support using LAPACK, BLAS
- language modeling
  - support for language models in ARPA format
  - weighted grammars (weighted finite state automaton)
- neural networks (new in v0.6)
  - training of arbitrarily deep feed-forward networks
  - CUDA support for running on GPUs
  - OpenMP support for running on CPUs
  - variety of activation functions, training criteria and optimization algorithms
  - integration in feature extraction pipeline ("Tandem approach")
  - integration in search and lattice processing pipeline ("Hybrid NN/HMM approach")
- speaker adaptation
  - Constrained MLLR (CMLLR, "feature space MLLR", fMLLR)
  - Unsupervised maximum likelihood linear regression mean adaptation (MLLR)
  - speaker / segment clustering using Bayesian Information Criterion (BIC) as stop criterion
- lattice processing
  - n-best list generation
  - confusion network generation and decoding
  - lattice rescoring
  - lattice based system combination
- input / output formats
  - nearly all input and output data is in easily processable XML or plain text formats
  - converter tools for the generation of NIST file formats are included
  - HTK lattice format
  - converter tools for HTK models
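The ARPA language model format mentioned in the feature list is a plain text format: a `\data\` header with n-gram counts, one `\N-grams:` section per order, and a closing `\end\` marker. The following is a minimal Python sketch of reading such a file. It is not RASR's own reader; the function name `read_arpa` and the tiny `EXAMPLE` model are made up for illustration.

```python
import io

# A tiny hand-written model, for illustration only (not a real language model).
EXAMPLE = """\\data\\
ngram 1=3
ngram 2=2

\\1-grams:
-0.5 <s> -0.3
-0.7 hello -0.2
-0.9 </s>

\\2-grams:
-0.2 <s> hello
-0.4 hello </s>

\\end\\
"""


def read_arpa(stream):
    """Read an ARPA-format n-gram model from a text stream.

    Returns (counts, ngrams) where counts maps order -> declared count and
    ngrams maps order -> {word tuple: (log10 probability, log10 backoff or None)}.
    """
    counts = {}
    ngrams = {}
    order = None  # order of the section currently being read
    for raw in stream:
        line = raw.strip()
        if not line or line == "\\data\\":
            continue
        if line == "\\end\\":
            break
        if line.startswith("\\") and line.endswith("-grams:"):
            # Section header such as "\2-grams:"
            order = int(line[1:-len("-grams:")])
            ngrams[order] = {}
        elif order is None and line.startswith("ngram "):
            # Header line such as "ngram 2=2"
            n, count = line[len("ngram "):].split("=")
            counts[int(n)] = int(count)
        elif order is not None:
            fields = line.split()
            logprob = float(fields[0])
            words = tuple(fields[1:1 + order])
            # The backoff weight is optional and follows the words.
            backoff = float(fields[1 + order]) if len(fields) > 1 + order else None
            ngrams[order][words] = (logprob, backoff)
    return counts, ngrams


counts, ngrams = read_arpa(io.StringIO(EXAMPLE))
print(counts)
print(ngrams[2])
```

For a real model, pass an open text file instead of the `io.StringIO` wrapper. The sketch keeps everything in dictionaries, which is only practical for small models.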
The development of RASR is ongoing. A manual is available in the RASR Manual Wiki. Access to the wiki requires registration.
Publications about the theoretical foundations and methods used can be found on the publications page. The software package is described in detail in Rybach et al., "The RWTH Aachen University Open Source Speech Recognition System", Interspeech 2009.
A short introduction is given in these slides.
Please post questions in the support forum.
RASR is available only in source form. See the included README for build instructions.
The following tools and libraries must be installed (Debian package names given in brackets):
- GCC, version 4.0 to 4.8 inclusive (gcc, g++)
- GNU Bison (bison)
- GNU Make (make)
- libxml2 (libxml2, libxml2-dev)
- libsndfile (libsndfile1, libsndfile1-dev)
- LAPACK (lapack3, lapack3-dev)
- BLAS (refblas3, refblas3-dev)
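Because the supported GCC range has an upper bound as well as a lower one, it can be worth checking the installed compiler before building. The following Python sketch tests a version string against the 4.0 to 4.8 range stated above. The helper name `gcc_version_supported` is hypothetical, and the sketch assumes that patch releases such as 4.8.2 count as inside the range.

```python
import re

# Supported range taken from the requirements list (major, minor).
MIN_GCC = (4, 0)
MAX_GCC = (4, 8)


def gcc_version_supported(version_string):
    """Return True if the major.minor part of version_string lies in the supported range."""
    match = re.search(r"(\d+)(?:\.(\d+))?", version_string)
    if match is None:
        return False
    major = int(match.group(1))
    # A bare major version such as "7" is treated as "7.0".
    minor = int(match.group(2)) if match.group(2) is not None else 0
    return MIN_GCC <= (major, minor) <= MAX_GCC


for version in ("3.4.6", "4.0.0", "4.8.2", "4.9.1", "7"):
    print(version, gcc_version_supported(version))
```

In practice the version string could come from the output of `gcc -dumpversion`.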
RASR is free software; it can be redistributed and/or modified under the terms of the RWTH ASR License. This license includes free usage for non-commercial purposes as long as any changes made to the original software are published under the terms of the same license. Other licenses can be requested.
Remark: No acoustic or language models are included.
To download the software, you have to accept the license terms. Please fill out the form. The information submitted is only for internal usage and will not be given to third parties.
To demonstrate a large vocabulary system, we offer the following models (in a binary format) developed for our EPPS English system, together with a ready-to-use one-pass recognition setup:
- acoustic model (triphones, 900K densities)
- 4-gram language model (7.5M multi-grams) for a vocabulary of 60K words
The acoustic model was trained using the TC-STAR English Training Corpus.
The language model was trained using the Final Text Editions provided by the European Parliament and the transcriptions of the acoustic training data.
All offered materials may be used for research purposes. Any commercial use is prohibited.
Whole or partial distribution of the data provided is not allowed.
Publications of results obtained through the use of original or modified versions of the data must cite the authors by referring to the following publications:
- J. Lööf, C. Gollan, S. Hahn, G. Heigold, B. Hoffmeister, C. Plahl, D. Rybach, R. Schlüter, and H. Ney: "The RWTH 2007 TC-STAR Evaluation System for European English and Spanish". In Interspeech 2007, pages 2145-2148, Antwerp, Belgium, August, 2007.
- D. Rybach, S. Hahn, P. Lehnen, D. Nolden, M. Sundermeyer, Z. Tüske, S. Wiesler, R. Schlüter, and H. Ney: "RASR - The RWTH Aachen University Open Source Speech Recognition Toolkit". In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Hawaii, USA, December
To download the demo system, please fill out the form below. We will send you an email about how to proceed with the download.