Topics for Student Workers and Bachelor/Master Theses
The Machine Learning and Human Language Technology Group is looking for Bachelor's and Master's students to work on the following topics:

Classification error bounds:

  • Framework: Bayes decision theory
  • Objective: understand the (practical) error rate from first principles
  • Area: basics of machine learning and information theory/statistics
  • Approach: computer simulations and (ideally) analytic calculations
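The following Python snippet is a minimal sketch of the "simulation plus analytic calculation" approach. It compares a Monte Carlo estimate of the Bayes error with its closed-form value. The toy setting is an assumption and is not part of the topic description: two equiprobable classes with 1-D Gaussian densities of equal variance. The function names are hypothetical.

```python
import math
import random


def bayes_error_analytic(mu: float, sigma: float) -> float:
    """Closed-form Bayes error for two equiprobable classes N(-mu, sigma^2) and N(+mu, sigma^2).

    Assumed toy setting: the Bayes decision rule then thresholds at x = 0, so the
    error is P(x > 0 | class 0) = Phi(-mu / sigma).
    """
    return 0.5 * math.erfc(mu / (sigma * math.sqrt(2.0)))


def bayes_error_simulated(mu: float, sigma: float, n: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the error rate of the Bayes decision rule (same toy setting)."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(n):
        label = rng.randint(0, 1)            # equal priors p(c) = 1/2
        mean = mu if label == 1 else -mu
        x = rng.gauss(mean, sigma)          # draw an observation from p(x | c)
        decision = 1 if x > 0.0 else 0      # argmax_c p(c) p(x | c) for this setting
        errors += decision != label
    return errors / n


if __name__ == "__main__":
    mu, sigma = 1.0, 1.0
    print(f"analytic : {bayes_error_analytic(mu, sigma):.4f}")
    print(f"simulated: {bayes_error_simulated(mu, sigma, 200_000):.4f}")
```

With enough samples the simulated error rate should approach the analytic value. Realistic tasks replace the Gaussians with unknown, high-dimensional distributions, and in that setting error bounds become the more useful tool.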

Unsupervised training (in the strict sense):

  • Build an ASR system assuming a perfect language model and large amounts of untranscribed speech data
  • Challenges:
    • define a suitable training criterion
    • establish its relation to the classification error
    • handle the computational complexity
  • Related to pre-training, semi-supervised training and decipherment (homophonic substitution ciphers)

Large language models (LLM):

Possible directions:
  • Mathematical improvements and refinements
  • Supervised fine-tuning
  • Reward modeling and reinforcement learning
  • Retrieval-augmented generation (RAG)
  • Specific variant: LLMs for dialog
Data and (real-life) tasks: public datasets and AppTek data

Language models for ASR:

Measure WER as a function of language model perplexity:
  • Acoustic models: hybrid HMM, CTC, transducer, attention
  • Language models: count-based, LSTM RNN, transformer
To verify: absence of search errors
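The following Python snippet is a minimal sketch of the two quantities to be related. Word error rate is computed as the word-level Levenshtein distance divided by the reference length. Perplexity is computed from the per-word language model probabilities of an evaluation text. The function names and toy inputs are illustrative assumptions. In an actual experiment, the hypotheses would come from the recognizer and the probabilities from each language model under study.

```python
import math
from typing import Sequence


def word_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """WER = (substitutions + deletions + insertions) / len(reference),
    computed as the word-level Levenshtein distance."""
    # d[i][j] = edit distance between reference[:i] and hypothesis[:j]
    d = [[0] * (len(hypothesis) + 1) for _ in range(len(reference) + 1)]
    for i in range(len(reference) + 1):
        d[i][0] = i                                # i deletions
    for j in range(len(hypothesis) + 1):
        d[0][j] = j                                # j insertions
    for i in range(1, len(reference) + 1):
        for j in range(1, len(hypothesis) + 1):
            sub = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + sub)   # match or substitution
    return d[len(reference)][len(hypothesis)] / len(reference)


def perplexity(word_probs: Sequence[float]) -> float:
    """PPL = exp(-1/N * sum_n log p(w_n | history)), given the LM probability of each word."""
    return math.exp(-sum(math.log(p) for p in word_probs) / len(word_probs))


if __name__ == "__main__":
    ref = "the cat sat on the mat".split()
    hyp = "the cat sat on mat".split()
    print(word_error_rate(ref, hyp))        # one deletion out of six reference words
    print(perplexity([0.25, 0.25, 0.25]))   # uniform probability 1/4 gives perplexity of about 4
```

Each acoustic model and language model pair would yield one (perplexity, WER) point. The search-error check is what makes such a curve reflect the models rather than the decoder.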

Verifying training speech data:

  • Challenge: improve an existing ASR system by filtering its speech training data
  • Approach: use a combination of WER and recognition scores on transcribed data
Related to semi-supervised training

Extending seq-to-seq architectures (attention, FST/HMM):

Challenge: improve existing high-performance systems
  • Attention: strict monotonicity
  • Segmentation and streaming (as opposed to batch mode)
  • FST/HMM: combination with attention-like structures

Handwritten text recognition (HTR, 'OCR'):

Transfer concepts from ASR to HTR
  • Starting point: RWTH's FST-HMM engine for ASR
  • Switch from acoustic to image features
  • Define/select suitable HTR tasks (English, Arabic, Chinese, etc.)
Ideally: cooperation with UP Valencia

More details about the topics can be found here


To apply, please send an email to:
Mohammad Zeineldeen
zeineldeen [-at-] cs.rwth-aachen.de
including the following:

  • Short CV (Lebenslauf)
  • Transcript of Records (Notenspiegel) of RWTH Aachen University
    (if it is your first semester at RWTH, then the transcript of your previous university)
  • Your current status: degree program, semester
  • Description of any experience (courses, projects, industry work, etc.) related to our field (machine learning, human language technology)

We accept applications at any time during the semester.