The Machine Learning and Human Language Technology Group is looking for Bachelor's and Master's students for the following topics:
Classification error bounds:
- Framework: Bayes decision theory
- Objective: understand the (practical) error rate from first principles
- Area: basics of machine learning and information theory/statistics
- Approach: computer simulations and (ideally) analytic calculations
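As a rough illustration of pairing "computer simulations" with "analytic calculations", here is a minimal sketch under simplifying assumptions of our own: two equiprobable classes with 1-D Gaussian densities N(-mu, sigma^2) and N(+mu, sigma^2). For this toy case the Bayes decision rule reduces to thresholding at x = 0, and the Bayes error has the closed form Phi(-mu/sigma). The function names are hypothetical; actual thesis work would go well beyond this case.

```python
import math
import random

def bayes_error_analytic(mu: float, sigma: float) -> float:
    """Bayes error for equiprobable classes N(-mu, sigma^2) and N(+mu, sigma^2).

    The optimal decision boundary is x = 0, so the error equals
    Phi(-mu / sigma) = 0.5 * erfc(mu / (sigma * sqrt(2))).
    """
    return 0.5 * math.erfc(mu / (sigma * math.sqrt(2.0)))

def bayes_error_simulated(mu: float, sigma: float, n: int, seed: int = 0) -> float:
    """Monte Carlo estimate: draw (class, x) pairs and apply the Bayes rule."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(n):
        c = rng.randrange(2)              # class 0 or 1, each with prior 1/2
        mean = mu if c == 1 else -mu
        x = rng.gauss(mean, sigma)
        decided = 1 if x > 0.0 else 0     # class with the larger posterior
        errors += decided != c
    return errors / n

if __name__ == "__main__":
    for mu in (0.5, 1.0, 2.0):
        exact = bayes_error_analytic(mu, 1.0)
        sim = bayes_error_simulated(mu, 1.0, n=200_000, seed=42)
        print(f"mu={mu}: analytic={exact:.4f} simulated={sim:.4f}")
```

The simulated error should approach the analytic value as n grows; the gap between the two is exactly the kind of "practical vs. theoretical" error rate the topic asks about.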
Unsupervised training (in the strict sense):
- Build an ASR system assuming a perfect language model and huge amounts of untranscribed speech data
- Challenges:
  - defining a suitable training criterion
  - its relation to the classification error
  - the computational complexity
Related to pre-training, semi-supervised training and decipherment (homophonic substitution ciphers)
Large language models (LLMs):
Possible directions:
- Mathematical improvements and refinements
- Supervised fine-tuning
- Reward modeling and reinforcement learning
- Retrieval augmented generation (RAG)
- Specific variant: LLM for dialog
Data and (real-life) tasks: both public and from AppTek
Language models for ASR:
Measure WER as a function of language model perplexity:
- Acoustic models: hybrid HMM, CTC, transducer, attention
- Language models: count-based, LSTM RNN, transformer
To verify: absence of search errors
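The two quantities this topic relates can be computed as sketched below. This is a minimal, assumption-laden illustration: word error rate (WER) as the Levenshtein distance between reference and hypothesis word sequences divided by the reference length, and perplexity from per-token natural-log probabilities supplied by some language model. The helper names are hypothetical and not part of any RWTH toolkit.

```python
import math
from typing import Sequence

def word_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """WER = (substitutions + deletions + insertions) / len(ref)."""
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the previous ref prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            sub = prev[j - 1] + (ref[i - 1] != hyp[j - 1])  # match or substitution
            cur[j] = min(sub, prev[j] + 1, cur[j - 1] + 1)   # deletion, insertion
        prev = cur
    return prev[len(hyp)] / len(ref)

def perplexity(token_log_probs: Sequence[float]) -> float:
    """PPL = exp(-(1/N) * sum_n log p(w_n | history)), natural-log probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))
```

For example, `word_error_rate("the cat sat on the mat".split(), "the cat sit on mat".split())` counts one substitution and one deletion over six reference words. Collecting (perplexity, WER) pairs across different language models on the same test set gives the curve the topic asks for.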
Verifying training speech data:
- Challenge: improve existing ASR by filtering speech training data
- Approach: use a combination of WER and recognition scores on transcribed data
Related to semi-supervised training
Extending seq-to-seq architectures (attention, FST/HMM):
Challenge: improve existing high-performance systems
- Attention: strict monotonicity
- Segmentation and streaming (as opposed to batch mode)
- FST/HMM: combination with attention-like structures
Handwritten text recognition (HTR, 'OCR'):
Transfer concepts from ASR to HTR
- Starting point: RWTH's FST-HMM engine for ASR
- Switch from acoustic to image features
- Define/select suitable HTR tasks (English, Arabic, Chinese, etc.)
Ideally: cooperation with UP Valencia
More details about the topics can be found here.
To apply, please send an email to:
Mohammad Zeineldeen
zeineldeen [-at-] cs.rwth-aachen.de
with the following included:
- Short CV (Lebenslauf)
- Transcript of Records (Notenspiegel) of RWTH Aachen University
(if this is your first semester at RWTH, the transcript from your previous university instead)
- Your current status: degree program, semester
- Description of any experience (courses, projects, industry work, etc.) related to our field (machine learning, human language technology)
We accept applications any time throughout each semester.