Seminar "Large Scale Language Models and Generative Pretrained Transformers"

In the Summer Semester 2023, the Lehrstuhl Informatik 6 will host a seminar entitled "Large Scale Language Models and Generative Pretrained Transformers" at the Master level.

Registration for the seminar

Registration for the seminar is only possible online via the central registration page.

Prerequisites for Participation in the Seminar

General Goals of the Seminar

The goal of the seminar is to autonomously acquire knowledge and critical comprehension of an assigned topic, and to present this topic both in writing and orally.

This includes:

Seminar Format and Important Dates

The seminar will start with a kick-off meeting, which will take place on 17.03.2023. Details will be communicated directly to the seminar participants selected in the central registration.

Please note the following deadlines during the seminar:

Note: failure to comply with the ethical guidelines, failure to meet deadlines, unexcused absence from compulsory sessions (presentations and the preliminary meeting, as announced by email to each participating student), or dropping out of the seminar more than 3 weeks after the preliminary meeting/topic distribution results in the grade 5.0 / "not appeared".

The deadline for de-registration from the seminar is TBA, i.e. within three weeks after the distribution of the topics. After this deadline, seminar participation is confirmed and will be graded.



Topics, Initial References Defining the Topics, Participants, and Supervisors

In general, selected topics from the general areas of Human Language Technology and Machine Learning will be offered. Below, you find exemplary topics; however, note that the topics are subject to change/updates. The final topics will be presented in the kick-off meeting, which will be announced to the seminar participants selected in the central registration for the seminar.
  1. Large Masked Transformer Language Models - "comparison/overview" of available/published LLMs (Student: Sakharov, Supervisor: Yang)
    Initial References:

    In this topic, the student is expected to conduct a survey comparing large language models, especially non-autoregressive ones, highlighting the differences in e.g. the amount of training data, the model parameter count, the training hardware, the model topology, the training criterion, etc. A successful study will pave the way towards understanding the development of large language models, as well as future trends.

  2. Large Transformer Language Models - "comparison/overview" of available/published LLMs (Students: Lavronenko, Shiqerukaj, and Vierling, Supervisor: Berger)
    Initial References:


    Similar to topic No. 1 above, but for autoregressive LLMs.

  3. From Small Amounts of Data to HUGE Amounts of Data (Student: Kotiyal, Supervisor: Yang)
    Initial References:

    Scaling up language models gives consistent improvements on a wide range of tasks. While some metrics, such as perplexity, scale predictably with model size, other metrics suddenly jump substantially at larger model sizes. This seminar topic should give an overview of these different scaling behaviours of large language models on different downstream tasks.

  4. Scaling Model Training on GPUs to over 100B Parameter Models (Student: Phan, Supervisor: Rossenbach)
    Initial References:

    Scaling up language models has shown significant accuracy gains on a wide range of tasks. However, training models with over 100B parameters requires distributing the training over a large number of nodes with multiple GPUs and fitting these models into limited device memory. This seminar topic should give an overview of different approaches to address these challenges, which allow training models like GPT-3.
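
    To get a feeling for the scale involved, the following back-of-the-envelope sketch (an illustration, not part of the topic's references) estimates the memory footprint of the model state alone, assuming mixed-precision Adam training with roughly 16 bytes per parameter for weights, gradients, and optimizer states:

      # Rough memory estimate for training a 100B-parameter model.
      # Assumptions: mixed-precision Adam (~16 bytes per parameter for weights,
      # gradients, and optimizer states); activations and buffers are ignored.
      n_params = 100e9
      bytes_per_param = 16
      model_state_gb = n_params * bytes_per_param / 1e9
      gpu_memory_gb = 80  # assumption: one 80 GB accelerator card
      print(f"model state: {model_state_gb:.0f} GB")                                    # ~1600 GB
      print(f"GPUs needed for model state alone: {model_state_gb / gpu_memory_gb:.0f}")  # ~20

    Even before accounting for activations, the model state alone exceeds the memory of any single device, which is why sharding and parallelisation strategies are required.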

  5. Scaling Laws for Large Language Models (Student: Marxen, Supervisor: Yang)
    Initial References:

    In addition to the raw size of the model, the number of tokens seen during training also influences the performance of large language models. Thus, given a fixed training budget, the question arises how to allocate this budget: train longer, or train a larger model? The purpose of this seminar topic is to discuss different possible answers to this tradeoff.
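
    One commonly used formalisation of this tradeoff (stated here as an illustration, following the parametric form used in the "Chinchilla" compute-optimal training analysis) models the loss as a function of the parameter count N and the number of training tokens D, minimised under a compute budget C:

      L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}},
      \qquad \text{subject to} \quad C \approx 6\,N\,D

    For a fixed budget C, a larger model (larger N) can therefore only be trained on fewer tokens (smaller D), and vice versa; the fitted constants determine the compute-optimal balance.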

  6. Large Language Models for Dialog (Student: Oberlaender, Supervisor: Thulke)
    Initial References:

    Large language models are usually trained on generic text collected from the internet. Building models for open-domain dialog applications like ChatGPT requires models that are able to capture the discourse structure and to produce coherent responses. This seminar should give an overview of different approaches to build or adapt large language models for dialog applications.

  7. Large Encoder-Decoder-based Language Models (Student: Bhattacherjee, Supervisor: Rossenbach)
    Initial References:

    Most LLMs like GPT or BERT only use a single stack of the original Transformer architecture. In contrast, there are several models like T5 or BART that use an encoder-decoder architecture to separately process the input and the output of the model. This seminar topic should give an overview of these models and discuss their advantages and disadvantages.

  8. Sparse Experts Models (Student: Nikolskyy, Supervisor: Vieting)
    Initial References:

    In standard dense neural network models, all parameters are utilised to process an input. A more efficient approach is to only use the parameters that are actually relevant to the current input, reducing the number of unnecessary computations. One way to achieve this is with sparsely activated expert models. The goal of this seminar topic is to give an overview of these sparse expert models in the context of large language models.
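
    The routing idea can be illustrated with a toy sketch in plain NumPy (the sizes and the top-k softmax gating are illustrative assumptions, not any particular published model):

      import numpy as np

      rng = np.random.default_rng(0)
      d_model, n_experts, top_k = 16, 4, 2

      # Learned router and expert weights (toy sizes for illustration only).
      w_router = rng.normal(size=(d_model, n_experts))
      experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]

      def moe_layer(x):
          """Send each token only to its top-k experts and mix their outputs."""
          logits = x @ w_router                            # (n_tokens, n_experts)
          top = np.argsort(logits, axis=-1)[:, -top_k:]    # indices of the k largest router logits
          out = np.zeros_like(x)
          for t, token in enumerate(x):
              sel = top[t]
              gate = np.exp(logits[t, sel])
              gate /= gate.sum()                           # softmax over the selected experts only
              for g, e in zip(gate, sel):
                  out[t] += g * (token @ experts[e])       # only k of n_experts experts run per token
          return out

      tokens = rng.normal(size=(3, d_model))
      print(moe_layer(tokens).shape)                       # (3, 16)

    Although the total parameter count grows with the number of experts, the computation per token only involves the k selected experts.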

  9. Retrieval-based Large Language Models (Student: Schmitt, Supervisor: Thulke)
    Initial References:

    While large language models show impressive capabilities in reproducing factual knowledge from the training data, they still produce wrong information for domains that are not covered well enough in the training data or for facts that change over time. A potential solution is to enable models to retrieve knowledge from a large corpus or even the web. This seminar topic should give an overview of different approaches to retrieval-based language models and discuss their advantages and disadvantages.
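
    The basic retrieve-then-generate loop can be sketched as follows (a toy example: the hashed bag-of-words "embedding" is only a stand-in for a learned dense retriever, and the resulting prompt would be fed to the language model):

      import numpy as np

      # Toy document store; a real system would index a large corpus or the web.
      documents = [
          "The Eiffel Tower is located in Paris.",
          "RWTH Aachen University was founded in 1870.",
          "GPT-3 has 175 billion parameters.",
      ]

      def embed(text, dim=64):
          """Placeholder embedding: hashed bag of words (a learned dense encoder
          would be used in practice)."""
          v = np.zeros(dim)
          for word in text.lower().split():
              v[hash(word) % dim] += 1.0
          return v / (np.linalg.norm(v) + 1e-8)

      doc_vecs = np.stack([embed(d) for d in documents])

      def retrieve(query, k=1):
          scores = doc_vecs @ embed(query)                 # cosine similarity (vectors are normalised)
          return [documents[i] for i in np.argsort(scores)[::-1][:k]]

      query = "How many parameters does GPT-3 have?"
      context = "\n".join(retrieve(query))
      prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
      print(prompt)                                        # this prompt would be passed to the LLM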

  10. Unsupervised Training of Speech Representations (Student: Ji, Supervisor: Vieting)
    Initial References:

    While text-based LLMs are trained on large amounts of text data, the same idea can also be transferred to speech models, which are then trained on audio data. This has large potential for applications like speech recognition, where labeled training data is typically costly to obtain. The goal of this seminar work is to give an overview of the models and training criteria used to pre-train models that take speech as input.
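
    A typical training objective in this line of work is a contrastive loss over masked time steps, shown here roughly in the form used by wav2vec 2.0 (c_t is the context representation at a masked step, q_t the true quantized target, Q_t a candidate set containing q_t and sampled distractors, and kappa a temperature; auxiliary regularisation terms are omitted):

      \mathcal{L}_t = -\log \frac{\exp\big(\mathrm{sim}(c_t, q_t)/\kappa\big)}
                                 {\sum_{\tilde{q} \in Q_t} \exp\big(\mathrm{sim}(c_t, \tilde{q})/\kappa\big)},
      \qquad \mathrm{sim}(a, b) = \frac{a^{\top} b}{\lVert a \rVert \, \lVert b \rVert}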

  11. Weakly Supervised Training for Speech Models (Student: Filip, Supervisor: Vieting)
    Initial References:

    This topic explores the capabilities of large-scale supervised pre-training for speech recognition. In addition to the main ASR task, additional tasks such as speech translation, spoken language identification, and voice activity detection are considered during training.

  12. Reinforcement Learning with Human Feedback for LLMs (Student: Zhou, Supervisor: Gao)
    Initial References:

    Large language models have achieved impressive fluency and smoothness in their generated outputs; however, synthetic errors and hallucinations still occur occasionally. In this topic, the student is expected to study reinforcement learning with a human in the loop, which makes the model outputs better behaved.
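
    A central ingredient, sketched here following the commonly used InstructGPT-style recipe (an illustration, not necessarily the exact setup covered by the references), is a reward model r_theta trained on human preference pairs, where y_w is the preferred and y_l the rejected response to a prompt x:

      \mathcal{L}(\theta) = -\,\mathbb{E}_{(x,\, y_w,\, y_l)}\Big[\log \sigma\big(r_{\theta}(x, y_w) - r_{\theta}(x, y_l)\big)\Big]

    The language model is then fine-tuned (typically with PPO) to maximise this learned reward, usually together with a KL penalty towards the original model so that the outputs stay fluent.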

  13. Reasoning (Student: Bhatia, Supervisor: Zeineldeen)
    Initial References:

    One area where large language models still fall short is tasks requiring commonsense, arithmetic, or symbolic reasoning. This seminar should cover methods to improve the reasoning capabilities of large language models, with a focus on chain-of-thought prompting.
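
    Chain-of-thought prompting adds worked, step-by-step rationales to the few-shot exemplars in the prompt so that the model continues in the same style. A minimal illustration (the questions are made up for this sketch):

      # A minimal chain-of-thought prompt: the exemplar answer spells out its
      # reasoning steps, nudging the model to do the same for the new question.
      cot_prompt = (
          "Q: A canteen has 23 apples. It uses 20 of them for lunch and then buys 6 more. "
          "How many apples does it have now?\n"
          "A: It starts with 23 apples and uses 20, leaving 23 - 20 = 3. "
          "Buying 6 more gives 3 + 6 = 9. The answer is 9.\n"
          "\n"
          "Q: Tom has 5 boxes with 4 pens each and gives away 7 pens. How many pens are left?\n"
          "A:"  # the model should continue with a step-by-step rationale (5 * 4 - 7 = 13)
      )
      print(cot_prompt)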

  14. Multimodal vision and language models (Student: Erdenebayar, Supervisor: Zeineldeen)
    Initial References:

    This seminar topic should discuss approaches to extend large language models to a multimodal setting where the input consists of a mix of multiple modalities like vision and language.

Article and Presentation Format

The article (roughly 20 pages) and the presentation slides (between 20 and 30 slides) must be prepared in LaTeX. Presentations consist of 30 to 40 minutes of presentation time and 15 minutes of discussion time. Document templates for both the article and the presentation slides are provided below, along with links to LaTeX documentation available online. The article and the slides must be submitted electronically in PDF format; other formats will not be accepted.

Detailed Guidelines:

Some Tips:

Time management is crucial for a successful seminar:
Successful seminar articles/presentations typically:
While reading papers, it might be useful to keep the following questions in mind:

Contact

Questions regarding the content of the assigned seminar topics should be directed to the respective topic's supervisors.

General and administrative inquiries should be directed to:

Tina Raissi
RWTH Aachen University
Lehrstuhl Informatik 6
Theaterstrasse 35-39
52074 Aachen

Room 025
Tel: 0241 80 21630

E-Mail: raissi@cs.rwth-aachen.de