Seminar "Large Scale Language Models and Generative Pretrained Transformers"

In the Summer Semester 2023, the Lehrstuhl Informatik 6 will host a seminar entitled "Large Scale Language Models and Generative Pretrained Transformers" at the Master level.

Registration for the seminar

Registration for the seminar is only possible online via the central registration page.

Prerequisites for Participation in the Seminar

General Goals of the Seminar

The goal of the seminar is to autonomously acquire knowledge and critical comprehension of an assigned topic, and to present this topic both in writing and orally.

This includes:

Seminar Format and Important Dates

The seminar will start with a kick-off meeting, which will take place on 17.03.2023. Details will be communicated directly to the seminar participants selected in the central registration.

Please note the following deadlines during the seminar:

Note: failure to comply with the ethical guidelines, failure to meet deadlines, unexcused absence from compulsory sessions (presentations and the preliminary meeting, as announced by email to each participating student), or dropping out of the seminar more than 3 weeks after the preliminary meeting/topic distribution results in the grade 5.0 / "not appeared".

The deadline for de-registration from the seminar is TBA, i.e. within three weeks after the distribution of the topics. After this deadline, seminar participation is confirmed and will be graded.



Topics, Initial References Defining the Topics, Participants, and Supervisors

In general, selected topics from the general areas of Human Language Technology and Machine Learning will be offered. Below, you find exemplary topics; however, note that the topics are subject to change/updates. The final topics will be presented in the kick-off meeting, which will be announced to the seminar participants selected in the central registration for the seminar.
  1. Large Masked Transformer Language Models - "comparison/overview" of available/published LLMs (Student: Sakharov, Supervisor: Yang)
    Initial References:

    In this topic, the student is expected to conduct a survey comparing large language models, especially non-autoregressive ones, highlighting the differences in e.g. the amount of training data, the model parameter count, the training hardware, the model topology, the training criterion, etc. A successful study will pave the way towards understanding the development of large language models, as well as future trends.

  2. Large Transformer Language Models - "comparison/overview" of available/published LLMs (Students: Lavronenko, Shiqerukaj, and Vierling, Supervisor: Berger)
    Initial References:


    Similar to topic No. 1 above, but for autoregressive LLMs.

  3. From Small Amounts of Data to HUGE Amounts of Data (Student: Kotiyal, Supervisor: Yang)
    Initial References:

    Scaling up language models gives consistent improvements on a wide range of tasks. While some metrics, such as perplexity, scale predictably with model size, other metrics suddenly jump substantially at larger model sizes. This seminar topic should give an overview of these different scaling behaviours of large language models on different downstream tasks.

  4. Scaling Model Training on GPUs to over 100B Parameter Models (Student: Phan, Supervisor: Rossenbach)
    Initial References:

    Scaling up language models has shown significant accuracy gains on a wide range of tasks. However, training models with over 100B parameters requires distributing the training over a large number of nodes with multiple GPUs and fitting these models into limited device memory. This seminar topic should give an overview of different approaches to address these challenges, which allow training models like GPT-3.
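
    To get a feeling for the scale involved, the following back-of-the-envelope sketch (an illustration, not part of the topic's references) estimates the memory footprint of the model state alone, assuming mixed-precision Adam training with roughly 16 bytes per parameter for weights, gradients, and optimizer states:

      # Rough memory estimate for training a 100B-parameter model.
      # Assumptions: mixed-precision Adam (~16 bytes per parameter for weights,
      # gradients, and optimizer states); activations and buffers are ignored.
      n_params = 100e9
      bytes_per_param = 16
      model_state_gb = n_params * bytes_per_param / 1e9
      gpu_memory_gb = 80  # assumption: one 80 GB accelerator card
      print(f"model state: {model_state_gb:.0f} GB")                                    # ~1600 GB
      print(f"GPUs needed for model state alone: {model_state_gb / gpu_memory_gb:.0f}")  # ~20

    Even before accounting for activations, the model state alone exceeds the memory of any single device, which is why sharding and parallelisation strategies are required.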

  5. Scaling Laws for Large Language Models (Student: Marxen, Supervisor: Yang)
    Initial References:

    In addition to the raw size of the model, the number of tokens seen during training also influences the performance of large language models. Thus, given a fixed training budget, the question arises how to allocate this budget: train longer, or train a larger model? The purpose of this seminar topic is to discuss different possible answers to this tradeoff.
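
    One commonly used formalisation of this tradeoff (stated here as an illustration, following the parametric form used in the "Chinchilla" compute-optimal training analysis) models the loss as a function of the parameter count N and the number of training tokens D, minimised under a compute budget C:

      L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}},
      \qquad \text{subject to} \quad C \approx 6\,N\,D

    For a fixed budget C, a larger model (larger N) can therefore only be trained on fewer tokens (smaller D), and vice versa; the fitted constants determine the compute-optimal balance.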

  6. Large Language Models for Dialog (Student: Oberlaender, Supervisor: Thulke)
    Initial References:

    Large language models are usually trained on generic text collected from the internet. Building models for open-domain dialog applications like ChatGPT requires models that are able to capture the discourse structure and to produce coherent responses. This seminar should give an overview of different approaches to build or adapt large language models for dialog applications.

  7. Large Encoder-Decoder-based Language Models (Student: Bhattacherjee, Supervisor: Rossenbach)
    Initial References:

    Most LLMs like GPT or BERT only use a single stack of the original Transformer architecture. In contrast, there are several models like T5 or BART that use an encoder-decoder architecture to separately process the input and the output of the model. This seminar topic should give an overview of these models and discuss their advantages and disadvantages.

  8. Sparse Experts Models (Student: Nikolskyy, Supervisor: Vieting)
    Initial References:

    In standard dense neural network models, all parameters are utilised to process an input. A more efficient approach is to only use the parameters that are actually relevant to the current input, reducing the number of unnecessary computations. One way to achieve this is with sparsely activated expert models. The goal of this seminar topic is to give an overview of these sparse expert models in the context of large language models.
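
    The routing idea can be illustrated with a toy sketch in plain NumPy (the sizes and the top-k softmax gating are illustrative assumptions, not any particular published model):

      import numpy as np

      rng = np.random.default_rng(0)
      d_model, n_experts, top_k = 16, 4, 2

      # Learned router and expert weights (toy sizes for illustration only).
      w_router = rng.normal(size=(d_model, n_experts))
      experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]

      def moe_layer(x):
          """Send each token only to its top-k experts and mix their outputs."""
          logits = x @ w_router                            # (n_tokens, n_experts)
          top = np.argsort(logits, axis=-1)[:, -top_k:]    # indices of the k largest router logits
          out = np.zeros_like(x)
          for t, token in enumerate(x):
              sel = top[t]
              gate = np.exp(logits[t, sel])
              gate /= gate.sum()                           # softmax over the selected experts only
              for g, e in zip(gate, sel):
                  out[t] += g * (token @ experts[e])       # only k of n_experts experts run per token
          return out

      tokens = rng.normal(size=(3, d_model))
      print(moe_layer(tokens).shape)                       # (3, 16)

    Although the total parameter count grows with the number of experts, the computation per token only involves the k selected experts.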

  9. Retrieval-based Large Language Models (Student: Schmitt, Supervisor: Thulke)
    Initial References:

    While large language models show impressive capabilities in reproducing factual knowledge from the training data, they still produce wrong information for domains that are not covered well enough in the training data or for facts that change over time. A potential solution is to enable models to retrieve knowledge from a large corpus or even the web. This seminar topic should give an overview of different approaches to retrieval-based language models and discuss their advantages and disadvantages.
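
    The basic retrieve-then-generate loop can be sketched as follows (a toy example: the hashed bag-of-words "embedding" is only a stand-in for a learned dense retriever, and the resulting prompt would be fed to the language model):

      import numpy as np

      # Toy document store; a real system would index a large corpus or the web.
      documents = [
          "The Eiffel Tower is located in Paris.",
          "RWTH Aachen University was founded in 1870.",
          "GPT-3 has 175 billion parameters.",
      ]

      def embed(text, dim=64):
          """Placeholder embedding: hashed bag of words (a learned dense encoder
          would be used in practice)."""
          v = np.zeros(dim)
          for word in text.lower().split():
              v[hash(word) % dim] += 1.0
          return v / (np.linalg.norm(v) + 1e-8)

      doc_vecs = np.stack([embed(d) for d in documents])

      def retrieve(query, k=1):
          scores = doc_vecs @ embed(query)                 # cosine similarity (vectors are normalised)
          return [documents[i] for i in np.argsort(scores)[::-1][:k]]

      query = "How many parameters does GPT-3 have?"
      context = "\n".join(retrieve(query))
      prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
      print(prompt)                                        # this prompt would be passed to the LLM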

  10. Unsupervised Training of Speech Representations (Student: Ji, Supervisor: Vieting)
    Initial References:

    While text-based LLMs are trained on large amounts of text data, the same idea can also be transferred to speech models, which are then trained on audio data. This has large potential for applications like speech recognition, where labeled training data is typically costly to obtain. The goal of this seminar work is to give an overview of the models and training criteria used to pre-train models that take speech as input.
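
    A typical training objective in this line of work is a contrastive loss over masked time steps, shown here roughly in the form used by wav2vec 2.0 (c_t is the context representation at a masked step, q_t the true quantized target, Q_t a candidate set containing q_t and sampled distractors, and kappa a temperature; auxiliary regularisation terms are omitted):

      \mathcal{L}_t = -\log \frac{\exp\big(\mathrm{sim}(c_t, q_t)/\kappa\big)}
                                 {\sum_{\tilde{q} \in Q_t} \exp\big(\mathrm{sim}(c_t, \tilde{q})/\kappa\big)},
      \qquad \mathrm{sim}(a, b) = \frac{a^{\top} b}{\lVert a \rVert \, \lVert b \rVert}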

  11. Weakly Supervised Training for Speech Models (Student: Filip, Supervisor: Vieting)
    Initial References:

    This topic explores the capabilities of large-scale supervised pre-training for speech recognition. In addition to the main ASR task, additional tasks such as speech translation, spoken language identification, and voice activity detection are considered during training.

  12. Reinforcement Learning with Human Feedback for LLMs (Student: Zhou, Supervisor: Gao)
    Initial References:

    Large language models have achieved impressive fluency and smoothness in their generated outputs; however, synthetic errors and hallucinations still occur occasionally. In this topic, the student is expected to study reinforcement learning with a human in the loop, which makes the model outputs better behaved.
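
    A central ingredient, sketched here following the commonly used InstructGPT-style recipe (an illustration, not necessarily the exact setup covered by the references), is a reward model r_theta trained on human preference pairs, where y_w is the preferred and y_l the rejected response to a prompt x:

      \mathcal{L}(\theta) = -\,\mathbb{E}_{(x,\, y_w,\, y_l)}\Big[\log \sigma\big(r_{\theta}(x, y_w) - r_{\theta}(x, y_l)\big)\Big]

    The language model is then fine-tuned (typically with PPO) to maximise this learned reward, usually together with a KL penalty towards the original model so that the outputs stay fluent.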

  13. Reasoning (Student: Bhatia, Supervisor: Zeineldeen)
    Initial References:

    One area where large language models still fall short is tasks requiring commonsense, arithmetic, or symbolic reasoning. This seminar should cover methods to improve the reasoning capabilities of large language models, with a focus on chain-of-thought prompting.
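
    Chain-of-thought prompting adds worked, step-by-step rationales to the few-shot exemplars in the prompt so that the model continues in the same style. A minimal illustration (the questions are made up for this sketch):

      # A minimal chain-of-thought prompt: the exemplar answer spells out its
      # reasoning steps, nudging the model to do the same for the new question.
      cot_prompt = (
          "Q: A canteen has 23 apples. It uses 20 of them for lunch and then buys 6 more. "
          "How many apples does it have now?\n"
          "A: It starts with 23 apples and uses 20, leaving 23 - 20 = 3. "
          "Buying 6 more gives 3 + 6 = 9. The answer is 9.\n"
          "\n"
          "Q: Tom has 5 boxes with 4 pens each and gives away 7 pens. How many pens are left?\n"
          "A:"  # the model should continue with a step-by-step rationale (5 * 4 - 7 = 13)
      )
      print(cot_prompt)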

  14. Multimodal vision and language models (Student: Erdenebayar, Supervisor: Zeineldeen)
    Initial References:

    This seminar topic should discuss approaches to extend large language models to a multimodal setting where the input consists of a mix of multiple modalities like vision and language.

Article and Presentation Format

The article (roughly 20 pages) and the presentation slides (between 20 and 30 slides) must be prepared in LaTeX. Presentations consist of 30 to 40 minutes of presentation time and 15 minutes of discussion time. Document templates for both the article and the presentation slides are provided below, along with links to LaTeX documentation available online. The article and the slides must be submitted electronically in PDF format; other formats will not be accepted.

Detailed Guidelines:

Some Tips:

Time management is crucial for a successful seminar:
Successful seminar articles/presentations typically:
While reading papers, it might be useful to keep the following questions in mind:

Contact

Questions regarding the content of the assigned seminar topics should be directed to the respective topic's supervisors.

General and administrative inquiries should be directed to:

Tina Raissi
RWTH Aachen University
Lehrstuhl Informatik 6
Theaterstrasse 35-39
52074 Aachen

Room 025
Tel: 0241 80 21630

E-Mail: raissi@cs.rwth-aachen.de