David Thulke
Hi, I am a PhD student in the Human Language Technology Group
at the Chair of Computer Science 6 of RWTH Aachen University, where I have been supervised by Prof. Dr.-Ing. Hermann Ney since January 2020. Additionally, I work as a language processing scientist at AppTek.
My personal research interests include:
- Retrieval Augmented Generation
- Pretraining of (Large) Language Models
- Named Entity Recognition
You can find me in room 6123 of our department, call me at +49 241 80 21625 or write an e-mail to <surname>@hltpr.rwth-aachen.de
Publications
- D. Thulke, J. Kemmler, C. Dugast, and H. Ney. Listen to the Context: Towards Faithful Large Language Models for Retrieval Augmented Generation on Climate Questions. In Proceedings of the 2nd Workshop on Natural Language Processing Meets Climate Change (ClimateNLP 2025), Vienna, Austria, July 2025.
- K. Le-Duc, D. Thulke, H. Tran, L. Vo-Dang, K. Nguyen, T. S. Hy, and R. Schlüter. Medical Spoken Named Entity Recognition. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: Industry Track) (NAACL 2025 Industry Track), pages 724-783, Albuquerque, New Mexico, April 2025.
- D. Thulke, Y. Gao, R. Jalota, C. Dugast, and H. Ney. Prompting and Fine-Tuning of Small LLMs for Length-Controllable Telephone Call Summarization. In 2nd International Conference on Foundation and Large Language Models (FLLM), pages 305-312, Dubai, UAE, November 2024.
- M. Benaicha, D. Thulke, and M. A. T. Turan. Leveraging Cross-Lingual Transfer Learning in Spoken Named Entity Recognition Systems. In Proceedings of the 20th Conference on Natural Language Processing (KONVENS 2024), pages 98-105, Vienna, Austria, September 2024.
- David Thulke, Yingbo Gao, Petrus Pelser, Rein Brune, Rricha Jalota, Floris Fok, Michael Ramos, Ian van Wyk, Abdallah Nasir, Hayden Goldstein, Taylor Tragemann, Katie Nguyen, Ariana Fowler, Andrew Stanco, Jon Gabriel, Jordan Taylor, Dean Moro, Evgenii Tsymbalov, Juliette de Waal, Evgeny Matusov, Mudar Yaghi, Mohammad Shihadah, Hermann Ney, Christian Dugast, Jonathan Dotan, and Daniel Erasmus. ClimateGPT: Towards AI Synthesizing Interdisciplinary Research on Climate Change. January 2024. Preprint arXiv:2401.09646.
- V. A. K. Tran, D. Thulke, Y. Gao, C. Herold, and H. Ney. Does Joint Training Really Help Cascaded Speech Translation? In Conference on Empirical Methods in Natural Language Processing (EMNLP), Abu Dhabi, December 2022.
- B. Liao, D. Thulke, S. Hewavitharana, H. Ney, and C. Monz. Mask More and Mask Later: Efficient Pre-training of Masked Language Models by Disentangling the [MASK] Token. In Conference on Empirical Methods in Natural Language Processing (EMNLP), Abu Dhabi, December 2022.
- N. Daheim, D. Thulke, C. Dugast, and H. Ney. Controllable Factuality in Document-Grounded Dialog Systems Using a Noisy Channel Model. In Conference on Empirical Methods in Natural Language Processing (EMNLP), Abu Dhabi, December 2022.
- D. Thulke, N. Daheim, C. Dugast, and H. Ney. Adapting Document-Grounded Dialog Systems to Spoken Conversations using Data Augmentation and a Noisy Channel Model. In AAAI-22 10th Dialog System Technology Challenge (DSTC-10) Workshop, pages 9, Online, February 2022.
- Y. Gao, D. Thulke, A. Gerstenberger, K. V. Tran, R. Schlüter, and H. Ney. On Sampling-Based Training Criteria for Neural Language Modeling. In Interspeech, August 2021.
- N. Daheim, D. Thulke, C. Dugast, and H. Ney. Cascaded Span Extraction and Response Generation for Document-Grounded Dialog. In ACL-IJCNLP 2021 Workshop on Document-grounded Dialogue and Conversational QA, online, August 2021.
- E. Tokarchuk, D. Thulke, W. Wang, C. Dugast, and H. Ney. Investigation on Data Adaptation Techniques for Neural Named Entity Recognition. In ACL-IJCNLP 2021 Student Research Workshop, online, August 2021.
- D. Thulke, N. Daheim, C. Dugast, and H. Ney. Efficient Retrieval Augmented Generation from Unstructured Knowledge for Task-Oriented Dialog. In AAAI-21 9th Dialog System Technology Challenge (DSTC-9) Workshop, February 2021.
Full list of publications of the chair.
Invited Talks and Panels
- Keynote Speech - ClimateGPT: Towards Domain-Specific Large Language Models for Climate Change. At ClimateNLP: Natural Language Processing meets Climate Change, ACL 2024 Workshop, Bangkok, August 2024.
- Panel - Matchmaking for Climate Policy, Information and Finance AI Solutions. At Bonn AI and Climate Expert Meeting, Bonn, July 2024.
- Invited Talk - ClimateGPT and NLP for Climate. At Climate Analytics Acceleration Hub: Igniting Action & Finance with Innovation, Understanding Risk Global Forum, Himeji, June 2024.
- Panel - NLP for climate solutions: retrieval augmented generation and other strategies for managing climate information. At Accelerating Climate Change Action through Machine Learning, Applied Machine Learning Days (AMLD), Lausanne, March 2024.