In the summer term 2011 the
Lehrstuhl für Informatik 6 will host a seminar entitled:
Seminar "Selected Topics in Human Language Technology"
Registration for the seminar:
Registration for the seminar is only possible online via the registration page provided by the institute. A link can be found on the Computer Science Department's homepage.
Prerequisites for participation in the seminar:
- Bachelor students: Einführung in das wissenschaftliche Arbeiten (Proseminar)
- Master students: Bachelor degree
- Diploma students: Vordiplom
- Attendance of the lectures Pattern Recognition and Neural
Networks, Speech Recognition or Statistical Methods in Natural Language
Processing, or evidence of equivalent knowledge.
- Successful participants of the above lectures are guaranteed the opportunity to give a seminar talk.
Seminar format and important dates:
Preparatory Meeting:
The preparatory meeting takes place on Friday, February 11, 2011, at
14h in the seminar room (Room 6124) of the Lehrstuhl für Informatik
6; participation is obligatory. In this meeting, the topics will be
presented and distributed to all participants.
- Proposals: initial proposals will be accepted until the start of the term
(April 4, 2011) at the Lehrstuhl für Informatik 6 office or by the relevant
supervisor. By this date, participants must also arrange an appointment with
the relevant supervisor. Revised proposals will be accepted until two weeks
after the start of the term.
- Article: must be submitted at least 1 month prior to the trial presentation date
to either the Lehrstuhl für Informatik 6 office or the relevant
supervisor.
- Presentation slides: must be submitted at least 1 week prior to the trial presentation date
to either the Lehrstuhl für Informatik 6 office or the relevant
supervisor.
- Trial presentations: at least 2 weeks prior to the
actual presentation date; refer to the section on topics.
- Seminar presentations: the exact dates and schedule of
the presentation block (expected around the end of July/beginning of August 2011)
will be arranged and announced for the individual topics.
- Final (possibly corrected) articles and presentation slides:
must be submitted no later than 2 weeks after the presentation date to either
the Lehrstuhl für Informatik 6 office or the relevant supervisor.
- Compulsory attendance: in order to receive a
certificate participants must attend all presentation sessions.
- Ethical Guidelines: The Computer Science
Department of RWTH Aachen University has adopted ethical
guidelines for the authoring of academic work such as seminar
reports. Each student has to comply with these guidelines. Accordingly,
as a seminar participant, you have to sign a declaration of
compliance, in which you assert that your work complies with the
guidelines, that all references used are properly cited, and that the
report was written independently by yourself. Please download the guidelines
and submit the signed declaration together with your seminar report and talk
to your supervisor. A German version of the guidelines and a German version
of the declaration are also available and may be used instead.
Note: failure to meet deadlines, absence without
permission from compulsory sessions (presentations and the preparatory
meeting, as announced by email to each participating student), or
dropping out of the seminar more than 3 weeks after the
preparatory meeting/topic distribution, i.e. after March 4, 2011, results in the grade 5.0/not appeared.
Topics, Relevant References and Participants:
The seminar topics will be presented and distributed to the participants during
the preparatory meeting announced above.
In this seminar, selected topics will be offered from the area of
language modeling, which is important for a wide range of human
language technologies:
- Neural Network based Language Models (NN; Supervisor: Martin
Sundermeyer)
Trial Presentation: NN
References:
- Y. Bengio, R. Ducharme, P. Vincent, C. Jauvin, "A Neural
Probabilistic Language Model", in Journal of Machine
Learning Research 3 (2003), 1137-1155.
- H. Schwenk, J.-L. Gauvain "Training Neural Network Language
Models On Very Large Corpora" in Joint Conference HLT/EMNLP
2005, 201-208
- H. Schwenk, "Continuous Space Language Models" (2007) in
Computer Speech and Language (2007), 492-518
- T. Mikolov, M. Karafiát, L. Burget, J. Černocký, S. Khudanpur, "Recurrent
Neural Network Based Language Model" in Proceedings of Interspeech 2010,
Makuhari, 1045-1048
- Random Forest Language Models (Paulus; Supervisor: Jörn
Wübker)
Trial Presentation: NN
References:
- Peng Xu, Frederick Jelinek, "Random Forests in Language
Modeling". In Proceedings of Empirical Methods in Natural
Language Processing (EMNLP) 2004, Barcelona, Spain, p. 325-332.
- Yi Su, Frederick Jelinek, Sanjeev Khudanpur, "Large-scale Random
Forest Language Models for Speech Recognition". In Proceedings
of INTERSPEECH 2007, Antwerp, Belgium, p. 598-601.
- Ilya Oparin, Lori Lamel, Jean-Luc Gauvain, "Large-Scale Language
Modeling with Random Forests for Mandarin Chinese
Speech-to-Text". In IceTAL'10 Proceedings of the 7th
International Conference on Advances in Natural Language
Processing 2010, Reykjavik, Iceland, p. 269-280, Springer.
- Language Model Pruning (Alkhouli; Supervisor: Martin
Sundermeyer)
Trial Presentation: NN
References:
- K. Seymore, R. Rosenfeld, "Scalable Backoff Language Models"
in Proc. Fourth International Conference on Spoken Language Processing (ICSLP)
vol. 1, pp. 232-235, Kyoto, Japan, 1996.
- R. Kneser, "Statistical Language Modeling Using A Variable
Context Length," in Proc. Fourth International Conference on Spoken Language Processing (ICSLP)
vol. 1, pp. 494-497, Kyoto, Japan, 1996.
- A. Stolcke, "Entropy Based Pruning Of Backoff Language Models",
Proceedings DARPA Broadcast News Transcription and Understanding
Workshop, pp. 270-274, Lansdowne, VA, 1998
- V. Siivola, T. Hirsimäki, S. Virpioja "On Growing and Pruning
Kneser-Ney Smoothed N-gram Models" in IEEE Transactions on
Audio, Speech and Language Processing, vol 15 no 5, July 2007
- C. Chelba, T. Brants, W. Neveitt, P. Xu "Study On Interaction
Between Entropy Pruning And Kneser-Ney Smoothing" in Proceedings
of Interspeech 2010, Makuhari, 2422-2425
- Model M and Shrinkage-Based Models (NN; Supervisor: Arnaud
Dagnelies)
Trial Presentation: NN
References:
- S. Chen, "Shrinking Exponential Language Models" (2009)
- S. Chen, "Performance Prediction For Exponential Language
Models" (2009)
- S. Chen, "Enhanced Word Classing for Model M" (2010)
- R. Sarikaya, "Impact of Word Classing on Shrinkage-Based
Language Models" (2010)
- Language Model Adaptation (NN; Supervisor: Markus
Nussbaum-Thom)
Trial Presentation: NN
References:
- Liu, X., Gales, M. J. F., Woodland, P. C., "Context dependent
language model adaptation", In INTERSPEECH-2008, pp. 837-840.
- Liu, X., Gales, M. J. F., Woodland, P. C. "Use of contexts in
language model interpolation and adaptation", In
INTERSPEECH-2009, pp. 360-363.
- Xunying Liu, Mark J. F. Gales, Jim L. Hieronymus, Philip
C. Woodland, "Language model combination and adaptation using
weighted finite state transducers," ICASSP 2010, pp. 5390-5393.
- M. Bacchiani and B. Roark, "Unsupervised language model
adaptation," in Proc. ICASSP, 2003, pp. 224–227.
- P. Hsu and J. Glass, "Style & topic language model adaptation
using HMM-LDA," in Proc. of EMNLP, 2006, pp. 373–381.
- Discriminative Training of Language Models (Rafi; Supervisor: Markus
Nussbaum-Thom)
Trial Presentation: NN
References:
- P. Xu, D. Karakos and S. Khudanpur, "Self-Supervised
Discriminative Training of Statistical Language Models", in
Proceedings of the 2009 Automatic Speech Recognition and
Understanding Workshop (ASRU-2009), Merano, Italy, December
13-17, 2009
- Preethi Jyothi, Eric Fosler-Lussier, "Discriminative language
modeling using simulated ASR errors", In INTERSPEECH-2010,
pp. 1049-1052.
- Tanel Alumäe, Mikko Kurimo, "Efficient estimation of maximum
entropy language models with n-gram features: an SRILM
extension", In INTERSPEECH-2010, pp. 1820-1823.
- Long Distance Language Modeling (Dallmann; Supervisor: Arnaud
Dagnelies)
Trial Presentation: NN
References:
- Gao, Suzuki: "Capturing Long Distance Dependency in Language
Modeling: An Empirical Study" (2004)
- Siu, Ostendorf: "Variable N-Grams and Extensions for
Conversational Speech Language Modeling" (2000)
- Rosenfeld: "Adaptive Statistical Language Modeling: A Maximum
Entropy Approach", Ph.D. thesis, Carnegie Mellon University (1994)
- Brun, Langlois, Smaili: "Improving Language Models by Using
Distant Information" (2007)
- Momtazi, Faubel, Klakow: "Within and Across Sentence Boundary
Language Model" (2010)
- In-domain Data Selection (Mangen; Supervisor: Jörn
Wübker)
Trial Presentation: NN
References:
- Robert Moore, William Lewis. "Intelligent Selection of Language
Model Training Data". In Proceedings of the ACL 2010 Conference
Short Papers, p. 220-224, Uppsala, Sweden, 11-16 July 2010.
- Jianfeng Gao, Joshua Goodman, Mingjing Li, Kai-Fu Lee: "Toward a
unified approach to statistical language modeling for
Chinese". ACM Transactions on Asian Language Information
Processing, Volume 1 Issue 1, March 2002, p. 3-33.
- Dietrich Klakow. "Selecting articles from the language model
training corpus". In Proc. of International Conference on
Acoustics, Speech and Signal Processing (ICASSP) 2000, June 5-9,
Istanbul, Turkey, vol 3, p. 1695-1698.
- Tied-Mixture language models in continuous space (NN; Supervisor:
Amr El-Desoky
Mousa)
Trial Presentation: NN
References:
- R. Sarikaya, M. Afify, B. Kingsbury, "Tied Mixture language
modeling in continuous space", In Proc. of NAACL-HLT 2009, pages
459-467, Boulder, Colorado, June 2009.
- R. Sarikaya, A. Emami, M. Afify, B. Ramabhadran, "Continuous
space language modeling techniques", In Proc. of ICASSP 2010,
pages 5186-5189, Dallas, Texas, USA, March 2010.
- Factored Language Models (Dolata; Supervisor: Amr El-Desoky
Mousa)
Trial Presentation: NN
References:
- K. Kirchhoff, J. Bilmes, K. Duh, "Factored language models
tutorial", Technical Report, Department of Electrical Engineering,
University of Washington (UWEE), Feb 2008.
- J. Bilmes and K. Kirchhoff, "Factored language models and
generalized parallel backoff", In Proc. of NAACL-HLT 2003,
volume 2, pages 4-6, Edmonton, Canada, May 2003.
- A. E. Mousa, R. Schlüter, and H. Ney, "Hybrid Morphologically
Decomposed Factored Language Models for Arabic LVCSR", In
Proc. of NAACL HLT, pages 701-704, Los Angeles, California, USA,
June 2010.
Guidelines for the article and presentation:
The article (roughly 20 pages) and the presentation slides (between 20 and
30) should be prepared in LaTeX and submitted electronically in PDF format;
other formats will not be accepted. Presentations consist of 45 minutes of
presentation time and 15 minutes of discussion time. Document templates for
both the article and the presentation slides are provided below, along with
links to LaTeX documentation available online.
- Online LaTeX-Documentation:
- Guidelines for articles and presentation slides:
General:
- The aim of the seminar is for participants to learn:
- to tackle a topic and to expand knowledge
- to critically analyze the literature
- to hold a presentation
- Take notice of references
to other topics in the seminar and discuss topics with one
another!
- Take care to stay within your
own topic. To this end participants should be aware of the other
topics in the seminar. If applicable, cross-reference
other articles and presentations.
Specific:
- Important: As part of the introduction, a slide should
outline the most important literature used for the presentation. In
addition, the presentation should clearly indicate which literature the particular
elements of the presentation refer to.
- Participants are expected to seek out additional literature on their
topic. Assistance with the literature search is available at the
faculty's library. Literature can of course also be accessed at
the Lehrstuhl für Informatik 6 library.
- Notation/Mathematical Formulas: consistent, correct notation
is essential. Where the literature sources use differing notation,
it should be adapted or standardized so that the presentation is
clear and consistent. The lectures held by the Lehrstuhl für
Informatik 6 should serve as a guide to appropriate notation.
- Tables
must have titles (appearing above the table).
- Figures
must have captions (appearing below the figure).
- If no adequate German translation of an English technical term
is available, the English term should be used unchanged.
- Articles and presentation slides can also be prepared in
English.
- Completeness:
acknowledge all literature and
sources.
- Referencing must conform to the standard
described in the article template.
- Examples should be used to illustrate points.
- Examples should be as complex as necessary but as simple
as possible.
- Slides should be used
as presentation aids and not to replace the role of the presenter;
specifically, slides should:
- illustrate important points and relationships;
- remind the audience (and the presenter) of important aspects
and considerations;
- give the audience an overview
of the presentation.
- Slides should not contain chunks of text or complicated
sentences; rather they should consist of succinct words and terms.
- Use illustrations where appropriate: a picture is worth a thousand words!
- Abbreviations should be defined at the first usage in the manner
demonstrated in the following example: "[...] at the
Rheinisch-Westfälischen Technischen Hochschule (RWTH) there are
[...]".
- Usage of fonts, typefaces and colors in presentation slides must
be consistent and appropriate. Such means should serve to clarify
points or relationships, not be applied needlessly or at random.
- Care should be taken when selecting fonts for presentation
slides (also within diagrams) to ensure legibility on a projector even
for those seated far from the screen.
Inquiries relating to organizational
aspects of the seminar should be directed to:
Dr. Ralf Schlüter
RWTH Aachen
Lehrstuhl für Informatik 6
Ahornstr. 55
52056 Aachen
Room 6125b (1st floor, E2)
Telephone: 0241 / 80 21 612
E-Mail: schlueter@cs.rwth-aachen.de