Seminar "Selected Topics in Human Language Technology and Pattern Recognition"
In the summer term 2012 the Lehrstuhl für Informatik 6 will host a seminar entitled "Selected Topics in Human Language Technology and Pattern Recognition".
Registration for the seminar
Registration for the seminar is only possible online via the registration
page provided by the department.
Prerequisites for participation in the seminar
- Bachelor students: Einführung in das wissenschaftliche Arbeiten (Proseminar)
- Master students: Bachelor degree
- Diploma students: Vordiplom
- Attendance of the lectures Pattern Recognition and Neural
Networks, Speech Recognition or Statistical Methods in Natural Language
Processing, or evidence of equivalent knowledge.
- Students who have successfully completed one of the above lectures are
guaranteed a seminar talk.
Seminar format and important dates
The seminar is generally held as a block around the end of the lecture
period. Specific dates will be arranged.
- Proposals: initial proposals will be accepted up
until the start of the term
(April 2, 2012) at the Lehrstuhl für Informatik 6
office or by the relevant supervisor. At this time participants must
arrange an appointment with the relevant supervisor. Revised proposals
will be accepted up until two weeks
after the start of the term.
- Article: must be submitted at least 1 month prior to the trial presentation date
to either the Lehrstuhl für Informatik 6 office or the relevant
supervisor.
- Presentation slides: must be submitted at least 1 week prior to the trial presentation date
to either the Lehrstuhl für Informatik 6 office or the relevant
supervisor.
- Trial presentations: at least 2 weeks prior to the
actual presentation date; refer to the section on topics.
- Seminar presentations: the exact dates and plan for
the presentation block
will be arranged and announced for the individual topics.
- Final (possibly corrected) articles and presentation slides:
must be submitted at the latest 4
weeks after the presentation date to either the Lehrstuhl für Informatik 6 office or the relevant supervisor.
- Compulsory attendance: in order to receive a
certificate participants must attend all presentation sessions.
- Ethical Guidelines: The Computer Science
Department of RWTH Aachen University has adopted ethical
guidelines for the authoring of academic work such as seminar
reports. Every student has to comply with these guidelines. As a
seminar participant, you therefore have to sign a declaration of
compliance in which you assert that your work complies with the
guidelines, that all references used are properly cited, and that you
wrote the report independently. Please download the guidelines
and submit the signed declaration to your supervisor
together with your seminar report and talk.
A German version of the guidelines
and a German version of the declaration are also available and may be used instead.
Note: failure to meet deadlines, absence without
permission from compulsory sessions (the presentations and the preliminary
meeting, as announced by email to each participating student), or
dropping out of the seminar more than 3 weeks after the
preliminary meeting/topic distribution
results in the grade 5.0 (not appeared).
Topics, relevant references and participants
Specific topics will be introduced at a preparatory meeting
in the seminar room at the Lehrstuhl für Informatik 6.
Selected topics from the following areas of Human
Language Technology and Pattern Recognition will be offered:
- Automatic Speech Recognition;
- Machine Translation;
- Pattern Recognition.
Some possible topics, supervisors, and basic references:
- The IBM Translation Models (Kohnen; Supervisor: Christoph Schmidt)
References:
- Chapter 4 of P. Koehn: "Statistical Machine Translation," textbook, Cambridge University Press, January 2010.
- Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, and Robert L. Mercer: "The mathematics
of statistical machine translation: Parameter estimation," Computational Linguistics, 19(2):263-311, 1993.
- Franz Josef Och and Hermann Ney: "Improved statistical alignment models," Proceedings of the 38th
Annual Meeting of the Association for Computational Linguistics, 2000.
- Decoding for Phrase-Based Statistical Machine Translation (Rietig; Supervisor: Stephan Peitz)
References:
- Chapter 6 of P. Koehn: "Statistical Machine Translation," textbook, Cambridge University Press, January 2010.
- P. Koehn: "Pharaoh: a Beam Search Decoder for Phrase-Based Statistical Machine Translation Models,"
Proc. 6th Conference of the Association for Machine Translation in the Americas (AMTA),
Washington, DC, September 2004.
- R. C. Moore, C. Quirk: "Faster Beam-Search Decoding
for Phrasal Statistical Machine Translation,"
Proc. MT Summit XI, Copenhagen, Denmark, September 2007.
- R. Zens, H. Ney: "Improvements in Dynamic Programming Beam Search for Phrase-based Statistical Machine Translation,"
International Workshop on Spoken Language Translation (IWSLT),
Honolulu, Hawaii, October 2008.
- Machine Translation Evaluation (Kochanov; Supervisor: Markus Freitag)
References:
- Chapter 8 of P. Koehn: "Statistical Machine Translation," textbook, Cambridge University Press, January 2010.
- K. Papineni, S. Roukos, T. Ward and W. Zhu: "BLEU: a Method for Automatic Evaluation of Machine Translation," Proc. ACL, pp. 311-318, Philadelphia, PA, July 2002.
- Matthew Snover, Bonnie Dorr, Richard Schwartz, Linnea Micciulla, John Makhoul: "A Study of Translation Edit Rate with Targeted Human Annotation," In Proceedings AMTA, August 2006, pp. 223-231.
- A. Lavie, A. Agarwal: "METEOR: An Automatic Metric for MT Evaluation with High Levels of Correlation with Human Judgments," Proc. 2nd Workshop on MT, ACL, pp. 228-231, Prague, Czech Republic, June 2007.
- Phrase Table Smoothing for Statistical Machine Translation (NN; Supervisor: Saab Mansour)
References:
- R. Zens, H. Ney: "Improvements in phrase-based statistical machine translation," HLT-NAACL, 2004.
- G. Foster, R. Kuhn, and H. Johnson: "Phrasetable Smoothing for Statistical Machine Translation," EMNLP, 2006.
- B. Chen, R. Kuhn, G. Foster, and H. Johnson: "Unpacking and Transforming Feature Functions: New Ways to Smooth Phrase Tables," MT Summit, 2011.
- M. Huck, S. Mansour, S. Wiesler, and H. Ney: "Lexicon Models for Hierarchical Phrase-Based Machine Translation," International Workshop on Spoken Language Translation (IWSLT),
San Francisco, California, USA, December 2011.
- Domain Adaptation for Statistical Machine Translation (Erduran; Supervisor: Saab Mansour)
References:
- G. Foster and R. Kuhn: "Mixture-Model Adaptation for SMT," WMT, 2007.
- Moore and Lewis: "Intelligent Selection of Language Model Training Data," ACL, 2010.
- Axelrod et al.: "Domain Adaptation via Pseudo In-domain Data Selection," EMNLP, 2011.
- Matsoukas et al.: "Discriminative Corpus Weight Estimation for Machine Translation," EMNLP, 2009.
- Foster et al.: "Discriminative Instance Weighting for Domain Adaptation in Statistical Machine Translation," EMNLP, 2010.
- Lambert et al.: "Investigations on translation model adaptation using monolingual data," WMT, 2011.
- Unsupervised Training for Statistical Machine Translation (Mroz; Supervisor: Malte Nuhn)
References:
- H. Schwenk: "Investigations on Large-scale Lightly-supervised Training for Statistical Machine Translation," IWSLT, 2008.
- H. Schwenk and J. Senellart: "Translation Model Adaptation for an Arabic/French News Translation System by Lightly-supervised Training," MT Summit, 2009.
- N. Ueffing: "Using Monolingual Source-Language Data to Improve MT Performance," IWSLT, 2006.
- Ueffing et al.: "Transductive learning for statistical machine translation," WMT, 2007.
- Knight, Nair, Rathod, and Yamada: "Unsupervised Analysis for Decipherment Problems," ACL, 2006.
- S. Ravi, K. Knight: "Deciphering Foreign Language," ACL, 2011.
- System Combination for Machine Translation (NN; Supervisor: Markus Freitag)
References:
- E. Matusov et al.: "System Combination for Machine Translation of Spoken and Written Language," IEEE Transactions on Audio, Speech and Language Processing, pp. 1222-1237, 2008.
- E. Matusov, N. Ueffing, and H. Ney: "Computing Consensus Translation from Multiple Machine Translation Systems Using Enhanced Hypotheses Alignment,"
Conference of the European Chapter of the Association for Computational Linguistics (EACL), Trento, Italy, April 2006.
- A. V. Rosti, N. F. Ayan, B. Xiang, S. Matsoukas, R. Schwartz, and B. Dorr: "Combining Outputs from Multiple Machine Translation Systems,"
Proc. of NAACL-HLT 2007, Rochester, NY, USA, April 2007.
- Morphological and Syntactic Processing for Statistical Machine Translation (Schimitzek; Supervisor: Jan-Thorsten Peter)
References:
- Klaus Macherey, Andrew M. Dai, David Talbot, Ashok C. Popat, Franz Och: "Language-independent Compound Splitting with Morphological Operations," ACL-HLT, 2011.
- John DeNero, Jakob Uszkoreit: "Inducing Sentence Structure from Parallel Corpora for Reordering," Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2011.
- Karthik Visweswariah, Jiri Navratil, Jeffrey Sorensen, Vijil Chenthamarakshan, Nanda Kambhatla: "Syntax based reordering with automatically derived rules for improved statistical machine translation," Proceedings of the 23rd International Conference on Computational Linguistics, 2010, pp. 1119-1127.
- Dmitriy Genzel: "Automatically Learning Source-side Reordering Rules for Large Scale Machine Translation," COLING, 2010.
- Discriminative Alignment for Machine Translation (NN; Supervisor: Christoph Schmidt)
References:
- Ben Taskar, Simon Lacoste-Julien, and Dan Klein:
"A Discriminative Matching Approach to Word Alignment,"
pp. 73-80,
Proc. Human Language Technology Conference
and Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP),
Vancouver, Canada, October 2005.
- Abraham Ittycheriah and Salim Roukos:
"A Maximum Entropy Word Aligner for Arabic-English Machine Translation,"
pp. 89-96,
Proc. Human Language Technology Conference
and Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP),
Vancouver, BC, Canada, October 2005.
- Phil Blunsom and Trevor Cohn:
"Discriminative Word Alignment with Conditional Random Fields,"
pp. 65-72,
Proc. 21st International Conference on Computational Linguistics
and 44th Annual Meeting of the Association for Computational Linguistics,
Sydney, Australia, July 2006.
- Discriminative Training for Machine Translation (Muehr; Supervisors: Jörn Wübker, Matthias Huck)
References:
- Ittycheriah, Roukos: "Direct Translation Model 2," HLT-NAACL, 2007.
- Xiang, Ittycheriah: "Discriminative Feature-Tied Mixture Modeling for Statistical Machine Translation," ACL-HLT, 2011.
- Huang, Xiang: "Feature-Rich Discriminative Phrase Rescoring for SMT," COLING, 2010.
- Reordering for Phrase-based Statistical Machine Translation (NN; Supervisor: Matthias Huck)
References:
- Section 5.4 and Chapter 6 of P. Koehn: "Statistical Machine Translation," textbook, Cambridge University Press, January 2010.
- R. Zens, H. Ney, T. Watanabe, and E. Sumita: "Reordering Constraints for Phrase-Based Statistical Machine Translation,"
International Conference on Computational Linguistics (CoLing), pages 205-211, Geneva, Switzerland, August 2004.
- Christoph Tillmann: "A unigram orientation model for statistical machine translation," Proceedings of HLT-NAACL 2004: Short Papers,
pages 101-104, 2004.
- Yaser Al-Onaizan and Kishore Papineni: "Distortion models for statistical machine translation," Proceedings
of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the ACL (COLING/ACL), pages 529-536, Morristown, NJ, USA, 2006.
- Michel Galley, Christopher Manning: "A Simple and Effective Hierarchical Phrase Reordering Model," Proceedings of EMNLP, 2008.
- Hierarchical Phrase-based Machine Translation (NN; Supervisor: Matthias Huck)
References:
- David Chiang: "Hierarchical Phrase-Based Translation," Computational Linguistics, 33(2):201-228, June 2007.
- Liang Huang, David Chiang: "Forest Rescoring: Faster Decoding with Integrated Language Models," Proc. ACL, 2007.
- A. de Gispert, G. Iglesias, G. Blackwood, E.R. Banga and W. Byrne:
"Hierarchical Phrase-based Translation with Weighted Finite State Transducers and Shallow-N Grammars,"
Computational Linguistics, Volume 36, Number 3, pp. 505-533, 2010.
- Syntax-based Statistical Machine Translation (NN; Supervisor: Stephan Peitz)
References:
- D. Marcu, W. Wang, A. Echihabi, and K. Knight:
"SPMT: Statistical Machine Translation with Syntactified Target Language Phrases,"
pp. 44-52,
Proc. Conference on Empirical Methods in Natural Language Processing (EMNLP),
Sydney, Australia, July 2006.
- S. DeNeefe, K. Knight, W. Wang, D. Marcu:
"What Can Syntax-Based MT Learn from Phrase-Based MT?"
Proc. Conference on Empirical Methods in Natural Language Processing
and Conference on Computational Natural Language Learning (EMNLP-CoNLL),
Prague, Czech Republic, June 2007.
- D. Chiang:
"Learning to Translate with Source and Target Syntax,"
Proc. of 48th Annual Meeting of the Association for Computational Linguistics (ACL),
Uppsala, Sweden, July 2010.
- Treelet Translation (NN; Supervisor: Jörn Wübker)
References:
- Menezes and Quirk: "Microsoft Research Treelet Translation System: IWSLT Evaluation," in Proceedings of the International Workshop on Spoken Language Translation, October 2005.
- Quirk, Menezes, and Cherry: "Dependency Treelet Translation: Syntactically Informed Phrasal SMT," in Proceedings of the ACL, June 2005.
- Menezes and Quirk: "Syntactic Models for Structural Word Insertion and Deletion during Translation," in Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, Honolulu, Hawaii, October 2008.
- Semantics for Statistical Machine Translation (Pasdziernik; Supervisor: Minwei Feng)
References:
- Baker, Dorr, Bloodgood, Callison-Burch, Filardo, Piatko, Levin, and Miller: "Use of Modality and Negation in Semantically-Informed Syntactic MT," Computational Linguistics, 2012.
- Gao and Vogel: "Utilizing target-side semantic role labels to assist hierarchical phrase-based machine translation," in Proceedings of Fifth Workshop on Syntax, Semantics and Structure in Statistical Translation (SSST-5), 2011.
- Gao and Vogel: "Corpus Expansion for Statistical Machine Translation with Semantic Role Label Substitution Rules," in Proc. of the ACL, 2011.
Guidelines for the article and presentation
The article (roughly 20 pages) and the presentation slides (between 20
and 30) must be prepared in LaTeX and submitted electronically in PDF
format. Other formats will not be accepted.
Presentations consist of 45 minutes of presentation time and 15
minutes of discussion time. Document templates for both the article and
the presentation slides are provided below, along with links to LaTeX
documentation available online.
- Online LaTeX-Documentation:
- Guidelines for articles and presentation slides:
General:
- The aim of the seminar for the participants is to learn the
following:
- to tackle a topic and to expand knowledge
- to critically analyze the literature
- to hold a presentation
- Take note of connections
to other topics in the seminar and discuss your topics with one
another!
- Take care to stay within your
own topic. To this end participants should be aware of the other
topics in the seminar. If applicable, cross-reference
other articles and presentations.
Specific:
- Important: As part of the introduction, a slide should
outline the most important literature used for the presentation. In
addition, the presentation should clearly indicate which literature the particular
elements of the presentation refer to.
- Participants are expected to seek out additional literature on their
topic. Assistance with the literature search is available at the
faculty's library. Literature is also available at
the Lehrstuhl für Informatik 6 library.
- Notation/Mathematical
Formulas: consistent, correct notation
is essential. When necessary, differing notation from various
literature sources is to be modified or standardized in order to be
clear and consistent. The
lectures held by the Lehrstuhl für Informatik 6 should provide a
guide as to what appropriate notation should look like.
- Tables
must have titles (appearing above the table).
- Figures
must have captions (appearing below the figure).
- In the case that no adequate translation of an
English technical term is available, the term should be used unchanged.
- Articles and presentation slides can also be prepared in
English.
- Completeness:
acknowledge all literature and
sources.
- Referencing must conform to the standard
described in the article template.
- Examples should be used to illustrate points.
- Examples should be as complex as necessary but as simple
as possible.
- Slides should be used
as presentation aids and not to replace the role of the presenter;
specifically, slides should:
- illustrate important points and relationships;
- remind the audience (and the presenter) of important aspects
and considerations;
- give the audience an overview
of the presentation.
- Slides should not contain chunks of text or complicated
sentences; rather they should consist of succinct words and terms.
- Use illustrations
where appropriate: a picture is worth a thousand words!
- Abbreviations should be defined at first use, as
demonstrated in the following example: "[...] at the
Rheinisch-Westfälische Technische Hochschule (RWTH) there are
[...]".
- Usage of fonts, typefaces and colors in presentation slides must
be consistent and appropriate. Such means should serve to clarify
points or relationships, not be applied needlessly or at random.
- Care should be taken when selecting fonts for presentation
slides (also within diagrams) to ensure legibility on a projector even
for those seated far from the screen.
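The table, figure, and abbreviation rules above could look roughly as follows in LaTeX. This is only a minimal sketch: the document class, the placeholder table entries, and the box standing in for a graphic are illustrative assumptions and not part of the official templates, which take precedence wherever they differ.

```latex
\documentclass{article}
\begin{document}

% Abbreviation spelled out and defined at first use,
% following the pattern of the example in the guidelines.
At the Rheinisch-Westf\"alische Technische Hochschule (RWTH) there are \ldots

\begin{table}[ht]
  \centering
  % Table title: \caption is placed BEFORE the tabular, so it appears above.
  \caption{Placeholder table title.}
  \begin{tabular}{lr}
    \hline
    Item          & Value \\
    \hline
    placeholder A & --    \\
    placeholder B & --    \\
    \hline
  \end{tabular}
\end{table}

\begin{figure}[ht]
  \centering
  % A plain box stands in for a real graphic (hypothetical placeholder).
  \rule{4cm}{2cm}
  % Figure caption: \caption is placed AFTER the graphic, so it appears below.
  \caption{Placeholder figure caption.}
\end{figure}

\end{document}
```

In standard LaTeX the position of the caption follows from where \caption is written inside the float, so the "title above, caption below" rule is met simply by ordering the commands as shown.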
Contact
Inquiries should be directed to the respective supervisors or to:
Matthias Huck
RWTH Aachen
Lehrstuhl für Informatik 6
Ahornstr. 55
52056 Aachen
Room 6126 (1st floor, E2)
Telephone: 0241 / 80 21 617
E-Mail: huck [-at-] cs.rwth-aachen.de