Statistical Methods in Natural Language Processing


Automatic methods for natural language processing play an important role in human-machine interaction applications and in other artificial intelligence tasks.
This course covers the statistical methods that have proven most successful for many natural language processing tasks.


Type	Dates/Rooms	Start	Instructor
Lecture (V4)	Mon 10:00 - 11:30 AH VI	14.04.08	Prof. Dr.-Ing. H. Ney
	Wed 10:00 - 11:30 AH V
Exercise (Ü2)	Mon 14:00 - 15:30 AH III	14.04.08	Prof. Dr.-Ing. H. Ney, D. Vilar

Text and document classification including information retrieval
Information extraction including tagging and semantic annotation
Syntactic analysis and parsing
Language modeling
Machine translation of natural language

Lecture Notes (Access only permitted within the RWTH domain)

Slides from WS05/06

Announcements

14.7.08: The lecture on Wednesday, 16 July 2008, will take place in our seminar room, as room AH V is not available.
3.7.08: There will be an additional lecture on Tuesday, 8 July 2008, at 14:00 in our seminar room, to compensate for the missed lecture on Wednesday, 2 July.
18.6.08: There was a small mistake in the 8th exercise sheet. The PDFs have been updated.
8.5.08: The extra lectures next week will be held in the room AH V.
7.5.08: A solution for exercise 3a) of exercise sheet 2 has been published. The result is needed for the first exercise of sheet 3.
Additional lectures will be held on the following days:
- 13 May 2008 (Tu): 10:00-11:30 and 13:00-14:30
- 14 May 2008 (We): 10:00-11:30
The room is still to be announced.
Update: The room will be AH V.
Important: There will be no main lecture on the following days:
- 21 May 2008 (We)
- 11 June 2008 (We, Dies)
- 23 June 2008 (Mon)
- 25 June 2008 (We)
There is a proposal to have two additional lectures on Wednesday, 14 May 2008. This proposal will be discussed in the lecture on Wednesday, 7 May 2008.
21.4.08: We were not able to discuss exercise sheet 0 in the exercise hour. We will discuss it next week, but you can still hand it in for some extra points.

Exercises

All exercise sheets in one file
Single exercise sheets:
- 0. Exercise Sheet
  The data used in the presentation about makefiles can be downloaded here.
- 1. Exercise Sheet (Submission: 28 April 2008)
  Additional data: alice.txt
  Example solution for exercise 4: ex01.src.tgz
- 2. Exercise Sheet (Submission: 5 May 2008)
  Solution for exercise 3a) (with some annotations)
- 3. Exercise Sheet (Submission: 19 May 2008)
  Additional data: 20 Newsgroups Corpus, Spam Corpus
  An example implementation of a multinomial classifier can be found here.
  The confusion matrix for the 20 newsgroups data set is also available (with correct labels)
- 4. Exercise Sheet (Submission: 26 May 2008)
- 5. Exercise Sheet (Submission: 2 June 2008)
  Additional data: European Parliament Corpus
  Example implementation for exercise 4
- 6. Exercise Sheet (Submission: 9 June 2008)
- 7. Exercise Sheet (Submission: 16 June 2008)
  Additional data: Wall Street Journal POS Corpus
  Example implementation of a bigram-based POS tagger.
- 8. Exercise Sheet (Submission: 23 June 2008)
  Additional data: Arcturan-Centauri parallel corpus in a format suitable for drawing alignment links
- 9. Exercise Sheet (Submission: 30 June 2008)
- 10. Exercise Sheet (Submission: 7 July 2008)
- 11. Exercise Sheet (Submission: 14 July 2008)
- 12. Exercise Sheet (Submission: Optional)
  Additional data: Translation data

Last Modified Tue Jul 15 11:38:21 CEST 2008
