Statistical Methods in Natural Language Processing
Automatic methods for natural language processing play an
important role in human-machine interaction applications
and in other tasks in artificial intelligence.
This course deals with the statistical methods that have proven
most successful for many tasks in natural language processing, including:
- Text and document classification including information retrieval
- Information extraction including tagging and semantic annotation
- Syntactic analysis and parsing
- Language modeling
- Machine translation of natural language
Lecture Notes (Access only permitted within the RWTH domain)
- Slides from WS05/06
14.7.08: The lecture on Wednesday, 16 July 2008, will take place in our seminar room,
as room AH V is not available.
3.7.08: There will be an additional lecture on Tuesday, 8 July 2008, at 14:00 in our
seminar room to make up for the missed lecture on Wednesday, 2 July.
18.6.08: There was a small mistake in the 8th exercise sheet. The PDFs have been updated.
8.5.08: The extra lectures next week will be held in the room AH V.
7.5.08: A solution for exercise 3a) of exercise sheet 2 has been
published. The result is needed for the first exercise of sheet 3.
On the following days there will be additional lectures:
- 13 May 2008 (Tue): 10:00-11:30 and 13:00-14:30
- 14 May 2008 (Wed): 10:00-11:30
The room is still to be announced.
Update: The room will be AH V.
There is a proposal to hold two additional lectures on Wednesday, 14
May 2008. This proposal will be discussed in the lecture on Wednesday,
7 May 2008.
On the following days there will be no main lecture:
- 21 May 2008 (Wed)
- 11 June 2008 (Wed, Dies Academicus)
- 23 June 2008 (Mon)
- 25 June 2008 (Wed)
21.4.08: We were not able to discuss exercise sheet 0 in the exercise
session. We will discuss it next week, but you can hand it in for some
- All exercise sheets in one file
- Single exercise sheets:
- 0. Exercise Sheet
The data used in the presentation about makefiles can be downloaded here.
- 1. Exercise Sheet (Submission: 28 April 2008)
Additional data: alice.txt
Example solution for exercise 4: ex01.src.tgz
- 2. Exercise Sheet (Submission: 5 May 2008)
Solution for exercise 3a) (with some annotations)
- 3. Exercise Sheet (Submission: 19 May 2008)
Additional data: 20 Newsgroups Corpus, Spam Corpus
An example implementation of a multinomial classifier can be found here.
The confusion matrix for the 20 Newsgroups data set (with correct labels) is also available.
- 4. Exercise Sheet (Submission: 26 May 2008)
- 5. Exercise Sheet (Submission: 2 June 2008)
Additional data: European Parliament Corpus
Example implementation for exercise 4
- 6. Exercise Sheet (Submission: 9 June 2008)
- 7. Exercise Sheet (Submission: 16 June 2008)
Additional data: Wall Street Journal POS Corpus
Example implementation of a bigram-based POS tagger.
- 8. Exercise Sheet (Submission: 23 June 2008)
Additional data: Arcturan-Centauri parallel corpus in a format suitable for drawing alignment links
- 9. Exercise Sheet (Submission: 30 June 2008)
- 10. Exercise Sheet (Submission: 7 July 2008)
- 11. Exercise Sheet (Submission: 14 July 2008)
- 12. Exercise Sheet (Submission: Optional)
Additional data: Translation data
Tue Jul 15 11:38:21 CEST 2008