Hauptstudium (advanced study phase)

Statistical Methods in Natural Language Processing

Automatic methods for natural language processing play an important role in human-machine interaction applications and in other artificial intelligence tasks. This course deals with the statistical methods that have proved most successful for many tasks in natural language processing.

Type           Dates/Rooms                 Start     Instructor
Lecture (V4)   Mon 10:00 - 11:30  AH VI              Prof. Dr.-Ing. H. Ney
               Wed  9:30 - 11:00  AH V     19.10.05
Exercise (Ü2)  Tue 16:00 - 17:30  5056     25.10.05  Prof. Dr.-Ing. H. Ney, D. Vilar

Contents

  • Text and document classification including information retrieval
  • Information extraction including tagging and semantic annotation
  • Syntactic analysis and parsing
  • Language modeling
  • Machine translation of natural language

Lecture Notes (Access only permitted within the RWTH domain)

  • WS04/05

Announcements

  • 7.12.05: There were some inconsistencies with the last exercise of sheet 7, so an updated version with a new exercise has been uploaded. You can find it here.
  • 6.12.05: There were some minor errors in the provided implementation for exercise 5. The file has been updated.
  • 29.11.05: Tomorrow (Wednesday, November 30th) the lecture will be cancelled.
  • 8.11.05: If you do not have Linux installed and do not want to install it (for whatever reason), you can have a look at Cygwin, a GNU environment for Windows, or at Knoppix, a Linux distribution that runs from CD without being installed on the hard disk. If you are looking for a visual programming environment, you can try KDevelop (also included in Knoppix) or Anjuta. Note that compilation (and execution) under Linux is a requirement for the programming exercises.
  • 2.11.05: An example solution for the programming exercise of the first exercise assignment can be found here. Its main goal is to serve as an example of the format in which you should send your submissions.
  • 31.10.05: After several attempts, no schedule suitable for everyone could be found, so we will stick to the initial timetable (see above).
  • 25.10.05: Today in the exercise lesson we did not have time to complete the "0. Exercise Sheet". Therefore the 4th exercise of this sheet can be handed in for some extra points.

Exercises

  • All exercises in one file
  • Single exercises:
    • 0. Exercise Sheet
    • 1. Exercise Sheet (Submission: November 2nd, 2005)
          Additional data: alice.txt
          Example solution for the programming exercise (.tgz).
    • 2. Exercise Sheet (Submission: November 8th, 2005)
    • 3. Exercise Sheet (Submission: November 15th, 2005)
          Additional data: 20 newsgroups dataset
    • 4. Exercise Sheet (Submission: November 22nd, 2005)
    • 5. Exercise Sheet (Submission: November 29th, 2005)
          Example solution for the programming exercise (.tgz).
          Trigram finite state automaton, and the source for producing it (using the Graphviz tools)
    • 6. Exercise Sheet (Submission: December 6th, 2005)
          Additional data: Tokenized alice.txt and lglass.txt texts (original lglass.txt here).
    • 7. Exercise Sheet (Submission: December 13th, 2005)
          The improved list organization for language modelling can be found in these slides, pages 130 to 136.
    • 8. Exercise Sheet (Submission: December 20th, 2005)
    • 9. Exercise Sheet (Submission: January 10th, 2006)
    • 10. Exercise Sheet (Submission: January 17th, 2006)
          Additional data: Wall Street Journal Corpus
    • 11. Exercise Sheet (Submission: January 24th, 2006)
    • 12. Exercise Sheet (Submission: January 31st, 2006)
    • 13. Exercise Sheet (Submission: February 7th, 2006)
    • 14. Exercise Sheet
          Additional data: Feldmann Corpus
Last modified: Tue Feb 7 19:21:27 CET 2006