Statistical Methods in Natural Language Processing
Automatic methods for natural language processing play an
important role in human-machine interaction applications
and in many other areas of artificial intelligence.
This course covers the statistical methods that have proven
most successful for many tasks in natural language
processing.
Contents
- Text and document classification including information retrieval
- Information extraction including tagging and semantic annotation
- Syntactic analysis and parsing
- Language modeling
- Machine translation of natural language
Lecture Notes (Access only permitted within the RWTH domain)
- WS04/05
Announcements
- 7.12.05: There were some inconsistencies in the last
exercise of sheet 7, so an updated version with a new exercise
has been uploaded. You can find it here.
- 6.12.05: There were some minor errors in the provided implementation
for exercise 5. The file has been updated.
- 29.11.05: Tomorrow (Wednesday, November 30th) the lecture will be
cancelled.
- 8.11.05: For those of you who do not have Linux installed and
do not want to install it (for whatever reason), you can have
a look at Cygwin, a GNU environment for Windows, or at
Knoppix, a Linux distribution that runs from CD without being
installed on the hard disk. If you are looking for a graphical
development environment, you can try KDevelop (also included in
Knoppix) or Anjuta. Note that programming exercises must
compile and run under Linux.
- 2.11.05: An example solution to the programming exercise of
the first exercise assignment can be found here. Its
main goal is to serve as an example of the
format in which you should send your submissions.
- 31.10.05: After several attempts, no schedule
suitable for everyone could be found, so we will keep the
initial timetable (see above).
- 25.10.05: Today in the exercise session we did not have time to
complete the "0. Exercise Sheet". Therefore, the 4th exercise
of this sheet can be handed in for extra points.
Exercises
- All exercises in one file
- Single exercises:
- 0. Exercise Sheet
- 1. Exercise Sheet (Submission: November 2nd, 2005)
Additional data: alice.txt
Example solution for the programming exercise (.tgz).
- 2. Exercise Sheet (Submission: November 8th, 2005)
- 3. Exercise Sheet (Submission: November 15th, 2005)
Additional data: 20 newsgroups dataset
- 4. Exercise Sheet (Submission: November 22nd, 2005)
- 5. Exercise Sheet (Submission: November 29th, 2005)
Example solution for the programming exercise (.tgz).
Trigram finite state automaton and the source for producing it (using the Graphviz tools)
- 6. Exercise Sheet (Submission: December 6th, 2005)
Additional data: Tokenized alice.txt and lglass.txt
texts (original lglass.txt here).
- 7. Exercise Sheet (Submission: December 13th, 2005)
The improved list organization for
language modelling is described in these
slides, pages 130 to 136.
- 8. Exercise Sheet (Submission: December 20th, 2005)
- 9. Exercise Sheet (Submission: January 10th, 2006)
- 10. Exercise Sheet (Submission: January 17th, 2006)
Additional data: Wall Street Journal Corpus
- 11. Exercise Sheet (Submission: January 24th, 2006)
- 12. Exercise Sheet (Submission: January 31st, 2006)
- 13. Exercise Sheet (Submission: February 7th, 2006)
- 14. Exercise Sheet
Additional data: Feldmann Corpus
Last modified: Tue Feb 7 19:21:27 CET 2006