RWTH-OCR - Arabic Handwriting Recognition
The RWTH OCR system is based on the open-source speech recognition framework RWTH ASR (The RWTH Aachen University Speech Recognition System), which has been extended with video and image processing methods.
Coming soon:
- RWTH OCR software
- The large-vocabulary RWTH Arabic Machine-Print Newspaper (RAMP-N) corpus
Arabic handwriting recognition: because Arabic words are written as one or more Parts of Arabic Words (PAWs), white-space models and low loop transitions are important in Arabic handwriting recognition.
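To make the role of PAWs concrete, the following is a minimal sketch, not the RWTH OCR implementation. It assumes that a PAW ends after any letter that does not join to the following letter, and that a white-space model (labelled 'si' here) may sit between neighbouring PAWs. The function names `split_into_paws` and `with_white_space` and the reduced letter set are hypothetical; diacritics and the isolated hamza are ignored.

```python
# Simplified set of Arabic letters that do not join to the FOLLOWING letter
# (alef variants, dal, thal, reh, zain, waw variants). This is an
# illustrative assumption, not the letter inventory of the real system.
NON_JOINING = set("\u0622\u0623\u0624\u0625\u0627\u062f\u0630\u0631\u0632\u0648")


def split_into_paws(word):
    """Split a word (in logical order) into Parts of Arabic Words."""
    paws, current = [], ""
    for ch in word:
        current += ch
        if ch in NON_JOINING:
            # The pen is lifted after a non-joining letter: the PAW ends here.
            paws.append(current)
            current = ""
    if current:
        paws.append(current)
    return paws


def with_white_space(paws, ws="si"):
    """Interleave a white-space model label between neighbouring PAWs."""
    out = []
    for i, paw in enumerate(paws):
        if i > 0:
            out.append(ws)
        out.append(paw)
    return out


# kaf-teh-alef-beh: the alef does not join to the beh, so there are two PAWs.
kitab = "\u0643\u062a\u0627\u0628"
print(with_white_space(split_into_paws(kitab)))
```

Unicode escapes are used instead of literal Arabic text so that the logical character order stays unambiguous in editors with bidirectional rendering.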
The visualization shows a training alignment of an Arabic word to its corresponding HMM states, trained with an HMM-based system. We use red, green, and blue background colors for HMM states 0, 1, and 2, respectively, from right to left. The position-dependent character model names are written in the upper line, where the white-space models are annotated by 'si' for 'silence'; the state numbers are written in the bottom line. Thus, an unchanged color represents an HMM state loop, and a color change represents a state transition.
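The color coding described above can be sketched in a few lines. This is only an illustration under stated assumptions: the alignment is taken to be a list of (model name, state index) pairs, one per image column, and the model names in the example are made up. The helper `describe_alignment` is hypothetical and not part of RWTH OCR.

```python
# HMM state index -> background color, as described in the text.
STATE_COLORS = {0: "red", 1: "green", 2: "blue"}


def describe_alignment(alignment):
    """Annotate each aligned frame with its color and a loop/transition flag.

    alignment: list of (model_name, state_index) pairs, one per frame.
    A frame is a 'loop' if it stays in the same state of the same model as
    the previous frame, and a 'transition' otherwise.
    """
    rows = []
    previous = None
    for model, state in alignment:
        if previous is None:
            event = "start"
        elif (model, state) == previous:
            event = "loop"
        else:
            event = "transition"
        rows.append((model, state, STATE_COLORS[state], event))
        previous = (model, state)
    return rows


# Hypothetical alignment: a white-space model 'si' followed by one character model.
example = [("si", 0), ("si", 0), ("a", 0), ("a", 1), ("a", 1), ("a", 2)]
for row in describe_alignment(example):
    print(row)
```

Note that the sketch compares (model, state) pairs rather than colors alone, so a change from state 0 of one model to state 0 of the next still counts as a transition even though both frames would be drawn in red.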
Published papers related to this work:
- P. Dreuw, D. Rybach, G. Heigold, and H. Ney. RWTH OCR: A Large Vocabulary Optical Character Recognition System for Arabic Scripts. In Volker Märgner and Haikal El Abed: Guide to OCR for Arabic Scripts, Part II: Recognition, pages 215-254, Springer, London, UK, July 2012. ISBN 978-1-4471-4071-9.
- P. Dreuw and H. Ney. The RWTH-OCR Handwriting Recognition System for Arabic Handwriting. DAAD Workshop III - On the Way to the Information Society, Sousse, Tunisia, March 2010. Invited Talk.
- P. Dreuw, G. Heigold, and H. Ney. Confidence-Based Discriminative Training for Model Adaptation in Offline Arabic Handwriting Recognition. In International Conference on Document Analysis and Recognition (ICDAR), pages 596-600, Barcelona, Spain, July 2009.
- P. Dreuw, D. Rybach, C. Gollan, and H. Ney. Writer Adaptive Training and Writing Variant Model Refinement for Offline Arabic Handwriting Recognition. In International Conference on Document Analysis and Recognition (ICDAR), pages 21-25, Barcelona, Spain, July 2009.
- P. Dreuw, S. Jonas, and H. Ney. White-Space Models for Offline Arabic Handwriting Recognition. In International Conference on Pattern Recognition (ICPR), pages 1-4, Tampa, FL, USA, December 2008.
Some interesting links:
- Databases:
- Arabic, "A New Comprehensive Database of Handwritten Arabic Words, Numbers, and Signatures used for OCR Testing", by Nawwaf Kharma, Maher Ahmed, and Rabab Ward, 1999, http://users.encs.concordia.ca/~kharma/ExchangeWeb/Databases/ArabicDBases/
- Arabic, "IFN/ENIT-Database of Handwritten Arabic Words", by M. Pechwitz, S. Snoussi Maddouri, V. Märgner, N. Ellouze, and H. Amiri, 2002, http://www.ifnenit.com
- Arabic, "Data-Base for Arabic Handwritten Text Recognition Research", by S. Al-Ma'adeed, D. Elliman, and C. Higgins, 2004, http://www.cs.nott.ac.uk/~cah/Databases.htm
- Farsi, "Isolated Farsi/Arabic Handwritten Character DataBase (IFHCDB)", http://ele.aut.ac.ir/imageproc/downloads/IFHCDB.htm
- Writing:
- Arabic Unicode block: http://en.wikipedia.org/wiki/Arabic_Unicode_block
- Ligature UTF-8 problems: http://homepage2.nifty.com/PAF00305/lib/arabic-lig-alpha.html
- Arabic script: http://www.omniglot.com/writing/arabic.htm
- Nastaliq symbols: http://en.wikipedia.org/wiki/Nastaliq
- History of Arabic Type Evolution from the 1930s till present, Blog by Pascal Zoghbi, May 2007
- Tools:
- Arabic Newspapers:
Philippe Dreuw. Last modified: Mon Dec 27 14:11:14 CET 2010. Created: Tue Sep 22 18:04:32 CET 2007.