Seminar "Selected Topics in Human Language Technology and Pattern Recognition"
In the Summer Term 2018 the Lehrstuhl Informatik 6 will host a
seminar entitled "Selected Topics in Human Language Technology and Pattern
Recognition".
Registration for the seminar
Registration for the seminar is only possible online via the central
registration page from Friday, Jan. 19, to Friday, Feb. 02, 2018. A link
can also be found on the Computer Science Department's homepage.
Prerequisites for participation in the seminar
- Bachelor students: Einführung in das wissenschaftliche Arbeiten (introduction to scientific work; Proseminar)
- Master students: Bachelor degree
- Attendance of the lectures Pattern Recognition and Neural Networks,
Speech Recognition, or Statistical Methods in Natural Language Processing,
or evidence of equivalent knowledge, is highly recommended.
- For successful participants of the above lectures, seminar participation is guaranteed.
Seminar format and important dates
Please note the following deadlines:
- Proposals: initial proposals will be accepted up until the start of the
term's lecture period (April 09, 2018) by email to the seminar topic's
supervisor. When submitting the initial proposal, participants must also
arrange an appointment with the relevant supervisor. Revised proposals
will be accepted up until two weeks after the start of the term.
- Article: PDF must be submitted at least
1 month prior to the trial
presentation date by email to the seminar topic's
supervisor.
- Presentation slides: PDF must be submitted at
least 1 week prior to the trial
presentation date by email to the seminar topic's
supervisor.
- Trial presentations: at least 2 weeks prior to the
actual presentation date; refer to the topics section.
- Seminar presentations: date will be announced during lecture period.
- Final (possibly corrected) articles and presentation slides:
PDF must be submitted at the latest 4
weeks after the presentation date by email to the seminar topic's supervisor.
- Compulsory attendance: in order to pass, participants must attend all presentation sessions.
- Ethical Guidelines: The Computer Science Department of RWTH Aachen
University has adopted ethical guidelines for the authoring of academic
work, such as seminar reports. Each student has to comply with these
guidelines. As a seminar participant, you therefore have to sign a
declaration of compliance, in which you assert that your work complies
with the guidelines, that all references used are properly cited, and
that the report was written autonomously by yourself. Please download
the guidelines and submit the declaration together with your seminar
report and your talk to your supervisor. A German version of the
guidelines and a German version of the declaration are also available
and may be used as well.
Note: failure to meet deadlines, absence without permission from
compulsory sessions (presentations and the preliminary meeting, as
announced by email to each participating student), or dropping out of
the seminar more than 3 weeks after the preliminary meeting/topic
distribution results in the grade 5.0/not appeared.
Topics, relevant references and participants
- Speaker Diarization
- Methods (Engelke; Supervisor: Wilfried Michel)
Presentation Date: Week of 04.06. to 08.06.
Initial References:
- M.H. Moattar and M.M. Homayounpour, "A review on speaker diarization systems and approaches", Speech Communication, Volume 54, Issue 10, 2012, doi.org/10.1016/j.specom.2012.05.002
- Q. Wang, C. Downey, L. Wan, P.A. Mansfield and I.L. Moreno, "Speaker Diarization with LSTM", arXiv:1710.10468 [eess.AS] 2018
- Applications and Challenges (Thull; Supervisor: Wilfried Michel)
Presentation Date: 19.06
Initial References:
- T. L. Nwe, H. Sun, H. Li and S. Rahardja, "Speaker diarization in meeting audio," 2009 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Taipei, 2009 doi: 10.1109/ICASSP.2009.4960523
- K. Church, W. Zhu, J. Vopicka, J. Pelecanos, D. Dimitriadis and P. Fousek, "Speaker diarization: A perspective on challenges and opportunities from theory to practice," 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, 2017 doi: 10.1109/ICASSP.2017.7953098
- Speaker Separation
- Deep Clustering (Von Platen; Supervisor: Tobias Menne)
Presentation Date: 19.06
Initial References:
- John R. Hershey, Zhuo Chen, Jonathan Le Roux, Shinji Watanabe: "Deep Clustering: Discriminative Embeddings for Segmentation and Separation," IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), Shanghai, China, March 20-25, 2016.
- Zhuo Chen, Yi Luo, Nima Mesgarani: "Deep Attractor Network for Single-Microphone Speaker Separation", IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, March 2017.
- Permutation invariant training
Initial References:
- Dong Yu, Morten Kolbaek, Zheng-Hua Tan, Jesper Jensen: "Permutation Invariant Training of Deep Models for Speaker-Independent Multi-Talker Speech Separation", IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, March 2017.
- Yanmin Qian, Xuankai Chang, Dong Yu: "Single-Channel Multi-talker Speech Recognition with Permutation Invariant Training", submitted to IEEE/ACM Transactions on Audio, Speech and Language Processing, arXiv:1707.06527
- Speaker Identification
- Speaker Recognition (Tang; Supervisor: Weiyue Wang)
Presentation Date: 19.06
Initial References:
- A Novel Scheme for Speaker Recognition Using a Phonetically-Aware Deep Neural Network, ICASSP 2014, http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6853887
- Deep Neural Network Approaches to Speaker and Language Recognition, IEEE Signal Processing Letters 2015, http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7080838
- Named Entity Recognition (Petri; Supervisor: Weiyue Wang)
Presentation Date: 21.06
Initial References:
- Neural Architectures for Named Entity Recognition, NAACL 2016, http://www.aclweb.org/anthology/N16-1030
- End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF, ACL 2016, http://www.aclweb.org/anthology/P16-1101
- Voice Activity Detection
- Challenges
Initial References:
- Damianos Karakos, Scott Novotney, Le Zhang, Rich Schwartz, "Model Adaptation and Active Learning in the BBN Speech Activity Detection System for the DARPA RATS program", Interspeech 2016. http://dx.doi.org/10.21437/Interspeech.2016-603
- Tomi Kinnunen, Alexey Sholokhov, Elie Khoury, Dennis Thomsen, Md Sahidullah, Zheng-Hua Tan, "HAPPY Team Entry to NIST OpenSAD Challenge: A Fusion of Short-Term Unsupervised and Segment i-Vector Based Speech Activity Detectors", Interspeech 2016. http://dx.doi.org/10.21437/Interspeech.2016-1281
- Google Home
Initial References:
- Fabio Vesperini, Paolo Vecchiotti, Emanuele Principi, Stefano Squartini, and Francesco Piazza, "Deep Neural Networks for Multi-Room Voice Activity Detection: Advancements and Comparative Evaluation", IJCNN 2016. https://doi.org/10.1109/IJCNN.2016.7727633
- Shuo-Yiin Chang, Bo Li, Tara N. Sainath, Gabor Simko, Carolina Parada, "Endpoint Detection using Grid Long Short-Term Memory Networks for Streaming Speech Recognition", Interspeech 2017. http://dx.doi.org/10.21437/Interspeech.2017-284
- Feature Selection (Bebawi; Supervisor: Christoph Lüscher)
Presentation Date: 21.06
Initial References:
- Ruben Zazo, Tara N. Sainath, Gabor Simko, Carolina Parada, "Feature Learning with Raw-Waveform CLDNNs for Voice Activity Detection", Interspeech 2016. http://dx.doi.org/10.21437/Interspeech.2016-268
- Elie Khoury, Matt Garland, "I-Vectors for Speech Activity Detection", Odyssey 2016. http://www.odyssey2016.org/papers/pdfs_stamped/79.pdf
- Longbiao Wang, Khomdet Phapatanaburi, Zeyan Oo, Seiichi Nakagawa, Masahiro Iwahashi, Jianwu Dang, "Phase Aware Deep Neural Network for Noise Robust Voice Activity Detection", ICME 2017. https://doi.org/10.1109/ICME.2017.8019414
- Audio-Visual Combination
Initial References:
- David Dov, Ronen Talmon and Israel Cohen, "Kernel Method for Speech Source Activity Detection in Multi-modal Signals", ICSEE 2016. https://doi.org/10.1109/ICSEE.2016.7806062
- Ido Ariav, David Dov, Israel Cohen, "A deep architecture for audio-visual voice activity detection in the presence of transients", Signal Processing 142 (2018) p. 69-74. http://dx.doi.org/10.1016/j.sigpro.2017.07.006
- Foteini Patrona, Alexandros Iosifidis, Anastasios Tefas, Nikolaos Nikolaidis and Ioannis Pitas, "Visual Voice Activity Detection in the Wild", IEEE Transactions on Multimedia, Vol. 18, No. 6, June 2016. https://doi.org/10.1109/TMM.2016.2535357
- Word Embeddings and Natural Language Understanding
- Word embeddings and their applications to natural language processing (Gerstenberger; Supervisor: Kazuki Irie)
Presentation Date: 21.06
Initial References:
- J. Pennington, R. Socher, and C. D. Manning. "GloVe: Global Vectors for Word Representation," in Proc. Conf. on Empirical Methods in Natural Language Processing (EMNLP), pages 1532-1543, Doha, Qatar, October 2014. https://www.aclweb.org/anthology/D14-1162
- M. Kusner, Y. Sun, N. Kolkin, and K. Weinberger, "From Word Embeddings To Document Distances," in Proc. Int. Conf. on Machine Learning (ICML), pages 957-966, Lille, France, July 2015. http://proceedings.mlr.press/v37/kusnerb15.pdf
- Neural network based natural language understanding
Initial References:
- [Intent classification] S. Ravuri, and A. Stolcke, "Recurrent Neural Network and LSTM Models for Lexical Utterance Classification," in Proc. Interspeech, pages 135-139, Dresden, Germany, September 2015. https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/RNNLM_addressee.pdf
- [Slot filling] G. Mesnil, Y. Dauphin, K. Yao, Y. Bengio, L. Deng, D. Hakkani-Tur, X. He, L. Heck, G. Tur, D. Yu, and G. Zweig, "Using Recurrent Neural Networks for Slot Filling in Spoken Language Understanding," IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 23, No. 3, March 2015, pages 530-539. https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/44628.pdf
- Sentiment Analysis from Audio
- Emotion detection (Hanbing; Supervisor: Eugen Beck)
Presentation Date: 22.06
Initial References:
- G. Trigeorgis et al., "Adieu features? End-to-end speech emotion recognition using a deep convolutional recurrent network," 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Shanghai, 2016, pp. 5200-5204.
- Ghosh, S., Laksana, E., Morency, L.P. and Scherer, S., 2016, September. Representation Learning for Speech Emotion Recognition. In INTERSPEECH (pp. 3603-3607).
- Multimodal sentiment analysis (Meng; Supervisor: Eugen Beck)
Presentation Date: 22.06
Initial References:
- Poria, S., Cambria, E. and Gelbukh, A., 2015. Deep convolutional neural network textual features and multiple kernel learning for utterance-level multimodal sentiment analysis. In Proceedings of the 2015 conference on empirical methods in natural language processing (pp. 2539-2544).
- Mohammad Soleymani, David Garcia, Brendan Jou, Björn Schuller, Shih-Fu Chang, Maja Pantic, A survey of multimodal sentiment analysis, Image and Vision Computing, Volume 65, 2017, Pages 3-14.
- Language Identification (1)
- Language Identification (Gharbi; Supervisor: Markus Kitza)
Presentation Date: 22.06
Initial References:
- Reviewing automatic language identification: http://ieeexplore.ieee.org/abstract/document/317925/
- A covariance kernel for svm language recognition: http://ieeexplore.ieee.org/abstract/document/4518566/ (2008)
- The MITLL NIST LRE 2009 language recognition system: http://ieeexplore.ieee.org/abstract/document/5495080/ (2009)
- Speech Synthesis
- Auto-regressive models (Gajjar; Supervisor: Albert Zeyer)
Presentation Date: 25.06
Initial References:
- Efficient Neural Audio Synthesis, https://arxiv.org/abs/1802.08435
- PixelCNN++, https://arxiv.org/abs/1701.05517
- Inverse autoregressive flows
Initial References:
- Parallel WaveNet: Fast High-Fidelity Speech Synthesis, https://arxiv.org/abs/1711.10433
- Improving Variational Inference with Inverse Autoregressive Flow, https://arxiv.org/abs/1606.04934
- End-to-end text-to-speech
Initial References:
- VoiceLoop: Voice Fitting and Synthesis via a Phonological Loop, https://arxiv.org/abs/1707.06588
- Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions, https://arxiv.org/abs/1712.05884
- Speech Enhancement
- Speech Enhancement for Human Listeners
Initial References:
- X. Xu, R. Flynn, and M. Russell, "Speech intelligibility and quality: A comparative study of speech enhancement algorithms," in 2017 28th Irish Signals and Systems Conference (ISSC), 2017, pp. 1-6. http://ieeexplore.ieee.org/document/7983599/
- P. C. Loizou and G. Kim, "Reasons why Current Speech-Enhancement Algorithms do not Improve Speech Intelligibility and Suggested Solutions," IEEE Trans. Audio. Speech. Lang. Processing, vol. 19, no. 1, pp. 47-56, Jan. 2011. http://ieeexplore.ieee.org/document/5428850/
- Y. Xu, J. Du, L. Dai, and C. Lee, "A Regression Approach to Speech Enhancement Based on Deep Neural Networks," IEEE Trans. Audio, Speech Lang. Process., vol. 23, no. 1, pp. 7-19, 2015. http://ieeexplore.ieee.org/document/6932438/
- Speech Enhancement for ASR
Initial References:
- F. Weninger, H. Erdogan, S. Watanabe, E. Vincent, J. Roux, J. R. Hershey, and B. Schuller, "Speech Enhancement with LSTM Recurrent Neural Networks and Its Application to Noise-Robust ASR," in Proceedings of the 12th International Conference on Latent Variable Analysis and Signal Separation - Volume 9237, 2015, pp. 91-99. https://hal.inria.fr/hal-01163493/file/weninger_LVA15.pdf
- T. Ochiai, S. Watanabe, and S. Katagiri, "Does speech enhancement work with end-to-end ASR objectives?: Experimental analysis of multichannel end-to-end ASR," in 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP), 2017, no. 26280063, pp. 1-6. http://ieeexplore.ieee.org/document/8168188/
- Constituency Parsing
- Neural Network-based Parsing (Shanmuga Sundaram; Supervisor: Parnia Bahar)
Presentation Date: Week of 18.06. to 22.06.
Initial References:
- Danqi Chen and Christopher D. Manning, "A Fast and Accurate Dependency Parser using Neural Networks", Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, http://aclweb.org/anthology/D/D14/D14-1082.pdf
- Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews and Noah A. Smith, "Transition-Based Dependency Parsing with Stack Long Short-Term Memory", Proceedings of the Association for Computational Linguistics, ACL 2015, http://aclweb.org/anthology/P/P15/P15-1033.pdf
- Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, "Attention Is All You Need", Advances in Neural Information Processing Systems, NIPS 2017, https://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf
- Universal Semantic Parsing
Initial References:
- Waleed Ammar, George Mulcaire, Miguel Ballesteros, Chris Dyer and Noah A. Smith, "Many Languages, One Parser", Transaction of the Association for Computational Linguistics, TACL 2016, https://pdfs.semanticscholar.org/7365/737ad34c9badcfba8fcf15aba669158237d8.pdf
- Raymond Hendy Susanto and Wei Lu, "Neural Architectures for Multilingual Semantic Parsing", Proceedings of the Association for Computational Linguistics, ACL 2017, http://aclweb.org/anthology/P17-2007
- Long Duong, Trevor Cohn, Steven Bird and Paul Cook, "A Neural Network Model for Low-Resource Universal Dependency Parsing", Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, http://aclweb.org/anthology/D/D15/D15-1040.pdf
- Text Summarization
- Extractive Text Summarization (Friedrichs; Supervisor: Jan Rosendahl)
Presentation Date: 25.06
Initial References:
- Text Summarization Techniques: A Brief Survey. Mehdi Allahyari, Seyedamin Pouriyeh et al. 2017. https://arxiv.org/abs/1707.02268
- Automatic Text Summarization (book). Torres-Moreno, Juan-Manuel, 2014. http://onlinelibrary.wiley.com/book/10.1002/9781119004752 (RWTH Aachen Network)
- Abstractive Text Summarization (with Deep Learning) (Ahmed; Supervisor: Julian Schamper)
Presentation Date: 25.06
Initial References:
- Abstractive Text Summarization using Sequence-to-sequence RNNs and Beyond. Ramesh Nallapati, Bowen Zhou et al. 2016. CoNLL. http://www.aclweb.org/anthology/K16-1028
- Get To The Point: Summarization with Pointer-Generator Networks. Abigail See, Peter J. Liu et al. 2017. ACL. http://aclweb.org/anthology/P17-1099
- Sentiment Analysis of Text
- Document/Sentence Level Sentiment Analysis (Tokarchuk; Supervisor: Yunsu Kim)
Presentation Date: 26.06
Initial References:
- Z. Yang, D. Yang, C. Dyer, X. He, A. Smola, E. Hovy, "Hierarchical Attention Networks for Document Classification", NAACL-HLT 2016, http://www.aclweb.org/anthology/N16-1174
- X. Wang, W. Jiang, Z. Luo, "Combination of Convolutional and Recurrent Neural Network for Sentiment Analysis of Short Texts", COLING 2016, http://www.aclweb.org/anthology/C16-1229
- Aspect Level Sentiment Analysis (Petrov; Supervisor: Yunsu Kim)
Presentation Date: 26.06
Initial References:
- Y. Wang, M. Huang, L. Zhao, X. Zhu, "Attention-based LSTM for Aspect-level Sentiment Classification", EMNLP 2016, https://aclweb.org/anthology/D16-1058
- P. Chen, Z. Sun, L. Bing, W. Yang, "Recurrent Attention Network on Memory for Aspect Sentiment Analysis", EMNLP 2017, http://aclweb.org/anthology/D17-1047
- Language Identification (2)
- Language Identification with Deep Learning (Gokrani; Supervisor: Markus Kitza)
Presentation Date: 26.06
Initial References:
- Automatic language identification using deep neural networks: http://ieeexplore.ieee.org/abstract/document/6854622/ (2014)
- Convolutional ANN: Deep learning for spoken language identification: https://pdfs.semanticscholar.org/1b17/f0926b373ef49245a28fdddd3c9e90006e60.pdf (2009)
Guidelines for the article and presentation
The roughly 20-page article and the presentation slides (between 20 and
30 slides) should be prepared in LaTeX. Presentations consist of 30 to 40
minutes of presentation time and 15 minutes of discussion time. Document
templates for both the article and the presentation slides are provided
below, along with links to LaTeX documentation available online. Both the
article and the slides must be submitted electronically in PDF format;
other formats will not be accepted.
- Online LaTeX documentation:
- Article Template (51kB), contains the template and all necessary files in tar format (or here, 10kB in zip format).
- New Presentation Slide Template, a zip file containing the template and all necessary graphics as well as the institute's style template.
Note: We deactivated the RWTH and i6 logos in this version of the template
since the seminar content is produced by students outside of i6.
- Guidelines for articles and presentation slides:
General:
- The aim of the seminar is for participants to learn the following:
- to tackle a topic and expand their knowledge of it
- to critically analyze the literature
- to give a presentation
- Take notice of references
to other topics in the seminar and discuss topics with one
another!
- Take care to stay within your
own topic. To this end participants should be aware of the other
topics in the seminar. If applicable, cross-reference
other articles and presentations.
Specific:
- Important: As part of the introduction, a slide should outline the most
important literature used for the presentation. In addition, the
presentation should clearly indicate which literature the individual
elements of the presentation refer to.
- Participants are expected to seek out additional literature on their
topic. Assistance with the literature search is available at the
faculty's library. Access to literature is naturally also available at
the Lehrstuhl Informatik 6 library.
- Notation/mathematical formulas: consistent, correct notation is
essential. When necessary, differing notation from various literature
sources is to be modified or standardized in order to be clear and
consistent. The lectures held by the Lehrstuhl Informatik 6 should
provide a guide as to what appropriate notation should look like.
- Tables must have titles (appearing above the table).
- Figures must have captions (appearing below the figure). A short LaTeX
sketch illustrating both caption conventions is shown after this list.
- The use of English is recommended and is mandatory for the presentation
slides. The article and the oral presentation may, however, be given in
German.
- If no adequate translation of an English technical term is available,
the term should be used unchanged.
- Completeness:
acknowledge all literature and
sources.
- Referencing must conform to the standard
described in the article template.
- Examples should be used to illustrate points.
- Examples should be as complex as necessary but as simple
as possible.
- Slides should be used
as presentation aids and not to replace the role of the presenter;
specifically, slides should:
- illustrate important points and relationships;
- remind the audience (and the presenter) of important aspects
and considerations;
- give the audience an overview
of the presentation.
- Slides should not contain chunks of text or complicated
sentences; rather they should consist of succinct words and terms.
- Use illustrations where appropriate - a picture is worth a thousand words!
- Abbreviations should be defined at the first usage in the manner
demonstrated in the following example: "[...] at the
Rheinisch-Westfälischen Technischen Hochschule (RWTH) there are
[...]".
- Usage of fonts, typefaces and colors in presentation slides must
be consistent and appropriate. Such means should serve to clarify
points or relationships, not be applied needlessly or at random.
- Care should be taken when selecting fonts for presentation
slides (also within diagrams) to ensure legibility on a projector even
for those seated far from the screen.
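As an illustration of the table and figure conventions above, the
following is a minimal LaTeX sketch. It is independent of the official
templates provided above, and all names and contents are placeholders:
the position of \caption within the float determines whether it is
typeset above the table or below the figure.

\documentclass{article}
\usepackage{graphicx} % provides \includegraphics for real figures

\begin{document}

% Table: \caption comes first, so the title appears above the table.
\begin{table}[t]
  \caption{Placeholder table title.}
  \centering
  \begin{tabular}{lc}
    \hline
    Method     & Result \\
    \hline
    Approach A & \ldots \\
    Approach B & \ldots \\
    \hline
  \end{tabular}
\end{table}

% Figure: \caption comes last, so the caption appears below the figure.
\begin{figure}[t]
  \centering
  % Placeholder box; replace by \includegraphics[width=0.6\linewidth]{your-figure}
  \rule{0.6\linewidth}{0.3\linewidth}
  \caption{Placeholder figure caption.}
\end{figure}

\end{document}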
Contact
Inquiries should be directed to the respective supervisors or to:
Markus Kitza
RWTH Aachen University
Lehrstuhl Informatik 6
Ahornstr. 55
52074 Aachen
Room 6110
E-Mail: kitza@cs.rwth-aachen.de