Teaching a Massive Open Online Course on Natural Language Processing
Ekaterina Artemova | Murat Apishev | Denis Kirianov | Veronica Sarkisyan | Sergey Aksenov | Oleg Serikov
Proceedings of the Fifth Workshop on Teaching NLP
In this paper we present a new Massive Open Online Course on Natural Language Processing, targeted at non-English speaking students. The course lasts 12 weeks, every week consists of lectures, practical sessions and quiz assigments. Three weeks out of 12 are followed by Kaggle-style coding assigments. Our course intents to serve multiple purposes: (i) familirize students with the core concepts and methods in NLP, such as language modelling or word or sentence representations, (ii) show that recent advances, including pre-trained Transformer-based models, are build upon these concepts; (iii) to introduce architectures for most most demanded real-life applications, (iii) to develop practical skills to process texts in multiple languages. The course was prepared and recorded during 2020 and so far have received positive feedback.
Applications such as machine translation, speech recognition, and information retrieval require efficient handling of noun compounds as they are one of the possible sources for out of vocabulary words. In-depth processing of noun compounds requires not only splitting them into smaller components (or even roots) but also the identification of instances that should remain unsplitted as they are of idiomatic nature. We develop a two-fold deep learning-based approach of noun compound splitting and idiomatic compound detection for the German language that we train using a newly collected corpus of annotated German compounds. Our neural noun compound splitter operates on a sub-word level and outperforms the current state of the art by about 5%