<?xml version="1.0" encoding="UTF-8" ?>
<volume id="W17">
  <paper id="5900">
    <title>Proceedings of the 4th Workshop on Natural Language Processing Techniques for Educational Applications (NLPTEA 2017)</title>
    <editor>Yuen-Hsien Tseng</editor>
    <editor>Hsin-Hsi Chen</editor>
    <editor>Lung-Hao Lee</editor>
    <editor>Liang-Chih Yu</editor>
    <month>December</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <url>http://www.aclweb.org/anthology/W17-59</url>
    <bibtype>book</bibtype>
    <bibkey>NLPTEA:2017</bibkey>
  </paper>

  <paper id="5901">
    <title>NTUCLE: Developing a Corpus of Learner English to Provide Writing Support for Engineering Students</title>
    <author><first>Roger Vivek Placidus</first><last>Winder</last></author>
    <author><first>Joseph</first><last>MacKinnon</last></author>
    <author><first>Shu Yun</first><last>Li</last></author>
    <author><first>Benedict Christopher Tzer Liang</first><last>Lin</last></author>
    <author><first>Carmel Lee Hah</first><last>Heah</last></author>
    <author><first>Lu&#237;s</first><last>Morgado da Costa</last></author>
    <author><first>Takayuki</first><last>Kuribayashi</last></author>
    <author><first>Francis</first><last>Bond</last></author>
    <booktitle>Proceedings of the 4th Workshop on Natural Language Processing Techniques for Educational Applications (NLPTEA 2017)</booktitle>
    <month>December</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>1&#8211;11</pages>
    <url>http://www.aclweb.org/anthology/W17-5901</url>
    <abstract>This paper describes the creation of a new annotated learner corpus. The aim is
	to use this corpus to develop an automated system for corrective feedback on
	students’ writing. With this system, students will be able to receive timely
	feedback on language errors before they submit their assignments for grading. A
	corpus of assignments submitted by first year engineering students was
	compiled, and a new error tag set for the NTU Corpus of Learner English
	(NTUCLE) was developed based on that of the NUS Corpus of Learner English
	(NUCLE), as well as marking rubrics used at NTU. After a description of the
	corpus, error tag set and annotation process, the paper presents the results of
	the annotation exercise as well as follow up actions. The final error tag set,
	which is significantly larger than that for the NUCLE error categories, is then
	presented before a brief conclusion summarising our experience and future
	plans.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>winder-EtAl:2017:NLPTEA</bibkey>
  </paper>

  <paper id="5902">
    <title>Understanding Non-Native Writings: Can a Parser Help?</title>
    <author><first>Jirka</first><last>Hana</last></author>
    <author><first>Barbora</first><last>Hladka</last></author>
    <booktitle>Proceedings of the 4th Workshop on Natural Language Processing Techniques for Educational Applications (NLPTEA 2017)</booktitle>
    <month>December</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>12&#8211;16</pages>
    <url>http://www.aclweb.org/anthology/W17-5902</url>
    <abstract>We present a pilot study on parsing non-native texts written by learners of
	Czech. We performed experiments that have shown that at least high-level
	syntactic functions, like subject, predicate, and object, can be assigned based
	on a parser trained on standard native language.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>hana-hladka:2017:NLPTEA</bibkey>
  </paper>

  <paper id="5903">
    <title>Carrier Sentence Selection for Fill-in-the-blank Items</title>
    <author><first>Shu</first><last>Jiang</last></author>
    <author><first>John</first><last>Lee</last></author>
    <booktitle>Proceedings of the 4th Workshop on Natural Language Processing Techniques for Educational Applications (NLPTEA 2017)</booktitle>
    <month>December</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>17&#8211;22</pages>
    <url>http://www.aclweb.org/anthology/W17-5903</url>
    <abstract>Fill-in-the-blank items are a common form of exercise in computer-assisted
	language learning systems.  To automatically generate an effective item, the
	system must be able to select a high-quality carrier sentence that illustrates
	the usage of the target word.  Previous approaches for carrier sentence
	selection have considered sentence length, vocabulary difficulty, the position
	of the target word and the presence of finite verbs.  This paper investigates
	the utility of word co-occurrence statistics and lexical similarity as
	selection criteria.  In an evaluation on generating fill-in-the-blank items for
	learning Chinese as a foreign language, we show that these two criteria can
	improve carrier sentence quality.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>jiang-lee:2017:NLPTEA</bibkey>
  </paper>

  <paper id="5904">
    <title>Hindi Shabdamitra: A Wordnet based E-Learning Tool for Language Learning and Teaching</title>
    <author><first>Hanumant</first><last>Redkar</last></author>
    <author><first>Sandhya</first><last>Singh</last></author>
    <author><first>Meenakshi</first><last>Somasundaram</last></author>
    <author><first>Dhara</first><last>Gorasia</last></author>
    <author><first>Malhar</first><last>Kulkarni</last></author>
    <author><first>Pushpak</first><last>Bhattacharyya</last></author>
    <booktitle>Proceedings of the 4th Workshop on Natural Language Processing Techniques for Educational Applications (NLPTEA 2017)</booktitle>
    <month>December</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>23&#8211;28</pages>
    <url>http://www.aclweb.org/anthology/W17-5904</url>
    <abstract>In today's technology driven digital era, education domain is undergoing a
	transformation from traditional approaches to more learner controlled and
	flexible methods of learning. This transformation has opened the new avenues
	for interdisciplinary research in the field of educational technology and
	natural language processing in developing quality digital aids for learning and
	teaching. The tool presented here - Hindi Shabdamitra, developed using Hindi
	Wordnet for Hindi language learning, is one such e-learning tool. It has been
	developed as a teaching and learning aid suitable for formal school based
	curriculum and informal setup for self learning users. Besides vocabulary, it
	also provides word based grammar along with images and pronunciation for better
	learning and retention. This aid demonstrates how a rich lexical resource
	like wordnet can be systematically remodeled for practical usage in the
	educational domain.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>redkar-EtAl:2017:NLPTEA</bibkey>
  </paper>

  <paper id="5905">
    <title>NLPTEA 2017 Shared Task &#8211; Chinese Spelling Check</title>
    <author><first>Gabriel</first><last>Fung</last></author>
    <author><first>Maxime</first><last>Debosschere</last></author>
    <author><first>Dingmin</first><last>Wang</last></author>
    <author><first>Bo</first><last>Li</last></author>
    <author><first>Jia</first><last>Zhu</last></author>
    <author><first>Kam-Fai</first><last>Wong</last></author>
    <booktitle>Proceedings of the 4th Workshop on Natural Language Processing Techniques for Educational Applications (NLPTEA 2017)</booktitle>
    <month>December</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>29&#8211;34</pages>
    <url>http://www.aclweb.org/anthology/W17-5905</url>
    <abstract>This paper provides an overview along with our findings of the Chinese Spelling
	Check shared task at NLPTEA 2017. The goal of this task is to develop a
	computer-assisted system to automatically diagnose typing errors in traditional
	Chinese sentences written by students. We defined six types of errors which
	belong to two categories. Given a sentence, the system should detect where the
	errors are, and for each detected error determine its type and provide
	correction suggestions. We designed, constructed, and released a benchmark
	dataset for this task.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>fung-EtAl:2017:NLPTEA</bibkey>
  </paper>

  <paper id="5906">
    <title>Chinese Spelling Check based on N-gram and String Matching Algorithm</title>
    <author><first>Jui-Feng</first><last>Yeh</last></author>
    <author><first>Li-Ting</first><last>Chang</last></author>
    <author><first>Chan-Yi</first><last>Liu</last></author>
    <author><first>Tsung-Wei</first><last>Hsu</last></author>
    <booktitle>Proceedings of the 4th Workshop on Natural Language Processing Techniques for Educational Applications (NLPTEA 2017)</booktitle>
    <month>December</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>35&#8211;38</pages>
    <url>http://www.aclweb.org/anthology/W17-5906</url>
    <abstract>This paper presents a Chinese spelling check approach based on language models
	combined with a string matching algorithm to treat the problems resulting from the
	influence caused by Cantonese mother tone. N-grams are first used to detect the
	probability of sentences constructed by the writers, then a string matching algorithm
	called the Knuth-Morris-Pratt (KMP) Algorithm is used to detect and correct
	the error. According to the experimental results, the proposed approach can
	detect the error and provide the corresponding correction.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>yeh-EtAl:2017:NLPTEA</bibkey>
  </paper>

  <paper id="5907">
    <title>N-gram Model for Chinese Grammatical Error Diagnosis</title>
    <author><first>Jianbo</first><last>Zhao</last></author>
    <author><first>Hao</first><last>Liu</last></author>
    <author><first>Zuyi</first><last>Bao</last></author>
    <author><first>Xiaopeng</first><last>Bai</last></author>
    <author><first>Si</first><last>Li</last></author>
    <author><first>Zhiqing</first><last>Lin</last></author>
    <booktitle>Proceedings of the 4th Workshop on Natural Language Processing Techniques for Educational Applications (NLPTEA 2017)</booktitle>
    <month>December</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>39&#8211;44</pages>
    <url>http://www.aclweb.org/anthology/W17-5907</url>
    <abstract>Detection and correction of Chinese grammatical errors have been two of the major
	challenges for Chinese automatic grammatical error diagnosis. This paper
	presents an N-gram model for automatic detection and correction of Chinese
	grammatical errors in NLPTEA 2017 task. The experiment results show that the
	proposed method is good at correction of Chinese grammatical errors.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>zhao-EtAl:2017:NLPTEA</bibkey>
  </paper>

  <paper id="5908">
    <title>The Influence of Spelling Errors on Content Scoring Performance</title>
    <author><first>Andrea</first><last>Horbach</last></author>
    <author><first>Yuning</first><last>Ding</last></author>
    <author><first>Torsten</first><last>Zesch</last></author>
    <booktitle>Proceedings of the 4th Workshop on Natural Language Processing Techniques for Educational Applications (NLPTEA 2017)</booktitle>
    <month>December</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>45&#8211;53</pages>
    <url>http://www.aclweb.org/anthology/W17-5908</url>
    <abstract>Spelling errors occur frequently in educational settings, but their influence
	on automatic scoring is largely unknown.
	We therefore investigate the influence of spelling errors on content scoring
	performance using the example of the ASAP corpus.
	We conduct an annotation study on the nature of spelling errors in the ASAP
	dataset and utilize these finding in machine learning experiments that measure
	the influence of spelling errors on automatic content scoring. Our main finding
	is that scoring methods using both token and character n-gram features are
	robust against spelling errors up to the error frequency in ASAP.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>horbach-ding-zesch:2017:NLPTEA</bibkey>
  </paper>

  <paper id="5909">
    <title>Analyzing the Impact of Spelling Errors on POS-Tagging and Chunking in Learner English</title>
    <author><first>Tomoya</first><last>Mizumoto</last></author>
    <author><first>Ryo</first><last>Nagata</last></author>
    <booktitle>Proceedings of the 4th Workshop on Natural Language Processing Techniques for Educational Applications (NLPTEA 2017)</booktitle>
    <month>December</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>54&#8211;58</pages>
    <url>http://www.aclweb.org/anthology/W17-5909</url>
    <abstract>Part-of-speech (POS) tagging and chunking have been used in tasks targeting
	learner English;
	however, to the best of our knowledge, few studies have evaluated their
	performance and no studies have revealed the causes of POS-tagging/chunking
	errors in detail.
	Therefore, we investigate performance and analyze the causes of failure. We
	focus on spelling errors that occur frequently in learner English.
	We demonstrate that spelling errors reduced POS-tagging performance by 0.23%,
	and that a spell checker is not necessary for
	POS-tagging/chunking of learner English.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>mizumoto-nagata:2017:NLPTEA</bibkey>
  </paper>

  <paper id="5910">
    <title>Complex Word Identification: Challenges in Data Annotation and System Performance</title>
    <author><first>Marcos</first><last>Zampieri</last></author>
    <author><first>Shervin</first><last>Malmasi</last></author>
    <author><first>Gustavo</first><last>Paetzold</last></author>
    <author><first>Lucia</first><last>Specia</last></author>
    <booktitle>Proceedings of the 4th Workshop on Natural Language Processing Techniques for Educational Applications (NLPTEA 2017)</booktitle>
    <month>December</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>59&#8211;63</pages>
    <url>http://www.aclweb.org/anthology/W17-5910</url>
    <abstract>This paper revisits the problem of complex word identification (CWI) following
	up the SemEval CWI shared task. We use ensemble classifiers to investigate how
	well computational methods can discriminate between complex and non-complex
	words. Furthermore, we analyze the classification performance to understand
	what makes lexical complexity challenging. Our findings show that most systems
	performed poorly on the SemEval CWI dataset, and one of the reasons for that is
	the way in which human annotation was performed.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>zampieri-EtAl:2017:NLPTEA</bibkey>
  </paper>

  <paper id="5911">
    <title>Suggesting Sentences for ESL using Kernel Embeddings</title>
    <author><first>Kent</first><last>Shioda</last></author>
    <author><first>Mamoru</first><last>Komachi</last></author>
    <author><first>Rue</first><last>Ikeya</last></author>
    <author><first>Daichi</first><last>Mochihashi</last></author>
    <booktitle>Proceedings of the 4th Workshop on Natural Language Processing Techniques for Educational Applications (NLPTEA 2017)</booktitle>
    <month>December</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>64&#8211;68</pages>
    <url>http://www.aclweb.org/anthology/W17-5911</url>
    <abstract>Sentence retrieval is an important NLP application for English as a Second
	Language (ESL) learners.
	ESL learners are familiar with web search engines, but generic web search
	results may not be adequate for composing documents in a specific domain.
	However, if we build our own search system specialized to a domain, it may be
	subject to the data sparseness problem.
	Recently proposed word2vec partially addresses the data sparseness problem, but
	fails to extract sentences relevant to queries owing to the modeling of the
	latent intent of the query.
	Thus, we propose a method of retrieving example sentences using kernel
	embeddings and N-gram windows.
	This method implicitly models latent intent of query and sentences, and
	alleviates the problem of noisy alignment.
	Our results show that our method achieved higher precision in sentence
	retrieval for ESL in the domain of a university press release corpus, as
	compared to a previous unsupervised method used for a semantic textual
	similarity task.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>shioda-EtAl:2017:NLPTEA</bibkey>
  </paper>

  <paper id="5912">
    <title>Event Timeline Generation from History Textbooks</title>
    <author><first>Harsimran</first><last>Bedi</last></author>
    <author><first>Sangameshwar</first><last>Patil</last></author>
    <author><first>Swapnil</first><last>Hingmire</last></author>
    <author><first>Girish</first><last>Palshikar</last></author>
    <booktitle>Proceedings of the 4th Workshop on Natural Language Processing Techniques for Educational Applications (NLPTEA 2017)</booktitle>
    <month>December</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>69&#8211;77</pages>
    <url>http://www.aclweb.org/anthology/W17-5912</url>
    <abstract>Event timeline serves as the basic structure of history, and it is used as a
	disposition of key phenomena in studying history as a subject in secondary
	school. In order to enable a student to understand a historical phenomenon as a
	series of connected events, we present a system for automatic event timeline
	generation from history textbooks. Additionally, we propose Message Sequence
	Chart (MSC) and time-map based visualization techniques to visualize an event
	timeline. We also identify key computational challenges in developing natural
	language processing based applications for history textbooks.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>bedi-EtAl:2017:NLPTEA</bibkey>
  </paper>

</volume>

