Proceedings of the IJCNLP 2017, System Demonstrations

Seong-Bae Park, Thepchai Supnithi (Editors)

Anthology ID:
Tapei, Taiwan
Association for Computational Linguistics
Bib Export formats:

pdf bib
Proceedings of the IJCNLP 2017, System Demonstrations
Seong-Bae Park | Thepchai Supnithi

pdf bib
MASSAlign: Alignment and Annotation of Comparable Documents
Gustavo Paetzold | Fernando Alva-Manchego | Lucia Specia

We introduce MASSAlign: a Python library for the alignment and annotation of monolingual comparable documents. MASSAlign offers easy-to-use access to state of the art algorithms for paragraph and sentence-level alignment, as well as novel algorithms for word-level annotation of transformation operations between aligned sentences. In addition, MASSAlign provides a visualization module to display and analyze the alignments and annotations performed.

pdf bib
CADET: Computer Assisted Discovery Extraction and Translation
Benjamin Van Durme | Tom Lippincott | Kevin Duh | Deana Burchfield | Adam Poliak | Cash Costello | Tim Finin | Scott Miller | James Mayfield | Philipp Koehn | Craig Harman | Dawn Lawrie | Chandler May | Max Thomas | Annabelle Carrell | Julianne Chaloux | Tongfei Chen | Alex Comerford | Mark Dredze | Benjamin Glass | Shudong Hao | Patrick Martin | Pushpendre Rastogi | Rashmi Sankepally | Travis Wolfe | Ying-Ying Tran | Ted Zhang

Computer Assisted Discovery Extraction and Translation (CADET) is a workbench for helping knowledge workers find, label, and translate documents of interest. It combines a multitude of analytics together with a flexible environment for customizing the workflow for different users. This open-source framework allows for easy development of new research prototypes using a micro-service architecture based atop Docker and Apache Thrift.

pdf bib
WiseReporter: A Korean Report Generation System
Yunseok Noh | Su Jeong Choi | Seong-Bae Park | Se-Young Park

We demonstrate a report generation system called WiseReporter. The WiseReporter generates a text report of a specific topic which is usually given as a keyword by verbalizing knowledge base facts involving the topic. This demonstration does not demonstate only the report itself, but also the processes how the sentences for the report are generated. We are planning to enhance WiseReporter in the future by adding data analysis based on deep learning architecture and text summarization.

pdf bib
Encyclolink: A Cross-Encyclopedia,Cross-language Article-Linking System and Web-based Search Interface
Yu-Chun Wang | Ka Ming Wong | Chun-Kai Wu | Chao-Lin Pan | Richard Tzong-Han Tsai

Cross-language article linking (CLAL) is the task of finding corresponding article pairs across encyclopedias of different languages. In this paper, we present Encyclolink, a web-based CLAL search interface designed to help users find equivalent encyclopedia articles in Baidu Baike for a given English Wikipedia article title query. Encyclolink is powered by our cross-encyclopedia entity embedding CLAL system (0.8 MRR). The browser-based Interface provides users with a clear and easily readable preview of the contents of retrieved articles for comparison.

pdf bib
A Telecom-Domain Online Customer Service Assistant Based on Question Answering with Word Embedding and Intent Classification
Jui-Yang Wang | Min-Feng Kuo | Jen-Chieh Han | Chao-Chuang Shih | Chun-Hsun Chen | Po-Ching Lee | Richard Tzong-Han Tsai

In the paper, we propose an information retrieval based (IR-based) Question Answering (QA) system to assist online customer service staffs respond users in the telecom domain. When user asks a question, the system retrieves a set of relevant answers and ranks them. Moreover, our system uses a novel reranker to enhance the ranking result of information retrieval. It employs the word2vec model to represent the sentences as vectors. It also uses a sub-category feature, predicted by the k-nearest neighbor algorithm. Finally, the system returns the top five candidate answers, making online staffs find answers much more efficiently.

pdf bib
TOTEMSS: Topic-based, Temporal Sentiment Summarisation for Twitter
Bo Wang | Maria Liakata | Adam Tsakalidis | Spiros Georgakopoulos Kolaitis | Symeon Papadopoulos | Lazaros Apostolidis | Arkaitz Zubiaga | Rob Procter | Yiannis Kompatsiaris

We present a system for time sensitive, topic based summarisation of the sentiment around target entities and topics in collections of tweets. We describe the main elements of the system and illustrate its functionality with two examples of sentiment analysis of topics related to the 2017 UK general election.

pdf bib
MUSST: A Multilingual Syntactic Simplification Tool
Carolina Scarton | Alessio Palmero Aprosio | Sara Tonelli | Tamara Martín Wanton | Lucia Specia

We describe MUSST, a multilingual syntactic simplification tool. The tool supports sentence simplifications for English, Italian and Spanish, and can be easily extended to other languages. Our implementation includes a set of general-purpose simplification rules, as well as a sentence selection module (to select sentences to be simplified) and a confidence model (to select only promising simplifications). The tool was implemented in the context of the European project SIMPATICO on text simplification for Public Administration (PA) texts. Our evaluation on sentences in the PA domain shows that we obtain correct simplifications for 76% of the simplified cases in English, 71% of the cases in Spanish. For Italian, the results are lower (38%) but the tool is still under development.

pdf bib
XMU Neural Machine Translation Online Service
Boli Wang | Zhixing Tan | Jinming Hu | Yidong Chen | Xiaodong Shi

We demonstrate a neural machine translation web service. Our NMT service provides web-based translation interfaces for a variety of language pairs. We describe the architecture of NMT runtime pipeline and the training details of NMT models. We also show several applications of our online translation interfaces.

pdf bib
Semantics-Enhanced Task-Oriented Dialogue Translation: A Case Study on Hotel Booking
Longyue Wang | Jinhua Du | Liangyou Li | Zhaopeng Tu | Andy Way | Qun Liu

We showcase TODAY, a semantics-enhanced task-oriented dialogue translation system, whose novelties are: (i) task-oriented named entity (NE) definition and a hybrid strategy for NE recognition and translation; and (ii) a novel grounded semantic method for dialogue understanding and task-order management. TODAY is a case-study demo which can efficiently and accurately assist customers and agents in different languages to reach an agreement in a dialogue for the hotel booking.

pdf bib
NNVLP: A Neural Network-Based Vietnamese Language Processing Toolkit
Thai-Hoang Pham | Xuan-Khoai Pham | Tuan-Anh Nguyen | Phuong Le-Hong

This paper demonstrates neural network-based toolkit namely NNVLP for essential Vietnamese language processing tasks including part-of-speech (POS) tagging, chunking, Named Entity Recognition (NER). Our toolkit is a combination of bidirectional Long Short-Term Memory (Bi-LSTM), Convolutional Neural Network (CNN), Conditional Random Field (CRF), using pre-trained word embeddings as input, which outperforms previously published toolkits on these three tasks. We provide both of API and web demo for this toolkit.

pdf bib
ClassifierGuesser: A Context-based Classifier Prediction System for Chinese Language Learners
Nicole Peinelt | Maria Liakata | Shu-Kai Hsieh

Classifiers are function words that are used to express quantities in Chinese and are especially difficult for language learners. In contrast to previous studies, we argue that the choice of classifiers is highly contextual and train context-aware machine learning models based on a novel publicly available dataset, outperforming previous baselines. We further present use cases for our database and models in an interactive demo system.

pdf bib
Automatic Difficulty Assessment for Chinese Texts
John Lee | Meichun Liu | Chun Yin Lam | Tak On Lau | Bing Li | Keying Li

We present a web-based interface that automatically assesses reading difficulty of Chinese texts. The system performs word segmentation, part-of-speech tagging and dependency parsing on the input text, and then determines the difficulty levels of the vocabulary items and grammatical constructions in the text. Furthermore, the system highlights the words and phrases that must be simplified or re-written in order to conform to the user-specified target difficulty level. Evaluation results show that the system accurately identifies the vocabulary level of 89.9% of the words, and detects grammar points at 0.79 precision and 0.83 recall.

pdf bib
Verb Replacer: An English Verb Error Correction System
Yu-Hsuan Wu | Jhih-Jie Chen | Jason Chang

According to the analysis of Cambridge Learner Corpus, using a wrong verb is the most common type of grammatical errors. This paper describes Verb Replacer, a system for detecting and correcting potential verb errors in a given sentence. In our approach, alternative verbs are considered to replace the verb based on an error-annotated corpus and verb-object collocations. The method involves applying regression on channel models, parsing the sentence, identifying the verbs, retrieving a small set of alternative verbs, and evaluating each alternative. Our method combines and improves channel and language models, resulting in high recall of detecting and correcting verb misuse.

pdf bib
Learning Synchronous Grammar Patterns for Assisted Writing for Second Language Learners
Chi-En Wu | Jhih-Jie Chen | Jim Chang | Jason Chang

In this paper, we present a method for extracting Synchronous Grammar Patterns (SGPs) from a given parallel corpus in order to assisted second language learners in writing. A grammar pattern consists of a head word (verb, noun, or adjective) and its syntactic environment. A synchronous grammar pattern describes a grammar pattern in the target language (e.g., English) and its counterpart in an other language (e.g., Mandarin), serving the purpose of native language support. Our method involves identifying the grammar patterns in the target language, aligning these patterns with the target language patterns, and finally filtering valid SGPs. The extracted SGPs with examples are then used to develop a prototype writing assistant system, called WriteAhead/bilingual. Evaluation on a set of randomly selected SGPs shows that our system provides satisfactory writing suggestions for English as a Second Language (ESL) learners.

pdf bib
Guess What: A Question Answering Game via On-demand Knowledge Validation
Yu-Sheng Li | Chien-Hui Tseng | Chian-Yun Huang | Wei-Yun Ma

In this paper, we propose an idea of ondemand knowledge validation and fulfill the idea through an interactive Question-Answering (QA) game system, which is named Guess What. An object (e.g. dog) is first randomly chosen by the system, and then a user can repeatedly ask the system questions in natural language to guess what the object is. The system would respond with yes/no along with a confidence score. Some useful hints can also be given if needed. The proposed framework provides a pioneering example of on-demand knowledge validation in dialog environment to address such needs in AI agents/chatbots. Moreover, the released log data that the system gathered can be used to identify the most critical concepts/attributes of an existing knowledge base, which reflects human’s cognition about the world.

pdf bib
STCP: Simplified-Traditional Chinese Conversion and Proofreading
Jiarui Xu | Xuezhe Ma | Chen-Tse Tsai | Eduard Hovy

This paper aims to provide an effective tool for conversion between Simplified Chinese and Traditional Chinese. We present STCP, a customizable system comprising statistical conversion model, and proofreading web interface. Experiments show that our system achieves comparable character-level conversion performance with the state-of-art systems. In addition, our proofreading interface can effectively support diagnostics and data annotation. STCP is available at

pdf bib
Deep Neural Network based system for solving Arithmetic Word problems
Purvanshi Mehta | Pruthwik Mishra | Vinayak Athavale | Manish Shrivastava | Dipti Sharma

This paper presents DILTON a system which solves simple arithmetic word problems. DILTON uses a Deep Neural based model to solve math word problems. DILTON divides the question into two parts - worldstate and query. The worldstate and the query are processed separately in two different networks and finally, the networks are merged to predict the final operation. We report the first deep learning approach for the prediction of operation between two numbers. DILTON learns to predict operations with 88.81% accuracy in a corpus of primary school questions.