Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations

Hideo Watanabe (Editor)

Anthology ID:
Osaka, Japan
The COLING 2016 Organizing Committee
Bib Export formats:

pdf bib
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations
Hideo Watanabe

pdf bib
An Interactive System for Exploring Community Question Answering Forums
Enamul Hoque | Shafiq Joty | Lluís Màrquez | Alberto Barrón-Cedeño | Giovanni Da San Martino | Alessandro Moschitti | Preslav Nakov | Salvatore Romeo | Giuseppe Carenini

We present an interactive system to provide effective and efficient search capabilities in Community Question Answering (cQA) forums. The system integrates state-of-the-art technology for answer search with a Web-based user interface specifically tailored to support the cQA forum readers. The answer search module automatically finds relevant answers for a new question by exploring related questions and the comments within their threads. The graphical user interface presents the search results and supports the exploration of related information. The system is running live at

pdf bib
NLmaps: A Natural Language Interface to Query OpenStreetMap
Carolin Lawrence | Stefan Riezler

We present a Natural Language Interface ( to query OpenStreetMap. Natural language questions about geographical facts are parsed into database queries that can be executed against the OpenStreetMap (OSM) database. After parsing the question, the system provides a text based answer as well as an interactive map with all points of interest and their relevant information marked. Additionally, we provide several options for users to give feedback after a question has been parsed.

pdf bib
A Reading Environment for Learners of Chinese as a Foreign Language
John Lee | Chun Yin Lam | Shu Jiang

We present a mobile app that provides a reading environment for learners of Chinese as a foreign language. The app includes a text database that offers over 500K articles from Chinese Wikipedia. These articles have been word-segmented; each word is linked to its entry in a Chinese-English dictionary, and to automatically-generated review exercises. The app estimates the reading proficiency of the user based on a “to-learn” list of vocabulary items. It automatically constructs and maintains this list by tracking the user’s dictionary lookup behavior and performance in review exercises. When a user searches for articles to read, search results are filtered such that the proportion of unknown words does not exceed a user-specified threshold.

pdf bib
A Post-editing Interface for Immediate Adaptation in Statistical Machine Translation
Patrick Simianer | Sariya Karimova | Stefan Riezler

Adaptive machine translation (MT) systems are a promising approach for improving the effectiveness of computer-aided translation (CAT) environments. There is, however, virtually only theoretical work that examines how such a system could be implemented. We present an open source post-editing interface for adaptive statistical MT, which has in-depth monitoring capabilities and excellent expandability, and can facilitate practical studies. To this end, we designed text-based and graphical post-editing interfaces. The graphical interface offers means for displaying and editing a rich view of the MT output. Our translation systems may learn from post-edits using several weight, language model and novel translation model adaptation techniques, in part by exploiting the output of the graphical interface. In a user study we show that using the proposed interface and adaptation methods, reductions in technical effort and time can be achieved.

pdf bib
Word Midas Powered by StringNet: Discovering Lexicogrammatical Constructions in Situ
David Wible | Nai-Lung Tsao

Adult second language learners face the daunting but underappreciated task of mastering patterns of language use that are neither products of fully productive grammar rules nor frozen items to be memorized. Word Midas, a web browser extention, targets this uncharted territory of lexicogrammar by detecting multiword tokens of lexicogrammatical patterning in real time in situ within the noisy digital texts from the user’s unscripted web browsing or other digital venues. The language model powering Word Midas is StringNet, a densely cross-indexed navigable network of one billion lexicogrammatical patterns of English. These resources are described and their functionality is illustrated with a detailed scenario.

pdf bib
BonTen’ – Corpus Concordance System for ‘NINJAL Web Japanese Corpus’
Masayuki Asahara | Kazuya Kawahara | Yuya Takei | Hideto Masuoka | Yasuko Ohba | Yuki Torii | Toru Morii | Yuki Tanaka | Kikuo Maekawa | Sachi Kato | Hikari Konishi

The National Institute for Japanese Language and Linguistics, Japan (NINJAL) has undertaken a corpus compilation project to construct a web corpus for linguistic research comprising ten billion words. The project is divided into four parts: page collection, linguistic analysis, development of the corpus concordance system, and preservation. This article presents the corpus concordance system named ‘BonTen’ which enables the ten-billion-scaled corpus to be queried by string, a sequence of morphological information or a subtree of the syntactic dependency structure.

pdf bib
A Prototype Automatic Simultaneous Interpretation System
Xiaolin Wang | Andrew Finch | Masao Utiyama | Eiichiro Sumita

Simultaneous interpretation allows people to communicate spontaneously across language boundaries, but such services are prohibitively expensive for the general public. This paper presents a fully automatic simultaneous interpretation system to address this problem. Though the development is still at an early stage, the system is capable of keeping up with the fastest of the TED speakers while at the same time delivering high-quality translations. We believe that the system will become an effective tool for facilitating cross-lingual communication in the future.

pdf bib
MuTUAL: A Controlled Authoring Support System Enabling Contextual Machine Translation
Rei Miyata | Anthony Hartley | Kyo Kageura | Cécile Paris | Masao Utiyama | Eiichiro Sumita

The paper introduces a web-based authoring support system, MuTUAL, which aims to help writers create multilingual texts. The highlighted feature of the system is that it enables machine translation (MT) to generate outputs appropriate to their functional context within the target document. Our system is operational online, implementing core mechanisms for document structuring and controlled writing. These include a topic template and a controlled language authoring assistant, linked to our statistical MT system.

pdf bib
Joint search in a bilingual valency lexicon and an annotated corpus
Eva Fučíková | Jan Hajič | Zdeňka Urešová

In this paper and the associated system demo, we present an advanced search system that allows to perform a joint search over a (bilingual) valency lexicon and a correspondingly annotated linked parallel corpus. This search tool has been developed on the basis of the Prague Czech-English Dependency Treebank, but its ideas are applicable in principle to any bilingual parallel corpus that is annotated for dependencies and valency (i.e., predicate-argument structure), and where verbs are linked to appropriate entries in an associated valency lexicon. Our online search tool consolidates more search interfaces into one, providing expanded structured search capability and a more efficient advanced way to search, allowing users to search for verb pairs, verbal argument pairs, their surface realization as recorded in the lexicon, or for their surface form actually appearing in the linked parallel corpus. The search system is currently under development, and is replacing our current search tool available at, which could search the lexicon but the queries cannot take advantage of the underlying corpus nor use the additional surface form information from the lexicon(s). The system is available as open source.

pdf bib
Experiments in Candidate Phrase Selection for Financial Named Entity Extraction - A Demo
Aman Kumar | Hassan Alam | Tina Werner | Manan Vyas

In this study we develop a system that tags and extracts financial concepts called financial named entities (FNE) along with corresponding numeric values – monetary and temporal. We employ machine learning and natural language processing methods to identify financial concepts and dates, and link them to numerical entities.

pdf bib
Demonstration of ChaKi.NET – beyond the corpus search system
Masayuki Asahara | Yuji Matsumoto | Toshio Morita

ChaKi.NET is a corpus management system for dependency structure annotated corpora. After more than 10 years of continuous development, the system is now usable not only for corpus search, but also for visualization, annotation, labelling, and formatting for statistical analysis. This paper describes the various functions included in the current ChaKi.NET system.

pdf bib
VoxSim: A Visual Platform for Modeling Motion Language
Nikhil Krishnaswamy | James Pustejovsky

Much existing work in text-to-scene generation focuses on generating static scenes. By introducing a focus on motion verbs, we integrate dynamic semantics into a rich formal model of events to generate animations in real time that correlate with human conceptions of the event described. This paper presents a working system that generates these animated scenes over a test set, discussing challenges encountered and describing the solutions implemented.

pdf bib
TextImager: a Distributed UIMA-based System for NLP
Wahed Hemati | Tolga Uslu | Alexander Mehler

More and more disciplines require NLP tools for performing automatic text analyses on various levels of linguistic resolution. However, the usage of established NLP frameworks is often hampered for several reasons: in most cases, they require basic to sophisticated programming skills, interfere with interoperability due to using non-standard I/O-formats and often lack tools for visualizing computational results. This makes it difficult especially for humanities scholars to use such frameworks. In order to cope with these challenges, we present TextImager, a UIMA-based framework that offers a range of NLP and visualization tools by means of a user-friendly GUI. Using TextImager requires no programming skills.

pdf bib
DISCO: A System Leveraging Semantic Search in Document Review
Ngoc Phuoc An Vo | Fabien Guillot | Caroline Privault

This paper presents Disco, a prototype for supporting knowledge workers in exploring, reviewing and sorting collections of textual data. The goal is to facilitate, accelerate and improve the discovery of information. To this end, it combines Semantic Relatedness techniques with a review workflow developed in a tangible environment. Disco uses a semantic model that is leveraged on-line in the course of search sessions, and accessed through natural hand-gesture, in a simple and intuitive way.

pdf bib
pke: an open source python-based keyphrase extraction toolkit
Florian Boudin

We describe pke, an open source python-based keyphrase extraction toolkit. It provides an end-to-end keyphrase extraction pipeline in which each component can be easily modified or extented to develop new approaches. pke also allows for easy benchmarking of state-of-the-art keyphrase extraction approaches, and ships with supervised models trained on the SemEval-2010 dataset.

pdf bib
Langforia: Language Pipelines for Annotating Large Collections of Documents
Marcus Klang | Pierre Nugues

In this paper, we describe Langforia, a multilingual processing pipeline to annotate texts with multiple layers: formatting, parts of speech, named entities, dependencies, semantic roles, and entity links. Langforia works as a web service, where the server hosts the language processing components and the client, the input and result visualization. To annotate a text or a Wikipedia page, the user chooses an NLP pipeline and enters the text in the interface or selects the page URL. Once processed, the results are returned to the client, where the user can select the annotation layers s/he wants to visualize. We designed Langforia with a specific focus for Wikipedia, although it can process any type of text. Wikipedia has become an essential encyclopedic corpus used in many NLP projects. However, processing articles and visualizing the annotations are nontrivial tasks that require dealing with multiple markup variants, encodings issues, and tool incompatibilities across the language versions. This motivated the development of a new architecture. A demonstration of Langforia is available for six languages: English, French, German, Spanish, Russian, and Swedish at as well as a web API: Langforia is also provided as a standalone library and is compatible with cluster computing.

pdf bib
Anita: An Intelligent Text Adaptation Tool
Gustavo Paetzold | Lucia Specia

We introduce Anita: a flexible and intelligent Text Adaptation tool for web content that provides Text Simplification and Text Enhancement modules. Anita’s simplification module features a state-of-the-art system that adapts texts according to the needs of individual users, and its enhancement module allows the user to search for a word’s definitions, synonyms, translations, and visual cues through related images. These utilities are brought together in an easy-to-use interface of a freely available web browser extension.

pdf bib
HistoryComparator: Interactive Across-Time Comparison in Document Archives
Adam Jatowt | Marc Bron

Recent years have witnessed significant increase in the number of large scale digital collections of archival documents such as news articles, books, etc. Typically, users access these collections through searching or browsing. In this paper we investigate another way of accessing temporal collections - across-time comparison, i.e., comparing query-relevant information at different periods in the past. We propose an interactive framework called HistoryComparator for contrastively analyzing concepts in archival document collections at different time periods.

pdf bib
On-line Multilingual Linguistic Services
Eric Wehrli | Yves Scherrer | Luka Nerima

In this demo, we present our free on-line multilingual linguistic services which allow to analyze sentences or to extract collocations from a corpus directly on-line, or by uploading a corpus. They are available for 8 European languages (English, French, German, Greek, Italian, Portuguese, Romanian, Spanish) and can also be accessed as web services by programs. While several open systems are available for POS-tagging and dependency parsing or terminology extraction, their integration into an application requires some computational competence. Furthermore, none of the parsers/taggers handles MWEs very satisfactorily, in particular when the two terms of the collocation are distant from each other or in reverse order. Our tools, on the other hand, are specifically designed for users with no particular computational literacy. They do not require from the user any download, installation or adaptation if used on-line, and their integration in an application, using one the scripts described below is quite easy. Furthermore, by default, the parser handles collocations and other MWEs, as well as anaphora resolution (limited to 3rd person personal pronouns). When used in the tagger mode, it can be set to display grammatical functions and collocations.

pdf bib
A Customizable Editor for Text Simplification
John Lee | Wenlong Zhao | Wenxiu Xie

We present a browser-based editor for simplifying English text. Given an input sentence, the editor performs both syntactic and lexical simplification. It splits a complex sentence into shorter ones, and suggests word substitutions in drop-down lists. The user can choose the best substitution from the list, undo any inappropriate splitting, and further edit the sentence as necessary. A significant novelty is that the system accepts a customized vocabulary list for a target reader population. It identifies all words in the text that do not belong to the list, and attempts to substitute them with words from the list, thus producing a text tailored for the targeted readers.

pdf bib
CATaLog Online: A Web-based CAT Tool for Distributed Translation with Data Capture for APE and Translation Process Research
Santanu Pal | Sudip Kumar Naskar | Marcos Zampieri | Tapas Nayak | Josef van Genabith

We present a free web-based CAT tool called CATaLog Online which provides a novel and user-friendly online CAT environment for post-editors/translators. The goal is to support distributed translation, reduce post-editing time and effort, improve the post-editing experience and capture data for incremental MT/APE (automatic post-editing) and translation process research. The tool supports individual as well as batch mode file translation and provides translations from three engines – translation memory (TM), MT and APE. TM suggestions are color coded to accelerate the post-editing task. The users can integrate their personal TM/MT outputs. The tool remotely monitors and records post-editing activities generating an extensive range of post-editing logs.

pdf bib
Interactive Relation Extraction in Main Memory Database Systems
Rudolf Schneider | Cordula Guder | Torsten Kilias | Alexander Löser | Jens Graupmann | Oleksandr Kozachuk

We present INDREX-MM, a main memory database system for interactively executing two interwoven tasks, declarative relation extraction from text and their exploitation with SQL. INDREX-MM simplifies these tasks for the user with powerful SQL extensions for gathering statistical semantics, for executing open information extraction and for integrating relation candidates with domain specific data. We demonstrate these functions on 800k documents from Reuters RCV1 with more than a billion linguistic annotations and report execution times in the order of seconds.

pdf bib
An Open Source Library for Semantic-Based Datetime Resolution
Aurélie Merlo | Denis Pasin

In this paper, we introduce an original Python implementation of datetime resolution in french, which we make available as open-source library. Our approach is based on Frame Semantics and Corpus Pattern Analysis in order to provide a precise semantic interpretation of datetime expressions. This interpretation facilitates the contextual resolution of datetime expressions in timestamp format.

pdf bib
TASTY: Interactive Entity Linking As-You-Type
Sebastian Arnold | Robert Dziuba | Alexander Löser

We introduce TASTY (Tag-as-you-type), a novel text editor for interactive entity linking as part of the writing process. Tasty supports the author of a text with complementary information about the mentioned entities shown in a ‘live’ exploration view. The system is automatically triggered by keystrokes, recognizes mention boundaries and disambiguates the mentioned entities to Wikipedia articles. The author can use seven operators to interact with the editor and refine the results according to his specific intention while writing. Our implementation captures syntactic and semantic context using a robust end-to-end LSTM sequence learner and word embeddings. We demonstrate the applicability of our system in English and German language for encyclopedic or medical text. Tasty is currently being tested in interactive applications for text production, such as scientific research, news editorial, medical anamnesis, help desks and product reviews.

pdf bib
What topic do you want to hear about? A bilingual talking robot using English and Japanese Wikipedias
Graham Wilcock | Kristiina Jokinen | Seiichi Yamamoto

We demonstrate a bilingual robot application, WikiTalk, that can talk fluently in both English and Japanese about almost any topic using information from English and Japanese Wikipedias. The English version of the system has been demonstrated previously, but we now present a live demo with a Nao robot that speaks English and Japanese and switches language on request. The robot supports the verbal interaction with face-tracking, nodding and communicative gesturing. One of the key features of the WikiTalk system is that the robot can switch from the current topic to related topics during the interaction in order to navigate around Wikipedia following the user’s individual interests.

pdf bib
Annotating Discourse Relations with the PDTB Annotator
Alan Lee | Rashmi Prasad | Bonnie Webber | Aravind K. Joshi

The PDTB Annotator is a tool for annotating and adjudicating discourse relations based on the annotation framework of the Penn Discourse TreeBank (PDTB). This demo describes the benefits of using the PDTB Annotator, gives an overview of the PDTB Framework and discusses the tool’s features, setup requirements and how it can also be used for adjudication.

pdf bib
Opinion Retrieval Systems using Tweet-external Factors
Yoon-Sung Kim | Young-In Song | Hae-Chang Rim

Opinion mining is a natural language processing technique which extracts subjective information from natural language text. To estimate an opinion about a query in large data collection, an opinion retrieval system that retrieves subjective and relevant information about the query can be useful. We present an opinion retrieval system that retrieves subjective and query-relevant tweets from Twitter, which is a useful source of obtaining real-time opinions. Our system outperforms previous opinion retrieval systems, and it further provides subjective information about Twitter authors and hashtags to describe their subjective tendencies.

pdf bib
TextPro-AL: An Active Learning Platform for Flexible and Efficient Production of Training Data for NLP Tasks
Bernardo Magnini | Anne-Lyse Minard | Mohammed R. H. Qwaider | Manuela Speranza

This paper presents TextPro-AL (Active Learning for Text Processing), a platform where human annotators can efficiently work to produce high quality training data for new domains and new languages exploiting Active Learning methodologies. TextPro-AL is a web-based application integrating four components: a machine learning based NLP pipeline, an annotation editor for task definition and text annotations, an incremental re-training procedure based on active learning selection from a large pool of unannotated data, and a graphical visualization of the learning status of the system.

pdf bib
SideNoter: Scholarly Paper Browsing System based on PDF Restructuring and Text Annotation
Takeshi Abekawa | Akiko Aizawa

In this paper, we discuss our ongoing efforts to construct a scientific paper browsing system that helps users to read and understand advanced technical content distributed in PDF. Since PDF is a format specifically designed for printing, layout and logical structures of documents are indistinguishably embedded in the file. It requires much effort to extract natural language text from PDF files, and reversely, display semantic annotations produced by NLP tools on the original page layout. In our browsing system, we tackle these issues caused by the gap between printable document and plain text. Our system provides ways to extract natural language sentences from PDF files together with their logical structures, and also to map arbitrary textual spans to their corresponding regions on page images. We setup a demonstration system using papers published in ACL anthology and demonstrate the enhanced search and refined recommendation functions which we plan to make widely available to NLP researchers.

pdf bib
Sensing Emotions in Text Messages: An Application and Deployment Study of EmotionPush
Shih-Ming Wang | Chun-Hui Scott Lee | Yu-Chun Lo | Ting-Hao Huang | Lun-Wei Ku

Instant messaging and push notifications play important roles in modern digital life. To enable robust sense-making and rich context awareness in computer mediated communications, we introduce EmotionPush, a system that automatically conveys the emotion of received text with a colored push notification on mobile devices. EmotionPush is powered by state-of-the-art emotion classifiers and is deployed for Facebook Messenger clients on Android. The study showed that the system is able to help users prioritize interactions.

pdf bib
Illinois Cross-Lingual Wikifier: Grounding Entities in Many Languages to the English Wikipedia
Chen-Tse Tsai | Dan Roth

We release a cross-lingual wikification system for all languages in Wikipedia. Given a piece of text in any supported language, the system identifies names of people, locations, organizations, and grounds these names to the corresponding English Wikipedia entries. The system is based on two components: a cross-lingual named entity recognition (NER) model and a cross-lingual mention grounding model. The cross-lingual NER model is a language-independent model which can extract named entity mentions in the text of any language in Wikipedia. The extracted mentions are then grounded to the English Wikipedia using the cross-lingual mention grounding model. The only resources required to train the proposed system are the multilingual Wikipedia dump and existing training data for English NER. The system is online at

pdf bib
A Meaning-based English Math Word Problem Solver with Understanding, Reasoning and Explanation
Chao-Chun Liang | Shih-Hong Tsai | Ting-Yun Chang | Yi-Chung Lin | Keh-Yih Su

This paper presents a meaning-based statistical math word problem (MWP) solver with understanding, reasoning and explanation. It comprises a web user interface and pipelined modules for analysing the text, transforming both body and question parts into their logic forms, and then performing inference on them. The associated context of each quantity is represented with proposed role-tags (e.g., nsubj, verb, etc.), which provides the flexibility for annotating the extracted math quantity with its associated syntactic and semantic information (which specifies the physical meaning of that quantity). Those role-tags are then used to identify the desired operands and filter out irrelevant quantities (so that the answer can be obtained precisely). Since the physical meaning of each quantity is explicitly represented with those role-tags and used in the inference process, the proposed approach could explain how the answer is obtained in a human comprehensible way.

pdf bib
Valencer: an API to Query Valence Patterns in FrameNet
Alexandre Kabbach | Corentin Ribeyre

This paper introduces Valencer: a RESTful API to search for annotated sentences matching a given combination of syntactic realizations of the arguments of a predicate – also called ‘valence pattern’ – in the FrameNet database. The API takes as input an HTTP GET request specifying a valence pattern and outputs a list of exemplifying annotated sentences in JSON format. The API is designed to be modular and language-independent, and can therefore be easily integrated to other (NLP) server-side or client-side applications, as well as non-English FrameNet projects. Valencer is free, open-source, and licensed under the MIT license.

pdf bib
The Open Framework for Developing Knowledge Base And Question Answering System
Jiseong Kim | GyuHyeon Choi | Jung-Uk Kim | Eun-Kyung Kim | Key-Sun Choi

Developing a question answering (QA) system is a task of implementing and integrating modules of different technologies and evaluating an integrated whole system, which inevitably goes with a collaboration among experts of different domains. For supporting a easy collaboration, this demonstration presents the open framework that aims to support developing a QA system in collaborative and intuitive ways. The demonstration also shows the QA system developed by our novel framework.

pdf bib
Linggle Knows: A Search Engine Tells How People Write
Jhih-Jie Chen | Hao-Chun Peng | Mei-Cih Yeh | Peng-Yu Chen | Jason Chang

This paper shows the great potential of incorporating different approaches to help writing. Not only did they solve different kinds of writing problems, but also they complement and reinforce each other to be a complete and effective solution. Despite the extensive and multifaceted feedback and suggestion, writing is not all about syntactically or lexically well-written. It involves contents, structure, the certain understanding of the background, and many other factors to compose a rich, organized and sophisticated text. (e.g., conventional structure and idioms in academic writing). There is still a long way to go to accomplish the ultimate goal. We envision the future of writing to be a joyful experience with the help of instantaneous suggestion and constructive feedback.

pdf bib
A Sentence Simplification System for Improving Relation Extraction
Christina Niklaus | Bernhard Bermeitinger | Siegfried Handschuh | André Freitas

We present a text simplification approach that is directed at improving the performance of state-of-the-art Open Relation Extraction (RE) systems. As syntactically complex sentences often pose a challenge for current Open RE approaches, we have developed a simplification framework that performs a pre-processing step by taking a single sentence as input and using a set of syntactic-based transformation rules to create a textual input that is easier to process for subsequently applied Open RE systems.

pdf bib
Korean FrameNet Expansion Based on Projection of Japanese FrameNet
Jeong-uk Kim | Younggyun Hahm | Key-Sun Choi

FrameNet project has begun from Berkeley in 1997, and is now supported in several countries reflecting characteristics of each language. The work for generating Korean FrameNet was already done by converting annotated English sentences into Korean with trained translators. However, high cost of frame-preservation and error revision was a huge burden on further expansion of FrameNet. This study makes use of linguistic similarity between Japanese and Korean to increase Korean FrameNet corpus with low cost. We also suggest adapting PubAnnotation and Korean-friendly valence patterns to FrameNet for increased accessibility.

pdf bib
A Framework for Mining Enterprise Risk and Risk Factors from News Documents
Tirthankar Dasgupta | Lipika Dey | Prasenjit Dey | Rupsa Saha

Any real world events or trends that can affect the company’s growth trajectory can be considered as risk. There has been a growing need to automatically identify, extract and analyze risk related statements from news events. In this demonstration, we will present a risk analytics framework that processes enterprise project management reports in the form of textual data and news documents and classify them into valid and invalid risk categories. The framework also extracts information from the text pertaining to the different categories of risks like their possible cause and impacts. Accordingly, we have used machine learning based techniques and studied different linguistic features like n-gram, POS, dependency, future timing, uncertainty factors in texts and their various combinations. A manual annotation study from management experts using risk descriptions collected for a specific organization was conducted to evaluate the framework. The evaluation showed promising results for automated risk analysis and identification.

pdf bib
papago: A Machine Translation Service with Word Sense Disambiguation and Currency Conversion
Hyoung-Gyu Lee | Jun-Seok Kim | Joong-Hwi Shin | Jaesong Lee | Ying-Xiu Quan | Young-Seob Jeong

In this paper, we introduce papago - a translator for mobile device which is equipped with new features that can provide convenience for users. The first feature is word sense disambiguation based on user feedback. By using the feature, users can select one among multiple meanings of a homograph and obtain the corrected translation with the user-selected sense. The second feature is the instant currency conversion of money expressions contained in a translation result with current exchange rate. Users can be quickly and precisely provided the amount of money converted as local currency when they travel abroad.

pdf bib
TopoText: Interactive Digital Mapping of Literary Text
Randa El Khatib | Julia El Zini | David Wrisley | Mohamad Jaber | Shady Elbassuoni

We demonstrate TopoText, an interactive tool for digital mapping of literary text. TopoText takes as input a literary piece of text such as a novel or a biography article and automatically extracts all place names in the text. The identified places are then geoparsed and displayed on an interactive map. TopoText calculates the number of times a place was mentioned in the text, which is then reflected on the map allowing the end-user to grasp the importance of the different places within the text. It also displays the most frequent words mentioned within a specified proximity of a place name in context or across the entire text. This can also be faceted according to part of speech tags. Finally, TopoText keeps the human in the loop by allowing the end-user to disambiguate places and to provide specific place annotations. All extracted information such as geolocations, place frequencies, as well as all user-provided annotations can be automatically exported as a CSV file that can be imported later by the same user or other users.

pdf bib
ACE: Automatic Colloquialism, Typographical and Orthographic Errors Detection for Chinese Language
Shichao Dong | Gabriel Pui Cheong Fung | Binyang Li | Baolin Peng | Ming Liao | Jia Zhu | Kam-fai Wong

We present a system called ACE for Automatic Colloquialism and Errors detection for written Chinese. ACE is based on the combination of N-gram model and rule-base model. Although it focuses on detecting colloquial Cantonese (a dialect of Chinese) at the current stage, it can be extended to detect other dialects. We chose Cantonese becauase it has many interesting properties, such as unique grammar system and huge colloquial terms, that turn the detection task extremely challenging. We conducted experiments using real data and synthetic data. The results indicated that ACE is highly reliable and effective.

pdf bib
A Tool for Efficient Content Compilation
Boris Galitsky

We build a tool to assist in content creation by mining the web for information relevant to a given topic. This tool imitates the process of essay writing by humans: searching for topics on the web, selecting content frag-ments from the found document, and then compiling these fragments to obtain a coherent text. The process of writing starts with automated building of a table of content by obtaining the list of key entities for the given topic extracted from web resources such as Wikipedia. Once a table of content is formed, each item forms a seed for web mining. The tool builds a full-featured structured Word document with table of content, section structure, images and captions and web references for all mined text fragments. Two linguistic technologies are employed: for relevance verification, we use similarity computed as a tree similarity between parse trees for a seed and candidate text fragment. For text coherence, we use a measure of agreement between a given and consecutive paragraph by tree kernel learning of their discourse trees. The tool is available at

pdf bib
MAGES: A Multilingual Angle-integrated Grouping-based Entity Summarization System
Eun-kyung Kim | Key-Sun Choi

This demo presents MAGES (multilingual angle-integrated grouping-based entity summarization), an entity summarization system for a large knowledge base such as DBpedia based on a entity-group-bound ranking in a single integrated entity space across multiple language-specific editions. MAGES offers a multilingual angle-integrated space model, which has the advantage of overcoming missing semantic tags (i.e., categories) caused by biases in different language communities, and can contribute to the creation of entity groups that are well-formed and more stable than the monolingual condition within it. MAGES can help people quickly identify the essential points of the entities when they search or browse a large volume of entity-centric data. Evaluation results on the same experimental data demonstrate that our system produces a better summary compared with other representative DBpedia entity summarization methods.

pdf bib
Botta: An Arabic Dialect Chatbot
Dana Abu Ali | Nizar Habash

This paper presents BOTTA, the first Arabic dialect chatbot. We explore the challenges of creating a conversational agent that aims to simulate friendly conversations using the Egyptian Arabic dialect. We present a number of solutions and describe the different components of the BOTTA chatbot. The BOTTA database files are publicly available for researchers working on Arabic chatbot technologies. The BOTTA chatbot is also publicly available for any users who want to chat with it online.

pdf bib
What’s up on Twitter? Catch up with TWIST!
Marina Litvak | Natalia Vanetik | Efi Levi | Michael Roistacher

Event detection and analysis with respect to public opinions and sentiments in social media is a broad and well-addressed research topic. However, the characteristics and sheer volume of noisy Twitter messages make this a difficult task. This demonstration paper describes a TWItter event Summarizer and Trend detector (TWIST) system for event detection, visualization, textual description, and geo-sentiment analysis of real-life events reported in Twitter.

pdf bib
Praat on the Web: An Upgrade of Praat for Semi-Automatic Speech Annotation
Mónica Domínguez | Iván Latorre | Mireia Farrús | Joan Codina-Filbà | Leo Wanner

This paper presents an implementation of the widely used speech analysis tool Praat as a web application with an extended functionality for feature annotation. In particular, Praat on the Web addresses some of the central limitations of the original Praat tool and provides (i) enhanced visualization of annotations in a dedicated window for feature annotation at interval and point segments, (ii) a dynamic scripting composition exemplified with a modular prosody tagger, and (iii) portability and an operational web interface. Speech annotation tools with such a functionality are key for exploring large corpora and designing modular pipelines.

pdf bib
YAMAMA: Yet Another Multi-Dialect Arabic Morphological Analyzer
Salam Khalifa | Nasser Zalmout | Nizar Habash

In this paper, we present YAMAMA, a multi-dialect Arabic morphological analyzer and disambiguator. Our system is almost five times faster than the state-of-art MADAMIRA system with a slightly lower quality. In addition to speed, YAMAMA outputs a rich representation which allows for a wider spectrum of use. In this regard, YAMAMA transcends other systems, such as FARASA, which is faster but provides specific outputs catering to specific applications.

pdf bib
CamelParser: A system for Arabic Syntactic Analysis and Morphological Disambiguation
Anas Shahrour | Salam Khalifa | Dima Taji | Nizar Habash

In this paper, we present CamelParser, a state-of-the-art system for Arabic syntactic dependency analysis aligned with contextually disambiguated morphological features. CamelParser uses a state-of-the-art morphological disambiguator and improves its results using syntactically driven features. The system offers a number of output formats that include basic dependency with morphological features, two tree visualization modes, and traditional Arabic grammatical analysis.

pdf bib
Demonstrating Ambient Search: Implicit Document Retrieval for Speech Streams
Benjamin Milde | Jonas Wacker | Stefan Radomski | Max Mühlhäuser | Chris Biemann

In this demonstration paper we describe Ambient Search, a system that displays and retrieves documents in real time based on speech input. The system operates continuously in ambient mode, i.e. it generates speech transcriptions and identifies main keywords and keyphrases, while also querying its index to display relevant documents without explicit query. Without user intervention, the results are dynamically updated; users can choose to interact with the system at any time, employing a conversation protocol that is enriched with the ambient information gathered continuously. Our evaluation shows that Ambient Search outperforms another implicit speech-based information retrieval system. Ambient search is available as open source software.

pdf bib
ConFarm: Extracting Surface Representations of Verb and Noun Constructions from Dependency Annotated Corpora of Russian
Nikita Mediankin

ConFarm is a web service dedicated to extraction of surface representations of verb and noun constructions from dependency annotated corpora of Russian texts. Currently, the extraction of constructions with a specific lemma from SynTagRus and Russian National Corpus is available. The system provides flexible interface that allows users to fine-tune the output. Extracted constructions are grouped by their contents to allow for compact representation, and the groups are visualised as a graph in order to help navigating the extraction results. ConFarm differs from similar existing tools for Russian language in that it offers full constructions, as opposed to extracting separate dependents of search word or working with collocations, and allows users to discover unexpected constructions as opposed to searching for examples of a user-defined construction.

pdf bib
ENIAM: Categorial Syntactic-Semantic Parser for Polish
Wojciech Jaworski | Jakub Kozakoszczak

This paper presents ENIAM, the first syntactic and semantic parser that generates semantic representations for sentences in Polish. The parser processes non-annotated data and performs tokenization, lemmatization, dependency recognition, word sense annotation, thematic role annotation, partial disambiguation and computes the semantic representation.

pdf bib
Towards Non-projective High-Order Dependency Parser
Wenjing Fang | Kenny Zhu | Yizhong Wang | Jia Tan

This paper presents a novel high-order dependency parsing framework that targets non-projective treebanks. It imitates how a human parses sentences in an intuitive way. At every step of the parse, it determines which word is the easiest to process among all the remaining words, identifies its head word and then folds it under the head word. Further, this work is flexible enough to be augmented with other parsing techniques.

pdf bib
Using Synthetically Collected Scripts for Story Generation
Takashi Ogata | Tatsuya Arai | Jumpei Ono

A script is a type of knowledge representation in artificial intelligence (AI). This paper presents two methods for synthetically using collected scripts for story generation. The first method recursively generates long sequences of events and the second creates script networks. Although related studies generally use one or more scripts for story generation, this research synthetically uses many scripts to flexibly generate a diverse narrative.

pdf bib
CaseSummarizer: A System for Automated Summarization of Legal Texts
Seth Polsley | Pooja Jhunjhunwala | Ruihong Huang

Attorneys, judges, and others in the justice system are constantly surrounded by large amounts of legal text, which can be difficult to manage across many cases. We present CaseSummarizer, a tool for automated text summarization of legal documents which uses standard summary methods based on word frequency augmented with additional domain-specific knowledge. Summaries are then provided through an informative interface with abbreviations, significance heat maps, and other flexible controls. It is evaluated using ROUGE and human scoring against several other summarization systems, including summary text and feedback provided by domain experts.

pdf bib
WISDOM X, DISAANA and D-SUMM: Large-scale NLP Systems for Analyzing Textual Big Data
Junta Mizuno | Masahiro Tanaka | Kiyonori Ohtake | Jong-Hoon Oh | Julien Kloetzer | Chikara Hashimoto | Kentaro Torisawa

We demonstrate our large-scale NLP systems: WISDOM X, DISAANA, and D-SUMM. WISDOM X provides numerous possible answers including unpredictable ones to widely diverse natural language questions to provide deep insights about a broad range of issues. DISAANA and D-SUMM enable us to assess the damage caused by large-scale disasters in real time using Twitter as an information source.

pdf bib
Multilingual Information Extraction with PolyglotIE
Alan Akbik | Laura Chiticariu | Marina Danilevsky | Yonas Kbrom | Yunyao Li | Huaiyu Zhu

We present PolyglotIE, a web-based tool for developing extractors that perform Information Extraction (IE) over multilingual data. Our tool has two core features: First, it allows users to develop extractors against a unified abstraction that is shared across a large set of natural languages. This means that an extractor needs only be created once for one language, but will then run on multilingual data without any additional effort or language-specific knowledge on part of the user. Second, it embeds this abstraction as a set of views within a declarative IE system, allowing users to quickly create extractors using a mature IE query language. We present PolyglotIE as a hands-on demo in which users can experiment with creating extractors, execute them on multilingual text and inspect extraction results. Using the UI, we discuss the challenges and potential of using unified, crosslingual semantic abstractions as basis for downstream applications. We demonstrate multilingual IE for 9 languages from 4 different language groups: English, German, French, Spanish, Japanese, Chinese, Arabic, Russian and Hindi.

pdf bib
WordForce: Visualizing Controversial Words in Debates
Wei-Fan Chen | Fang-Yu Lin | Lun-Wei Ku

This paper presents WordForce, a system powered by the state of the art neural network model to visualize the learned user-dependent word embeddings from each post according to the post content and its engaged users. It generates the scatter plots to show the force of a word, i.e., whether the semantics of word embeddings from posts of different stances are clearly separated from the aspect of this controversial word. In addition, WordForce provides the dispersion and the distance of word embeddings from posts of different stance groups, and proposes the most controversial words accordingly to show clues to what people argue about in a debate.

pdf bib
Zara: A Virtual Interactive Dialogue System Incorporating Emotion, Sentiment and Personality Recognition
Pascale Fung | Anik Dey | Farhad Bin Siddique | Ruixi Lin | Yang Yang | Dario Bertero | Yan Wan | Ricky Ho Yin Chan | Chien-Sheng Wu

Zara, or ‘Zara the Supergirl’ is a virtual robot, that can exhibit empathy while interacting with an user, with the aid of its built in facial and emotion recognition, sentiment analysis, and speech module. At the end of the 5-10 minute conversation, Zara can give a personality analysis of the user based on all the user utterances. We have also implemented a real-time emotion recognition, using a CNN model that detects emotion from raw audio without feature extraction, and have achieved an average of 65.7% accuracy on six different emotion classes, which is an impressive 4.5% improvement from the conventional feature based SVM classification. Also, we have described a CNN based sentiment analysis module trained using out-of-domain data, that recognizes sentiment from the speech recognition transcript, which has a 74.8 F-measure when tested on human-machine dialogues.

pdf bib
NL2KB: Resolving Vocabulary Gap between Natural Language and Knowledge Base in Knowledge Base Construction and Retrieval
Sheng-Lun Wei | Yen-Pin Chiu | Hen-Hsen Huang | Hsin-Hsi Chen

Words to express relations in natural language (NL) statements may be different from those to represent properties in knowledge bases (KB). The vocabulary gap becomes barriers for knowledge base construction and retrieval. With the demo system called NL2KB in this paper, users can browse which properties in KB side may be mapped to for a given relational pattern in NL side. Besides, they can retrieve the sets of relational patterns in NL side for a given property in KB side. We describe how the mapping is established in detail. Although the mined patterns are used for Chinese knowledge base applications, the methodology can be extended to other languages.

pdf bib
PKUSUMSUM : A Java Platform for Multilingual Document Summarization
Jianmin Zhang | Tianming Wang | Xiaojun Wan

PKUSUMSUM is a Java platform for multilingual document summarization, and it sup-ports multiple languages, integrates 10 automatic summarization methods, and tackles three typical summarization tasks. The summarization platform has been released and users can easily use and update it. In this paper, we make a brief description of the char-acteristics, the summarization methods, and the evaluation results of the platform, and al-so compare PKUSUMSUM with other summarization toolkits.

pdf bib
Kotonush: Understanding Concepts Based on Values behind Social Media
Tatsuya Iwanari | Kohei Ohara | Naoki Yoshinaga | Nobuhiro Kaji | Masashi Toyoda | Masaru Kitsuregawa

Kotonush, a system that clarifies people’s values on various concepts on the basis of what they write about on social media, is presented. The values are represented by ordering sets of concepts (e.g., London, Berlin, and Rome) in accordance with a common attribute intensity expressed by an adjective (e.g., entertaining). We exploit social media text written by different demographics and at different times in order to induce specific orderings for comparison. The system combines a text-to-ordering module with an interactive querying interface enabled by massive hyponymy relations and provides mechanisms to compare the induced orderings from various viewpoints. We empirically evaluate Kotonush and present some case studies, featuring real-world concept orderings with different domains on Twitter, to demonstrate the usefulness of our system.

pdf bib
Exploring a Continuous and Flexible Representation of the Lexicon
Pierre Marchal | Thierry Poibeau

We aim at showing that lexical descriptions based on multifactorial and continuous models can be used by linguists and lexicographers (and not only by machines) so long as they are provided with a way to efficiently navigate data collections. We propose to demonstrate such a system.

pdf bib
Automatically Suggesting Example Sentences of Near-Synonyms for Language Learners
Chieh-Yang Huang | Nicole Peinelt | Lun-Wei Ku

In this paper, we propose GiveMeExample that ranks example sentences according to their capacity of demonstrating the differences among English and Chinese near-synonyms for language learners. The difficulty of the example sentences is automatically detected. Furthermore, the usage models of the near-synonyms are built by the GMM and Bi-LSTM models to suggest the best elaborative sentences. Experiments show the good performance both in the fill-in-the-blank test and on the manually labeled gold data, that is, the built models can select the appropriate words for the given context and vice versa.

pdf bib
Kyoto-NMT: a Neural Machine Translation implementation in Chainer
Fabien Cromières

We present Kyoto-NMT, an open-source implementation of the Neural Machine Translation paradigm. This implementation is done in Python and Chainer, an easy-to-use Deep Learning Framework.