Proceedings of the First Workshop on Building Linguistically Generalizable NLP Systems

Editors: Emily Bender, Hal Daumé III, Allyson Ettinger, Sudha Rao
September 2017, Copenhagen, Denmark
Association for Computational Linguistics
http://www.aclweb.org/anthology/W17-54
Citation key: BLGNLP2017:2017

Towards Linguistically Generalizable NLP Systems: A Workshop and Shared Task
Allyson Ettinger, Sudha Rao, Hal Daumé III, Emily M. Bender
Pages 1–10, http://www.aclweb.org/anthology/W17-5401
This paper presents a summary of the first Workshop on Building Linguistically Generalizable Natural Language Processing Systems and the associated shared task, Build It, Break It: The Language Edition. The goal of this workshop was to bring together researchers in NLP and linguistics with a carefully designed shared task aimed at testing the generalizability of NLP systems beyond the distributions of their training data. We describe the motivation, setup, and participation of the shared task, discuss some highlighted results, and reflect on lessons learned.
Citation key: ettinger-EtAl:2017:BLGNLP2017

Analysing Errors of Open Information Extraction Systems
Rudolf Schneider, Tom Oberhauser, Tobias Klatt, Felix A. Gers, Alexander Löser
Pages 11–18, http://www.aclweb.org/anthology/W17-5402
We report results on benchmarking Open Information Extraction (OIE) systems using RelVis, a toolkit for benchmarking OIE systems. Our comprehensive benchmark contains three data sets from the news domain and one data set from Wikipedia, with 4522 labeled sentences and 11243 binary or n-ary OIE relations overall. On these data sets we compared the performance of four popular OIE systems: ClausIE, OpenIE 4.2, Stanford OpenIE, and PredPatt. In addition, we evaluated the impact of five common error classes on a subset of 749 n-ary tuples. From this in-depth analysis we identify important research directions for the next generation of OIE systems.
Citation key: schneider-EtAl:2017:BLGNLP2017

Massively Multilingual Neural Grapheme-to-Phoneme Conversion
Ben Peters, Jon Dehdari, Josef van Genabith
Pages 19–26, http://www.aclweb.org/anthology/W17-5403
Grapheme-to-phoneme conversion (g2p) is necessary for text-to-speech and automatic speech recognition systems. Most g2p systems are monolingual: they require language-specific data or handcrafted rules. Such systems are difficult to extend to low-resource languages, for which neither data nor handcrafted rules are available. As an alternative, we present a neural sequence-to-sequence approach to g2p that is trained on spelling–pronunciation pairs in hundreds of languages. The system shares a single encoder and decoder across all languages, allowing it to exploit the intrinsic similarities between different writing systems. We show an 11% improvement in phoneme error rate over an approach based on adapting high-resource monolingual g2p models to low-resource languages. Our model is also much more compact than previous approaches.
Citation key: peters-dehdari-vangenabith:2017:BLGNLP2017

BIBI System Description: Building with CNNs and Breaking with Deep Reinforcement Learning
Yitong Li, Trevor Cohn, Timothy Baldwin
Pages 27–32, http://www.aclweb.org/anthology/W17-5404
This paper describes our submission to the sentiment analysis sub-task of “Build It, Break It: The Language Edition (BIBI)” on both the builder and breaker sides. As a builder, we use convolutional neural networks trained on both phrase-level and sentence-level data. As a breaker, we use Q-learning to learn minimal change pairs and automatically apply a token substitution method. We analyse the results to gauge the robustness of NLP systems.
Citation key: li-cohn-baldwin:2017:BLGNLP2017

Breaking NLP: Using Morphosyntax, Semantics, Pragmatics and World Knowledge to Fool Sentiment Analysis Systems
Taylor Mahler, Willy Cheung, Micha Elsner, David King, Marie-Catherine de Marneffe, Cory Shain, Symon Stevens-Guille, Michael White
Pages 33–39, http://www.aclweb.org/anthology/W17-5405
This paper describes our "breaker" submission to the 2017 EMNLP "Build It Break It" shared task on sentiment analysis. To cause the "builder" systems to make incorrect predictions, we edited items in the blind test data according to linguistically interpretable strategies that allow us to assess how easily the builder systems learn various components of linguistic structure. Overall, our submitted pairs break all systems at a high rate (72.6%), indicating that sentiment analysis as an NLP task may still have a lot of ground to cover. Of the breaker strategies we consider, we find that our semantic and pragmatic manipulations pose the most substantial difficulties for the builder systems.
Citation key: mahler-EtAl:2017:BLGNLP2017

An Adaptable Lexical Simplification Architecture for Major Ibero-Romance Languages
Daniel Ferrés, Horacio Saggion, Xavier Gómez Guinovart
Pages 40–47, http://www.aclweb.org/anthology/W17-5406
Lexical simplification is the task of reducing the lexical complexity of textual documents by replacing difficult words with expressions that are easier to read (or understand) while preserving the original meaning. The development of robust pipelined multilingual architectures able to adapt to new languages is of paramount importance in lexical simplification. This paper describes and evaluates a modular hybrid linguistic-statistical lexical simplifier that deals with the four major Ibero-Romance languages: Spanish, Portuguese, Catalan, and Galician. The architecture of the system is the same for all four languages; only the language resources used during simplification are language-specific.
Citation key: ferres-saggion-gomezguinovart:2017:BLGNLP2017

Cross-genre Document Retrieval: Matching between Conversational and Formal Writings
Tomasz Jurczyk, Jinho D. Choi
Pages 48–53, http://www.aclweb.org/anthology/W17-5407
This paper tackles a cross-genre document retrieval task in which the queries are in formal writing and the target documents are in conversational writing. In this task, a query is a sentence extracted from either the summary or the plot of a TV show episode, and the target document consists of transcripts from the corresponding episode. To establish a strong baseline, we employ a current state-of-the-art search engine to perform document retrieval on the dataset collected for this work. We then introduce a structure reranking approach that improves the initial ranking by using syntactic and semantic structures generated by NLP tools. Our evaluation shows an improvement of more than 4% when structure reranking is applied, which is promising.
Citation key: jurczyk-choi:2017:BLGNLP2017

ACTSA: Annotated Corpus for Telugu Sentiment Analysis
Sandeep Sricharan Mukku, Radhika Mamidi
Pages 54–58, http://www.aclweb.org/anthology/W17-5408
Sentiment analysis is the task of determining the polarity of a document or sentence, and it has received a lot of attention in recent years for English. With the rapid growth of social media, a large amount of data is now available in regional languages besides English. Telugu is one such regional language with abundant data on social media, but labelled sentence data for Telugu sentiment analysis is hard to find. In this paper, we describe an effort to build a gold-standard annotated corpus of Telugu sentences to support Telugu sentiment analysis. The corpus, named ACTSA (Annotated Corpus for Telugu Sentiment Analysis), consists of Telugu sentences taken from different sources, which were pre-processed and then manually annotated by native Telugu speakers following our annotation guidelines. In total, we annotated 5457 sentences, which makes our corpus the largest such resource currently available. The corpus and the annotation guidelines are publicly available.
Citation key: mukku-mamidi:2017:BLGNLP2017

Strawman: An Ensemble of Deep Bag-of-Ngrams for Sentiment Analysis
Kyunghyun Cho
Pages 59–60, http://www.aclweb.org/anthology/W17-5409
This paper describes a builder entry, named "strawman", to the sentence-level sentiment analysis task of the "Build It, Break It" shared task of the First Workshop on Building Linguistically Generalizable NLP Systems. The goal of a builder is to provide an automated sentiment analyzer that serves as a target for breakers, whose goal is to find pairs of minimally differing sentences that break the analyzer.
Citation key: cho:2017:BLGNLP2017

Breaking Sentiment Analysis of Movie Reviews
Ieva Staliūnaite, Ben Bonfil
Pages 61–64, http://www.aclweb.org/anthology/W17-5410
This paper covers several strategies we used to 'break' the predictions of sentiment analysis systems participating in the BLGNLP2017 workshop. Specifically, we identify difficulties the participating systems have in understanding modals, subjective judgments, references based on world knowledge, and certain differences in syntax and perspective.
Citation key: staliunaite-bonfil:2017:BLGNLP2017