Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

Heike Adel, Shuming Shi (Editors)

Anthology ID:
Online and Punta Cana, Dominican Republic
Association for Computational Linguistics
Bib Export formats:

pdf bib
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
Heike Adel | Shuming Shi

pdf bib
MiSS: An Assistant for Multi-Style Simultaneous Translation
Zuchao Li | Kevin Parnow | Masao Utiyama | Eiichiro Sumita | Hai Zhao

In this paper, we present MiSS, an assistant for multi-style simultaneous translation. Our proposed translation system has five key features: highly accurate translation, simultaneous translation, translation for multiple text styles, back-translation for translation quality evaluation, and grammatical error correction. With this system, we aim to provide a complete translation experience for machine translation users. Our design goals are high translation accuracy, real-time translation, flexibility, and measurable translation quality. Compared with the free commercial translation systems commonly used, our translation assistance system regards the machine translation application as a more complete and fully-featured tool for users. By incorporating additional features and giving the user better control over their experience, we improve translation efficiency and performance. Additionally, our assistant system combines machine translation, grammatical error correction, and interactive edits, and uses a crowdsourcing mode to collect more data for further training to improve both the machine translation and grammatical error correction models. A short video demonstrating our system is available at

pdf bib
Automatic Construction of Enterprise Knowledge Base
Junyi Chai | Yujie He | Homa Hashemi | Bing Li | Daraksha Parveen | Ranganath Kondapally | Wenjin Xu

In this paper, we present an automatic knowledge base construction system from large scale enterprise documents with minimal efforts of human intervention. In the design and deployment of such a knowledge mining system for enterprise, we faced several challenges including data distributional shift, performance evaluation, compliance requirements and other practical issues. We leveraged state-of-the-art deep learning models to extract information (named entities and definitions) at per document level, then further applied classical machine learning techniques to process global statistical information to improve the knowledge base. Experimental results are reported on actual enterprise documents. This system is currently serving as part of a Microsoft 365 service.

pdf bib
LightTag: Text Annotation Platform
Tal Perry

Text annotation tools assume that their user’s goal is to create a labeled corpus. However,users view annotation as a necessary evil on the way to deliver business value through NLP.Thus an annotation tool should optimize for the throughput of the global NLP process, not only the productivity of individual annotators. LightTag is a text annotation tool designed and built on that principle. This paper shares our design rationale, data modeling choices, and user interface decisions then illustrates how those choices serve the full NLP lifecycle.

pdf bib
TransIns: Document Translation with Markup Reinsertion
Jörg Steffen | Josef van Genabith

For many use cases, it is required that MT does not just translate raw text, but complex formatted documents (e.g. websites, slides, spreadsheets) and the result of the translation should reflect the formatting. This is challenging, as markup can be nested, apply to spans contiguous in source but non-contiguous in target etc. Here we present TransIns, a system for non-plain text document translation that builds on the Okapi framework and MT models trained with Marian NMT. We develop, implement and evaluate different strategies for reinserting markup into translated sentences using token alignments between source and target sentences. We propose a simple and effective strategy that compiles down all markup to single source tokens and transfers them to aligned target tokens. A first evaluation shows that this strategy yields highly accurate markup in the translated documents that outperforms the markup quality found in documents translated with popular translation services. We release TransIns under the MIT License as open-source software on An online demonstrator is available at

pdf bib
ET: A Workstation for Querying, Editing and Evaluating Annotated Corpora
Elvis de Souza | Cláudia Freitas

In this paper we explore the functionalities of ET, a suite designed to support linguistic research and natural language processing tasks using corpora annotated in the CoNLL-U format. These goals are achieved by two integrated environments – Interrogatório, an environment for querying and editing annotated corpora, and Julgamento, an environment for assessing their quality. ET is open-source, built on different Python Web technologies and has Web demonstrations available on-line. ET has been intensively used in our research group for over two years, being the chosen framework for several linguistic and NLP-related studies conducted by its researchers.

pdf bib
N-LTP: An Open-source Neural Language Technology Platform for Chinese
Wanxiang Che | Yunlong Feng | Libo Qin | Ting Liu

We introduce N-LTP, an open-source neural language technology platform supporting six fundamental Chinese NLP tasks: lexical analysis (Chinese word segmentation, part-of-speech tagging, and named entity recognition), syntactic parsing (dependency parsing), and semantic parsing (semantic dependency parsing and semantic role labeling). Unlike the existing state-of-the-art toolkits, such as Stanza, that adopt an independent model for each task, N-LTP adopts the multi-task framework by using a shared pre-trained model, which has the advantage of capturing the shared knowledge across relevant Chinese tasks. In addition, a knowledge distillation method (Clark et al., 2019) where the single-task model teaches the multi-task model is further introduced to encourage the multi-task model to surpass its single-task teacher. Finally, we provide a collection of easy-to-use APIs and a visualization tool to make users to use and view the processing results more easily and directly. To the best of our knowledge, this is the first toolkit to support six Chinese NLP fundamental tasks. Source code, documentation, and pre-trained models are available at

pdf bib
COMBO: State-of-the-Art Morphosyntactic Analysis
Mateusz Klimaszewski | Alina Wróblewska

We introduce COMBO – a fully neural NLP system for accurate part-of-speech tagging, morphological analysis, lemmatisation, and (enhanced) dependency parsing. It predicts categorical morphosyntactic features whilst also exposes their vector representations, extracted from hidden layers. COMBO is an easy to install Python package with automatically downloadable pre-trained models for over 40 languages. It maintains a balance between efficiency and quality. As it is an end-to-end system and its modules are jointly trained, its training is competitively fast. As its models are optimised for accuracy, they achieve often better prediction quality than SOTA. The COMBO library is available at:

pdf bib
ExcavatorCovid: Extracting Events and Relations from Text Corpora for Temporal and Causal Analysis for COVID-19
Bonan Min | Benjamin Rozonoyer | Haoling Qiu | Alexander Zamanian | Nianwen Xue | Jessica MacBride

Timely responses from policy makers to mitigate the impact of the COVID-19 pandemic rely on a comprehensive grasp of events, their causes, and their impacts. These events are reported at such a speed and scale as to be overwhelming. In this paper, we present ExcavatorCovid, a machine reading system that ingests open-source text documents (e.g., news and scientific publications), extracts COVID-19 related events and relations between them, and builds a Temporal and Causal Analysis Graph (TCAG). Excavator will help government agencies alleviate the information overload, understand likely downstream effects of political and economic decisions and events related to the pandemic, and respond in a timely manner to mitigate the impact of COVID-19. We expect the utility of Excavator to outlive the COVID-19 pandemic: analysts and decision makers will be empowered by Excavator to better understand and solve complex problems in the future. A demonstration video is available at

pdf bib
KOAS: Korean Text Offensiveness Analysis System
San-Hee Park | Kang-Min Kim | Seonhee Cho | Jun-Hyung Park | Hyuntae Park | Hyuna Kim | Seongwon Chung | SangKeun Lee

Warning: This manuscript contains a certain level of offensive expression. As communication through social media platforms has grown immensely, the increasing prevalence of offensive language online has become a critical problem. Notably in Korea, one of the countries with the highest Internet usage, automatic detection of offensive expressions has recently been brought to attention. However, morphological richness and complex syntax of Korean causes difficulties in neural model training. Furthermore, most of previous studies mainly focus on the detection of abusive language, disregarding implicit offensiveness and underestimating a different degree of intensity. To tackle these problems, we present KOAS, a system that fully exploits both contextual and linguistic features and estimates an offensiveness score for a text. We carefully designed KOAS with a multi-task learning framework and constructed a Korean dataset for offensive analysis from various domains. Refer for a detailed demonstration.

pdf bib
RepGraph: Visualising and Analysing Meaning Representation Graphs
Jaron Cohen | Roy Cohen | Edan Toledo | Jan Buys

We present RepGraph, an open source visualisation and analysis tool for meaning representation graphs. Graph-based meaning representations provide rich semantic annotations, but visualising them clearly is more challenging than for fully lexicalized representations. Our application provides a seamless, unifying interface with which to visualise, manipulate and analyse semantically parsed graph data represented in a JSON-based serialisation format. RepGraph visualises graphs in multiple formats, with an emphasis on showing the relation between nodes and their corresponding token spans, whilst keeping the representation compact. Additionally, the web-based tool provides NLP researchers with a clear, visually intuitive way of interacting with these graphs, and includes a number of graph analysis features. The tool currently supports the DMRS, EDS, PTG, UCCA, and AMR semantic frameworks. A live demo is available at

pdf bib
Thermostat: A Large Collection of NLP Model Explanations and Analysis Tools
Nils Feldhus | Robert Schwarzenberg | Sebastian Möller

In the language domain, as in other domains, neural explainability takes an ever more important role, with feature attribution methods on the forefront. Many such methods require considerable computational resources and expert knowledge about implementation details and parameter choices. To facilitate research, we present Thermostat which consists of a large collection of model explanations and accompanying analysis tools. Thermostat allows easy access to over 200k explanations for the decisions of prominent state-of-the-art models spanning across different NLP tasks, generated with multiple explainers. The dataset took over 10k GPU hours (> one year) to compile; compute time that the community now saves. The accompanying software tools allow to analyse explanations instance-wise but also accumulatively on corpus level. Users can investigate and compare models, datasets and explainers without the need to orchestrate implementation details. Thermostat is fully open source, democratizes explainability research in the language domain, circumvents redundant computations and increases comparability and replicability.

pdf bib
LMdiff: A Visual Diff Tool to Compare Language Models
Hendrik Strobelt | Benjamin Hoover | Arvind Satyanaryan | Sebastian Gehrmann

While different language models are ubiquitous in NLP, it is hard to contrast their outputs and identify which contexts one can handle better than the other. To address this question, we introduce LMdiff, a tool that visually compares probability distributions of two models that differ, e.g., through finetuning, distillation, or simply training with different parameter sizes. LMdiff allows the generation of hypotheses about model behavior by investigating text instances token by token and further assists in choosing these interesting text instances by identifying the most interesting phrases from large corpora. We showcase the applicability of LMdiff for hypothesis generation across multiple case studies. A demo is available at .

pdf bib
Semantic Context Path Labeling for Semantic Exploration of User Reviews
Salah Aït-Mokhtar | Caroline Brun | Yves Hoppenot | Agnes Sandor

In this paper we present a prototype demonstrator showcasing a novel method to perform semantic exploration of user reviews. The system enables effective navigation in a rich contextual semantic schema with a large number of structured classes indicating relevant information. In order to identify instances of the structured classes in the reviews, we defined a new Information Extraction task called Semantic Context Path (SCP) labeling, which simultaneously assigns types and semantic roles to entity mentions. Reviews can rapidly be explored based on the fine-grained and structured semantic classes. As a proof-of-concept, we have implemented this system for reviews on Points-of-Interest, in English and Korean.

pdf bib
Beyond Accuracy: A Consolidated Tool for Visual Question Answering Benchmarking
Dirk Väth | Pascal Tilli | Ngoc Thang Vu

On the way towards general Visual Question Answering (VQA) systems that are able to answer arbitrary questions, the need arises for evaluation beyond single-metric leaderboards for specific datasets. To this end, we propose a browser-based benchmarking tool for researchers and challenge organizers, with an API for easy integration of new models and datasets to keep up with the fast-changing landscape of VQA. Our tool helps test generalization capabilities of models across multiple datasets, evaluating not just accuracy, but also performance in more realistic real-world scenarios such as robustness to input noise. Additionally, we include metrics that measure biases and uncertainty, to further explain model behavior. Interactive filtering facilitates discovery of problematic behavior, down to the data sample level. As proof of concept, we perform a case study on four models. We find that state-of-the-art VQA models are optimized for specific tasks or datasets, but fail to generalize even to other in-domain test sets, for example they can not recognize text in images. Our metrics allow us to quantify which image and question embeddings provide most robustness to a model. All code s publicly available.

pdf bib
Athena 2.0: Contextualized Dialogue Management for an Alexa Prize SocialBot
Juraj Juraska | Kevin Bowden | Lena Reed | Vrindavan Harrison | Wen Cui | Omkar Patil | Rishi Rajasekaran | Angela Ramirez | Cecilia Li | Eduardo Zamora | Phillip Lee | Jeshwanth Bheemanpally | Rohan Pandey | Adwait Ratnaparkhi | Marilyn Walker

Athena 2.0 is an Alexa Prize SocialBot that has been a finalist in the last two Alexa Prize Grand Challenges. One reason for Athena’s success is its novel dialogue management strategy, which allows it to dynamically construct dialogues and responses from component modules, leading to novel conversations with every interaction. Here we describe Athena’s system design and performance in the Alexa Prize during the 20/21 competition. A live demo of Athena as well as video recordings will provoke discussion on the state of the art in conversational AI.

pdf bib
SPRING Goes Online: End-to-End AMR Parsing and Generation
Rexhina Blloshmi | Michele Bevilacqua | Edoardo Fabiano | Valentina Caruso | Roberto Navigli

In this paper we present SPRING Online Services, a Web interface and RESTful APIs for our state-of-the-art AMR parsing and generation system, SPRING (Symmetric PaRsIng aNd Generation). The Web interface has been developed to be easily used by the Natural Language Processing community, as well as by the general public. It provides, among other things, a highly interactive visualization platform and a feedback mechanism to obtain user suggestions for further improvements of the system’s output. Moreover, our RESTful APIs enable easy integration of SPRING in downstream applications where AMR structures are needed. Finally, we make SPRING Online Services freely available at and, in addition, we release extra model checkpoints to be used with the original SPRING Python code.

pdf bib
fairseq Sˆ2: A Scalable and Integrable Speech Synthesis Toolkit
Changhan Wang | Wei-Ning Hsu | Yossi Adi | Adam Polyak | Ann Lee | Peng-Jen Chen | Jiatao Gu | Juan Pino

This paper presents fairseq Sˆ2, a fairseq extension for speech synthesis. We implement a number of autoregressive (AR) and non-AR text-to-speech models, and their multi-speaker variants. To enable training speech synthesis models with less curated data, a number of preprocessing tools are built and their importance is shown empirically. To facilitate faster iteration of development and analysis, a suite of automatic metrics is included. Apart from the features added specifically for this extension, fairseq Sˆ2 also benefits from the scalability offered by fairseq and can be easily integrated with other state-of-the-art systems provided in this framework. The code, documentation, and pre-trained models will be made available at

pdf bib
Press Freedom Monitor: Detection of Reported Press and Media Freedom Violations in Twitter and News Articles
Tariq Yousef | Antje Schlaf | Janos Borst | Andreas Niekler | Gerhard Heyer

Freedom of the press and media is of vital importance for democratically organised states and open societies. We introduce the Press Freedom Monitor, a tool that aims to detect reported press and media freedom violations in news articles and tweets. It is used by press and media freedom organisations to support their daily monitoring and to trigger rapid response actions. The Press Freedom Monitor enables the monitoring experts to get a fast overview over recently reported incidents and it has shown an impressive performance in this regard. This paper presents our work on the tool, starting with the training phase, which comprises defining the topic-related keywords to be used for querying APIs for news and Twitter content and evaluating different machine learning models based on a training dataset specifically created for our use case. Then, we describe the components of the production pipeline, including data gathering, duplicates removal, country mapping, case mapping and the user interface. We also conducted a usability study to evaluate the effectiveness of the user interface, and describe improvement plans for future work.

pdf bib
UMR-Writer: A Web Application for Annotating Uniform Meaning Representations
Jin Zhao | Nianwen Xue | Jens Van Gysel | Jinho D. Choi

We present UMR-Writer, a web-based application for annotating Uniform Meaning Representations (UMR), a graph-based, cross-linguistically applicable semantic representation developed recently to support the development of interpretable natural language applications that require deep semantic analysis of texts. We present the functionalities of UMR-Writer and discuss the challenges in developing such a tool and how they are addressed.

pdf bib
TranslateLocally: Blazing-fast translation running on the local CPU
Nikolay Bogoychev | Jelmer Van der Linde | Kenneth Heafield

Every day, millions of people sacrifice their privacy and browsing habits in exchange for online machine translation. Companies and governments with confidentiality requirements often ban online translation or pay a premium to disable logging. To bring control back to the end user and demonstrate speed, we developed translateLocally. Running locally on a desktop or laptop CPU, translateLocally delivers cloud-like translation speed and quality even on 10 year old hardware. The open-source software is based on Marian and runs on Linux, Windows, and macOS.

pdf bib
Datasets: A Community Library for Natural Language Processing
Quentin Lhoest | Albert Villanova del Moral | Yacine Jernite | Abhishek Thakur | Patrick von Platen | Suraj Patil | Julien Chaumond | Mariama Drame | Julien Plu | Lewis Tunstall | Joe Davison | Mario Šaško | Gunjan Chhablani | Bhavitvya Malik | Simon Brandeis | Teven Le Scao | Victor Sanh | Canwen Xu | Nicolas Patry | Angelina McMillan-Major | Philipp Schmid | Sylvain Gugger | Clément Delangue | Théo Matussière | Lysandre Debut | Stas Bekman | Pierric Cistac | Thibault Goehringer | Victor Mustar | François Lagunas | Alexander Rush | Thomas Wolf

The scale, variety, and quantity of publicly-available NLP datasets has grown rapidly as researchers propose new tasks, larger models, and novel benchmarks. Datasets is a community library for contemporary NLP designed to support this ecosystem. Datasets aims to standardize end-user interfaces, versioning, and documentation, while providing a lightweight front-end that behaves similarly for small datasets as for internet-scale corpora. The design of the library incorporates a distributed, community-driven approach to adding datasets and documenting usage. After a year of development, the library now includes more than 650 unique datasets, has more than 250 contributors, and has helped support a variety of novel cross-dataset research projects and shared tasks. The library is available at

pdf bib
Summary Explorer: Visualizing the State of the Art in Text Summarization
Shahbaz Syed | Tariq Yousef | Khalid Al Khatib | Stefan Jänicke | Martin Potthast

This paper introduces Summary Explorer, a new tool to support the manual inspection of text summarization systems by compiling the outputs of 55 state-of-the-art single document summarization approaches on three benchmark datasets, and visually exploring them during a qualitative assessment. The underlying design of the tool considers three well-known summary quality criteria (coverage, faithfulness, and position bias), encapsulated in a guided assessment based on tailored visualizations. The tool complements existing approaches for locally debugging summarization models and improves upon them. The tool is available at

pdf bib
MeetDot: Videoconferencing with Live Translation Captions
Arkady Arkhangorodsky | Christopher Chu | Scot Fang | Yiqi Huang | Denglin Jiang | Ajay Nagesh | Boliang Zhang | Kevin Knight

We present MeetDot, a videoconferencing system with live translation captions overlaid on screen. The system aims to facilitate conversation between people who speak different languages, thereby reducing communication barriers between multilingual participants. Currently, our system supports speech and captions in 4 languages and combines automatic speech recognition (ASR) and machine translation (MT) in a cascade. We use the re-translation strategy to translate the streamed speech, resulting in caption flicker. Additionally, our system has very strict latency requirements to have acceptable call quality. We implement several features to enhance user experience and reduce their cognitive load, such as smooth scrolling captions and reducing caption flicker. The modular architecture allows us to integrate different ASR and MT services in our backend. Our system provides an integrated evaluation suite to optimize key intrinsic evaluation metrics such as accuracy, latency and erasure. Finally, we present an innovative cross-lingual word-guessing game as an extrinsic evaluation metric to measure end-to-end system performance. We plan to make our system open-source for research purposes.

pdf bib
Box Embeddings: An open-source library for representation learning using geometric structures
Tejas Chheda | Purujit Goyal | Trang Tran | Dhruvesh Patel | Michael Boratko | Shib Sankar Dasgupta | Andrew McCallum

A fundamental component to the success of modern representation learning is the ease of performing various vector operations. Recently, objects with more geometric structure (eg. distributions, complex or hyperbolic vectors, or regions such as cones, disks, or boxes) have been explored for their alternative inductive biases and additional representational capacity. In this work, we introduce Box Embeddings, a Python library that enables researchers to easily apply and extend probabilistic box embeddings. Fundamental geometric operations on boxes are implemented in a numerically stable way, as are modern approaches to training boxes which mitigate gradient sparsity. The library is fully open source, and compatible with both PyTorch and TensorFlow, which allows existing neural network layers to be replaced with or transformed into boxes easily. In this work, we present the implementation details of the fundamental components of the library, and the concepts required to use box representations alongside existing neural network architectures.

pdf bib
LexiClean: An annotation tool for rapid multi-task lexical normalisation
Tyler Bikaun | Tim French | Melinda Hodkiewicz | Michael Stewart | Wei Liu

NLP systems are often challenged by difficulties arising from noisy, non-standard, and domain specific corpora. The task of lexical normalisation aims to standardise such corpora, but currently lacks suitable tools to acquire high-quality annotated data to support deep learning based approaches. In this paper, we present LexiClean, the first open-source web-based annotation tool for multi-task lexical normalisation. LexiClean’s main contribution is support for simultaneous in situ token-level modification and annotation that can be rapidly applied corpus wide. We demonstrate the usefulness of our tool through a case study on two sets of noisy corpora derived from the specialised-domain of industrial mining. We show that LexiClean allows for the rapid and efficient development of high-quality parallel corpora. A demo of our system is available at:

pdf bib
T3-Vis: visual analytic for Training and fine-Tuning Transformers in NLP
Raymond Li | Wen Xiao | Lanjun Wang | Hyeju Jang | Giuseppe Carenini

Transformers are the dominant architecture in NLP, but their training and fine-tuning is still very challenging. In this paper, we present the design and implementation of a visual analytic framework for assisting researchers in such process, by providing them with valuable insights about the model’s intrinsic properties and behaviours. Our framework offers an intuitive overview that allows the user to explore different facets of the model (e.g., hidden states, attention) through interactive visualization, and allows a suite of built-in algorithms that compute the importance of model components and different parts of the input sequence. Case studies and feedback from a user focus group indicate that the framework is useful, and suggest several improvements. Our framework is available at:

pdf bib
DomiKnowS: A Library for Integration of Symbolic Domain Knowledge in Deep Learning
Hossein Rajaby Faghihi | Quan Guo | Andrzej Uszok | Aliakbar Nafar | Parisa Kordjamshidi

We demonstrate a library for the integration of domain knowledge in deep learning architectures. Using this library, the structure of the data is expressed symbolically via graph declarations and the logical constraints over outputs or latent variables can be seamlessly added to the deep models. The domain knowledge can be defined explicitly, which improves the explainability of the models in addition to their performance and generalizability in the low-data regime. Several approaches for such integration of symbolic and sub-symbolic models have been introduced; however, there is no library to facilitate the programming for such integration in a generic way while various underlying algorithms can be used. Our library aims to simplify programming for such integration in both training and inference phases while separating the knowledge representation from learning algorithms. We showcase various NLP benchmark tasks and beyond. The framework is publicly available at Github(

pdf bib
OpenFraming: Open-sourced Tool for Computational Framing Analysis of Multilingual Data
Vibhu Bhatia | Vidya Prasad Akavoor | Sejin Paik | Lei Guo | Mona Jalal | Alyssa Smith | David Assefa Tofu | Edward Edberg Halim | Yimeng Sun | Margrit Betke | Prakash Ishwar | Derry Tanti Wijaya

When journalists cover a news story, they can cover the story from multiple angles or perspectives. These perspectives are called “frames,” and usage of one frame or another may influence public perception and opinion of the issue at hand. We develop a web-based system for analyzing frames in multilingual text documents. We propose and guide users through a five-step end-to-end computational framing analysis framework grounded in media framing theory in communication research. Users can use the framework to analyze multilingual text data, starting from the exploration of frames in user’s corpora and through review of previous framing literature (step 1-3) to frame classification (step 4) and prediction (step 5). The framework combines unsupervised and supervised machine learning and leverages a state-of-the-art (SoTA) multilingual language model, which can significantly enhance frame prediction performance while requiring a considerably small sample of manual annotations. Through the interactive website, anyone can perform the proposed computational framing analysis, making advanced computational analysis available to researchers without a programming background and bridging the digital divide within the communication research discipline in particular and the academic community in general. The system is available online at, via an API, or through our GitHub page

pdf bib
IrEne-viz: Visualizing Energy Consumption of Transformer Models
Yash Kumar Lal | Reetu Singh | Harsh Trivedi | Qingqing Cao | Aruna Balasubramanian | Niranjan Balasubramanian

IrEne is an energy prediction system that accurately predicts the interpretable inference energy consumption of a wide range of Transformer-based NLP models. We present the IrEne-viz tool, an online platform for visualizing and exploring energy consumption of various Transformer-based models easily. Additionally, we release a public API that can be used to access granular information about energy consumption of transformer models and their components. The live demo is available at

pdf bib
Open-Domain Question-Answering for COVID-19 and Other Emergent Domains
Sharon Levy | Kevin Mo | Wenhan Xiong | William Yang Wang

Since late 2019, COVID-19 has quickly emerged as the newest biomedical domain, resulting in a surge of new information. As with other emergent domains, the discussion surrounding the topic has been rapidly changing, leading to the spread of misinformation. This has created the need for a public space for users to ask questions and receive credible, scientific answers. To fulfill this need, we turn to the task of open-domain question-answering, which we can use to efficiently find answers to free-text questions from a large set of documents. In this work, we present such a system for the emergent domain of COVID-19. Despite the small data size available, we are able to successfully train the system to retrieve answers from a large-scale corpus of published COVID-19 scientific papers. Furthermore, we incorporate effective re-ranking and question-answering techniques, such as document diversity and multiple answer spans. Our open-domain question-answering system can further act as a model for the quick development of similar systems that can be adapted and modified for other developing emergent domains.

pdf bib
Project Debater APIs: Decomposing the AI Grand Challenge
Roy Bar-Haim | Yoav Kantor | Elad Venezian | Yoav Katz | Noam Slonim

Project Debater was revealed in 2019 as the first AI system that can debate human experts on complex topics. Engaging in a live debate requires a diverse set of skills, and Project Debater has been developed accordingly as a collection of components, each designed to perform a specific subtask. Project Debater APIs provide access to many of these capabilities, as well as to more recently developed ones. This diverse set of web services, publicly available for academic use, includes core NLP services, argument mining and analysis capabilities, and higher-level services for content summarization. We describe these APIs and their performance, and demonstrate how they can be used for building practical solutions. In particular, we will focus on Key Point Analysis, a novel technology that identifies the main points and their prevalence in a collection of texts such as survey responses and user reviews.

pdf bib
CroAno : A Crowd Annotation Platform for Improving Label Consistency of Chinese NER Dataset
Baoli Zhang | Zhucong Li | Zhen Gan | Yubo Chen | Jing Wan | Kang Liu | Jun Zhao | Shengping Liu | Yafei Shi

In this paper, we introduce CroAno, a web-based crowd annotation platform for the Chinese named entity recognition (NER). Besides some basic features for crowd annotation like fast tagging and data management, CroAno provides a systematic solution for improving label consistency of Chinese NER dataset. 1) Disagreement Adjudicator: CroAno uses a multi-dimensional highlight mode to visualize instance-level inconsistent entities and makes the revision process user-friendly. 2) Inconsistency Detector: CroAno employs a detector to locate corpus-level label inconsistency and provides users an interface to correct inconsistent entities in batches. 3) Prediction Error Analyzer: We deconstruct the entity prediction error of the model to six fine-grained entity error types. Users can employ this error system to detect corpus-level inconsistency from a model perspective. To validate the effectiveness of our platform, we use CroAno to revise two public datasets. In the two revised datasets, we get an improvement of +1.96% and +2.57% F1 respectively in model performance.

pdf bib
iFacetSum: Coreference-based Interactive Faceted Summarization for Multi-Document Exploration
Eran Hirsch | Alon Eirew | Ori Shapira | Avi Caciularu | Arie Cattan | Ori Ernst | Ramakanth Pasunuru | Hadar Ronen | Mohit Bansal | Ido Dagan

We introduce iFᴀᴄᴇᴛSᴜᴍ, a web application for exploring topical document collections. iFᴀᴄᴇᴛSᴜᴍ integrates interactive summarization together with faceted search, by providing a novel faceted navigation scheme that yields abstractive summaries for the user’s selections. This approach offers both a comprehensive overview as well as particular details regard-ing subtopics of choice. The facets are automatically produced based on cross-document coreference pipelines, rendering generic concepts, entities and statements surfacing in the source texts. We analyze the effectiveness of our application through small-scale user studies that suggest the usefulness of our tool.

pdf bib
AMuSE-WSD: An All-in-one Multilingual System for Easy Word Sense Disambiguation
Riccardo Orlando | Simone Conia | Fabrizio Brignone | Francesco Cecconi | Roberto Navigli

Over the past few years, Word Sense Disambiguation (WSD) has received renewed interest: recently proposed systems have shown the remarkable effectiveness of deep learning techniques in this task, especially when aided by modern pretrained language models. Unfortunately, such systems are still not available as ready-to-use end-to-end packages, making it difficult for researchers to take advantage of their performance. The only alternative for a user interested in applying WSD to downstream tasks is to rely on currently available end-to-end WSD systems, which, however, still rely on graph-based heuristics or non-neural machine learning algorithms. In this paper, we fill this gap and propose AMuSE-WSD, the first end-to-end system to offer high-quality sense information in 40 languages through a state-of-the-art neural model for WSD. We hope that AMuSE-WSD will provide a stepping stone for the integration of meaning into real-world applications and encourage further studies in lexical semantics. AMuSE-WSD is available online at

pdf bib
SeqAttack: On Adversarial Attacks for Named Entity Recognition
Walter Simoncini | Gerasimos Spanakis

Named Entity Recognition is a fundamental task in information extraction and is an essential element for various Natural Language Processing pipelines. Adversarial attacks have been shown to greatly affect the performance of text classification systems but knowledge about their effectiveness against named entity recognition models is limited. This paper investigates the effectiveness and portability of adversarial attacks from text classification to named entity recognition and the ability of adversarial training to counteract these attacks. We find that character-level and word-level attacks are the most effective, but adversarial training can grant significant protection at little to no expense of standard performance. Alongside our results, we also release SeqAttack, a framework to conduct adversarial attacks against token classification models (used in this work for named entity recognition) and a companion web application to inspect and cherry pick adversarial examples.

pdf bib
InVeRo-XL: Making Cross-Lingual Semantic Role Labeling Accessible with Intelligible Verbs and Roles
Simone Conia | Riccardo Orlando | Fabrizio Brignone | Francesco Cecconi | Roberto Navigli

Notwithstanding the growing interest in cross-lingual techniques for Natural Language Processing, there has been a surprisingly small number of efforts aimed at the development of easy-to-use tools for cross-lingual Semantic Role Labeling. In this paper, we fill this gap and present InVeRo-XL, an off-the-shelf state-of-the-art system capable of annotating text with predicate sense and semantic role labels from 7 predicate-argument structure inventories in more than 40 languages. We hope that our system – with its easy-to-use RESTful API and Web interface – will become a valuable tool for the research community, encouraging the integration of sentence-level semantics into cross-lingual downstream tasks. InVeRo-XL is available online at

pdf bib
SummerTime: Text Summarization Toolkit for Non-experts
Ansong Ni | Zhangir Azerbayev | Mutethia Mutuma | Troy Feng | Yusen Zhang | Tao Yu | Ahmed Hassan Awadallah | Dragomir Radev

Recent advances in summarization provide models that can generate summaries of higher quality. Such models now exist for a number of summarization tasks, including query-based summarization, dialogue summarization, and multi-document summarization. While such models and tasks are rapidly growing in the research field, it has also become challenging for non-experts to keep track of them. To make summarization methods more accessible to a wider audience, we develop SummerTime by rethinking the summarization task from the perspective of an NLP non-expert. SummerTime is a complete toolkit for text summarization, including various models, datasets, and evaluation metrics, for a full spectrum of summarization-related tasks. SummerTime integrates with libraries designed for NLP researchers, and enables users with easy-to-use APIs. With SummerTime, users can locate pipeline solutions and search for the best model with their own data, and visualize the differences, all with a few lines of code. We also provide explanations for models and evaluation metrics to help users understand the model behaviors and select models that best suit their needs. Our library, along with a notebook demo, is available at

pdf bib
Chandler: An Explainable Sarcastic Response Generator
Silviu Oprea | Steven Wilson | Walid Magdy

We introduce Chandler, a system that generates sarcastic responses to a given utterance. Previous sarcasm generators assume the intended meaning that sarcasm conceals is the opposite of the literal meaning. We argue that this traditional theory of sarcasm provides a grounding that is neither necessary, nor sufficient, for sarcasm to occur. Instead, we ground our generation process on a formal theory that specifies conditions that unambiguously differentiate sarcasm from non-sarcasm. Furthermore, Chandler not only generates sarcastic responses, but also explanations for why each response is sarcastic. This provides accountability, crucial for avoiding miscommunication between humans and conversational agents, particularly considering that sarcastic communication can be offensive. In human evaluation, Chandler achieves comparable or higher sarcasm scores, compared to state-of-the-art generators, while generating more diverse responses, that are more specific and more coherent to the input.

pdf bib
TabPert : An Effective Platform for Tabular Perturbation
Nupur Jain | Vivek Gupta | Anshul Rai | Gaurav Kumar

To grasp the true reasoning ability, the Natural Language Inference model should be evaluated on counterfactual data. TabPert facilitates this by generation of such counterfactual data for assessing model tabular reasoning issues. TabPert allows the user to update a table, change the hypothesis, change the labels, and highlight rows that are important for hypothesis classification. TabPert also details the technique used to automatically produce the table, as well as the strategies employed to generate the challenging hypothesis. These counterfactual tables and hypotheses, as well as the metadata, is then used to explore the existing model’s shortcomings methodically and quantitatively.

pdf bib
DRIFT: A Toolkit for Diachronic Analysis of Scientific Literature
Abheesht Sharma | Gunjan Chhablani | Harshit Pandey | Rajaswa Patil

In this work, we present to the NLP community, and to the wider research community as a whole, an application for the diachronic analysis of research corpora. We open source an easy-to-use tool coined DRIFT, which allows researchers to track research trends and development over the years. The analysis methods are collated from well-cited research works, with a few of our own methods added for good measure. Succinctly put, some of the analysis methods are: keyword extraction, word clouds, predicting declining/stagnant/growing trends using Productivity, tracking bi-grams using Acceleration plots, finding the Semantic Drift of words, tracking trends using similarity, etc. To demonstrate the utility and efficacy of our tool, we perform a case study on the cs.CL corpus of the arXiv repository and draw inferences from the analysis methods. The toolkit and the associated code are available here:

pdf bib
FAST: Fast Annotation tool for SmarT devices
Shunyo Kawamoto | Yu Sawai | Kohei Wakimoto | Peinan Zhang

Working with a wide range of annotators with the same attributes is crucial, as in real-world applications. Although such application cases often use crowd-sourcing mechanisms to gather a variety of annotators, most real-world users use mobile devices. In this paper, we propose “FAST,” an annotation tool for application tasks that focuses on the user experience of mobile devices, which has not yet been focused on thus far. We designed FAST as a web application for use on any device with a flexible interface that can be customized to fit various tasks. In our experiments, we conducted crowd-sourced annotation for a sentiment analysis task with several annotators and evaluated annotation metrics such as speed, quality, and ease of use from the tool’s logs and user surveys. Based on the results of our experiments, we conclude that our system can annotate faster than existing methods while maintaining the annotation quality.

pdf bib
deepQuest-py: Large and Distilled Models for Quality Estimation
Fernando Alva-Manchego | Abiola Obamuyide | Amit Gajbhiye | Frédéric Blain | Marina Fomicheva | Lucia Specia

We introduce deepQuest-py, a framework for training and evaluation of large and light-weight models for Quality Estimation (QE). deepQuest-py provides access to (1) state-of-the-art models based on pre-trained Transformers for sentence-level and word-level QE; (2) light-weight and efficient sentence-level models implemented via knowledge distillation; and (3) a web interface for testing models and visualising their predictions. deepQuest-py is available at under a CC BY-NC-SA licence.