Systems like Voice-command based conversational agents are characterized by a pre-defined set of skills or intents to perform user specified tasks. In the course of time, newer intents may emerge requiring retraining. However, the newer intents may not be explicitly announced and need to be inferred dynamically. Thus, there are two important tasks at hand (a). identifying emerging new intents, (b). annotating data of the new intents so that the underlying classifier can be retrained efficiently. The tasks become specially challenging when a large number of new intents emerge simultaneously and there is a limited budget of manual annotation. In this paper, we propose MNID (Multiple Novel Intent Detection) which is a cluster based framework to detect multiple novel intents with budgeted human annotation cost. Empirical results on various benchmark datasets (of different sizes) demonstrate that MNID, by intelligently using the budget for annotation, outperforms the baseline methods in terms of accuracy and F1-score.
Despite tremendous progress in automatic summarization, state-of-the-art methods are predominantly trained to excel in summarizing short newswire articles, or documents with strong layout biases such as scientific articles or government reports. Efficient techniques to summarize financial documents, discussing facts and figures, have largely been unexplored, majorly due to the unavailability of suitable datasets. In this work, we present ECTSum, a new dataset with transcripts of earnings calls (ECTs), hosted by publicly traded companies, as documents, and experts-written short telegram-style bullet point summaries derived from corresponding Reuters articles. ECTs are long unstructured documents without any prescribed length limit or format. We benchmark our dataset with state-of-the-art summarization methods across various metrics evaluating the content quality and factual consistency of the generated summaries. Finally, we present a simple yet effective approach, ECT-BPS, to generate a set of bullet points that precisely capture the important facts discussed in the calls.
Relation classification (sometimes called ‘extraction’) requires trustworthy datasets for fine-tuning large language models, as well as for evaluation. Data collection is challenging for Indian languages, because they are syntactically and morphologically diverse, as well as different from resource-rich languages like English. Despite recent interest in deep generative models for Indian languages, relation classification is still not well-served by public data sets. In response, we present IndoRE, a dataset with 39K entity- and relation-tagged gold sentences in three Indian languages, plus English. We start with a multilingual BERT (mBERT) based system that captures entity span positions and type information and provides competitive monolingual relation classification. Using this system, we explore and compare transfer mechanisms between languages. In particular, we study the accuracy-efficiency tradeoff between expensive gold instances vs. translated and aligned ‘silver’ instances.
Answering questions asked from instructional corpora such as E-manuals, recipe books, etc., has been far less studied than open-domain factoid context-based question answering. This can be primarily attributed to the absence of standard benchmark datasets. In this paper, we meticulously create a large amount of data connected with E-manuals and develop a suitable algorithm to exploit it. We collect E-Manual Corpus, a huge corpus of 307,957 E-manuals, and pretrain RoBERTa on this large corpus. We create various benchmark QA datasets which include question answer pairs curated by experts based upon two E-manuals, real user questions from Community Question Answering Forum pertaining to E-manuals etc. We introduce EMQAP (E-Manual Question Answering Pipeline) that answers questions pertaining to electronics devices. Built upon the pretrained RoBERTa, it harbors a supervised multi-task learning framework which efficiently performs the dual tasks of identifying the section in the E-manual where the answer can be found and the exact answer span within that section. For E-Manual annotated question-answer pairs, we show an improvement of about 40% in ROUGE-L F1 scores over most competitive baseline. We perform a detailed ablation study and establish the versatility of EMQAP across different circumstances. The code and datasets are shared at https://github.com/abhi1nandy2/EMNLP-2021-Findings, and the corresponding project website is https://sites.google.com/view/emanualqa/home.
In this digital age, online users expect personalized content. To cater to diverse group of audiences across online platforms it is necessary to generate multiple variants of same content with differing degree of characteristics (sentiment, style, formality, etc.). Though text-style transfer is a well explored related area, it focuses on flipping the style attribute polarity instead of regulating a fine-grained attribute transfer. In this paper we propose a hierarchical architecture for finer control over the at- tribute, preserving content using attribute dis- entanglement. We demonstrate the effective- ness of the generative process for two different attributes with varied complexity, namely sentiment and formality. With extensive experiments and human evaluation on five real-world datasets, we show that the framework can generate natural looking sentences with finer degree of control of intensity of a given attribute.
The stance detection task aims at detecting the stance of a tweet or a text for a target. These targets can be named entities or free-form sentences (claims). Though the task involves reasoning of the tweet with respect to a target, we find that it is possible to achieve high accuracy on several publicly available Twitter stance detection datasets without looking at the target sentence. Specifically, a simple tweet classification model achieved human-level performance on the WT–WT dataset and more than two-third accuracy on various other datasets. We investigate the existence of biases in such datasets to find the potential spurious correlations of sentiment-stance relations and lexical choice associated with the stance category. Furthermore, we propose a new large dataset free of such biases and demonstrate its aptness on the existing stance detection systems. Our empirical findings show much scope for research on the stance detection task and proposes several considerations for creating future stance detection datasets.
The recently released FEVER dataset provided benchmark results on a fact-checking task in which given a factual claim, the system must extract textual evidence (sets of sentences from Wikipedia pages) that support or refute the claim. In this paper, we present a completely task-agnostic pipelined system, AttentiveChecker, consisting of three homogeneous Bi-Directional Attention Flow (BIDAF) networks, which are multi-layer hierarchical networks that represent the context at different levels of granularity. We are the first to apply to this task a bi-directional attention flow mechanism to obtain a query-aware context representation without early summarization. AttentiveChecker can be used to perform document retrieval, sentence selection, and claim verification. Experiments on the FEVER dataset indicate that AttentiveChecker is able to achieve the state-of-the-art results on the FEVER test set.
Multilingual writers and speakers often alternate between two languages in a single discourse. This practice is called “code-switching”. Existing sentiment detection methods are usually trained on sentiment-labeled monolingual text. Manually labeled code-switched text, especially involving minority languages, is extremely rare. Consequently, the best monolingual methods perform relatively poorly on code-switched text. We present an effective technique for synthesizing labeled code-switched text from labeled monolingual text, which is relatively readily available. The idea is to replace carefully selected subtrees of constituency parses of sentences in the resource-rich language with suitable token spans selected from automatic translations to the resource-poor language. By augmenting the scarce labeled code-switched text with plentiful synthetic labeled code-switched text, we achieve significant improvements in sentiment labeling accuracy (1.5%, 5.11% 7.20%) for three different language pairs (English-Hindi, English-Spanish and English-Bengali). The improvement is even significant in hatespeech detection whereby we achieve a 4% improvement using only synthetic code-switched data (6% with data augmentation).
Recently, biomedical version of embeddings obtained from language models such as BioELMo have shown state-of-the-art results for the textual inference task in the medical domain. In this paper, we explore how to incorporate structured domain knowledge, available in the form of a knowledge graph (UMLS), for the Medical NLI task. Specifically, we experiment with fusing embeddings obtained from knowledge graph with the state-of-the-art approaches for NLI task (ESIM model). We also experiment with fusing the domain-specific sentiment information for the task. Experiments conducted on MedNLI dataset clearly show that this strategy improves the baseline BioELMo architecture for the Medical NLI task.
Code-Switching (CS) between two languages is extremely common in communities with societal multilingualism where speakers switch between two or more languages when interacting with each other. CS has been extensively studied in spoken language by linguists for several decades but with the popularity of social-media and less formal Computer Mediated Communication, we now see a big rise in the use of CS in the text form. This poses interesting challenges and a need for computational processing of such code-switched data. As with any Computational Linguistic analysis and Natural Language Processing tools and applications, we need annotated data for understanding, processing, and generation of code-switched language. In this study, we focus on CS between English and Hindi Tweets extracted from the Twitter stream of Hindi-English bilinguals. We present an annotation scheme for annotating the pragmatic functions of CS in Hindi-English (Hi-En) code-switched tweets based on a linguistic analysis and some initial experiments.
This paper proposes a graphical framework to extract opinionated sentences which highlight different contexts within a given news article by introducing the concept of diversity in a graphical model for opinion detection.We conduct extensive evaluations and find that the proposed modification leads to impressive improvement in performance and makes the final results of the model much more usable. The proposed method (OP-D) not only performs much better than the other techniques used for opinion detection as well as introducing diversity, but is also able to select opinions from different categories (Asher et al. 2009). By developing a classification model which categorizes the identified sentences into various opinion categories, we find that OP-D is able to push opinions from different categories uniformly among the top opinions.
We present a study of the word interaction networks of Bengali in the framework of complex networks. The topological properties of these networks reveal interesting insights into the morpho-syntax of the language, whereas clustering helps in the induction of the natural word classes leading to a principled way of designing POS tagsets. We compare different network construction techniques and clustering algorithms based on the cohesiveness of the word clusters. Cohesiveness is measured against two gold-standard tagsets by means of the novel metric of tag-entropy. The approach presented here is a generic one that can be easily extended to any language.