Abdalghani Abujabal

2023

Novel Feature Discovery for Task-Oriented Dialog Systems
Vinh Thinh Ho | Mohamed Soliman | Abdalghani Abujabal
Findings of the Association for Computational Linguistics: EACL 2023

A novel feature represents a cluster of semantically equivalent novel user requests e.g., requests to play a song on a service or read user’s messages. Detecting and supporting novel features is crucial towards wider adoption of dialog systems by end users. Intuitively, features are represented by a combination of intents, slot types and/or their values. For example, while playing a song is a feature represented by a single intent (PlayMusic) only, playing a song on a service is another feature represented by the combination of PlayMusic intent and ServiceName slot type. Prior work on novelty detection limits the scope of features to those represented by novel single intents, leading to (1) giant clusters spanning several user-perceived fine-grained features belonging to the same intent, (2) incoherent interpretation of clusters from users’ perspective (no direct connection to some user-perceived feature), and (3) missing those features spanning several intents. In this work, we introduce feature discovery as opposed to single intent discovery, which aims at discovering novel features spanning a combination of intents and slots, and present a technique for discovering novel features from user utterances. Experiments on two datasets demonstrate the effectiveness of our approach and consistently show its ability to detect novel features.

2021

pdf bib abs

Identifying and Resolving Annotation Changes for Natural Language Understanding
Jose Garrido Ramas | Giorgio Pessot | Abdalghani Abujabal | Martin Rajman
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Papers

Annotation conflict resolution is crucial towards building machine learning models with acceptable performance. Past work on annotation conflict resolution had assumed that data is collected at once, with a fixed set of annotators and fixed annotation guidelines. Moreover, previous work dealt with atomic labeling tasks. In this paper, we address annotation conflict resolution for Natural Language Understanding (NLU), a structured prediction task, in a real-world setting of commercial voice-controlled personal assistants, where (1) regular data collections are needed to support new and existing functionalities, (2) annotation guidelines evolve over time, and (3) the pool of annotators change across data collections. We devise an approach combining information-theoretic measures and a supervised neural model to resolve conflicts in data annotation. We evaluate our approach both intrinsically and extrinsically on a real-world dataset with 3.5M utterances of a commercial dialog system in German. Our approach leads to dramatic improvements over a majority baseline especially in contentious cases. On the NLU task, our approach achieves 2.75% error reduction over a no-resolution baseline.

pdf bib abs

Continuous Model Improvement for Language Understanding with Machine Translation
Abdalghani Abujabal | Claudio Delli Bovi | Sungho Ryu | Turan Gojayev | Fabian Triefenbach | Yannick Versley
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Papers

Scaling conversational personal assistants to a multitude of languages puts high demands on collecting and labelling data, a setting in which cross-lingual learning techniques can help to reconcile the need for well-performing Natural Language Understanding (NLU) with a desideratum to support many languages without incurring unacceptable cost. In this work, we show that automatically annotating unlabeled utterances using Machine Translation in an offline fashion and adding them to the training data can improve performance for existing NLU features for low-resource languages, where a straightforward translate-test approach as considered in existing literature would fail the latency requirements of a live environment. We demonstrate the effectiveness of our method with intrinsic and extrinsic evaluation using a real-world commercial dialog system in German. Beyond an intrinsic evaluation, where 56% of the resulting automatically labeled utterances had a perfect match with ground-truth labels, we see significant performance improvements in an extrinsic evaluation settings when manual labeled data is available in small quantities.

2019

pdf bib abs

ComQA: A Community-sourced Dataset for Complex Factoid Question Answering with Paraphrase Clusters
Abdalghani Abujabal | Rishiraj Saha Roy | Mohamed Yahya | Gerhard Weikum
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

To bridge the gap between the capabilities of the state-of-the-art in factoid question answering (QA) and what users ask, we need large datasets of real user questions that capture the various question phenomena users are interested in, and the diverse ways in which these questions are formulated. We introduce ComQA, a large dataset of real user questions that exhibit different challenging aspects such as compositionality, temporal reasoning, and comparisons. ComQA questions come from the WikiAnswers community QA platform, which typically contains questions that are not satisfactorily answerable by existing search engine technology. Through a large crowdsourcing effort, we clean the question dataset, group questions into paraphrase clusters, and annotate clusters with their answers. ComQA contains 11,214 questions grouped into 4,834 paraphrase clusters. We detail the process of constructing ComQA, including the measures taken to ensure its high quality while making effective use of crowdsourcing. We also present an extensive analysis of the dataset and the results achieved by state-of-the-art systems on ComQA, demonstrating that our dataset can be a driver of future research on QA.

2017

pdf bib abs

QUINT: Interpretable Question Answering over Knowledge Bases
Abdalghani Abujabal | Rishiraj Saha Roy | Mohamed Yahya | Gerhard Weikum
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

We present QUINT, a live system for question answering over knowledge bases. QUINT automatically learns role-aligned utterance-query templates from user questions paired with their answers. When QUINT answers a question, it visualizes the complete derivation sequence from the natural language utterance to the final answer. The derivation provides an explanation of how the syntactic structure of the question was used to derive the structure of a SPARQL query, and how the phrases in the question were used to instantiate different parts of the query. When an answer seems unsatisfactory, the derivation provides valuable insights towards reformulating the question.

pdf bib abs

Efficiency-aware Answering of Compositional Questions using Answer Type Prediction
David Ziegler | Abdalghani Abujabal | Rishiraj Saha Roy | Gerhard Weikum
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

This paper investigates the problem of answering compositional factoid questions over knowledge bases (KB) under efficiency constraints. The method, called TIPI, (i) decomposes compositional questions, (ii) predicts answer types for individual sub-questions, (iii) reasons over the compatibility of joint types, and finally, (iv) formulates compositional SPARQL queries respecting type constraints. TIPI’s answer type predictor is trained using distant supervision, and exploits lexical, syntactic and embedding-based features to compute context- and hierarchy-aware candidate answer types for an input question. Experiments on a recent benchmark show that TIPI results in state-of-the-art performance under the real-world assumption that only a single SPARQL query can be executed over the KB, and substantial reduction in the number of queries in the more general case.