Amir Saffari

2023

mReFinED: An Efficient End-to-End Multilingual Entity Linking System
Peerat Limkonchotiwat | Weiwei Cheng | Christos Christodoulopoulos | Amir Saffari | Jens Lehmann
Findings of the Association for Computational Linguistics: EMNLP 2023

End-to-end multilingual entity linking (MEL) is concerned with identifying multilingual entity mentions and their corresponding entity IDs in a knowledge base. Existing works assumed that entity mentions were given and skipped the entity mention detection step due to a lack of high-quality multilingual training corpora. To overcome this limitation, we propose mReFinED, the first end-to-end multilingual entity linking. Additionally, we propose a bootstrapping mention detection framework that enhances the quality of training corpora. Our experimental results demonstrated that mReFinED outperformed the best existing work in the end-to-end MEL task while being 44 times faster.

pdf bib abs

Knowledge-Augmented Language Model Prompting for Zero-Shot Knowledge Graph Question Answering
Jinheon Baek | Alham Fikri Aji | Amir Saffari
Proceedings of the First Workshop on Matching From Unstructured and Structured Data (MATCHING 2023)

Large Language Models (LLMs) are capable of performing zero-shot closed-book question answering tasks, based on their internal knowledge stored in parameters during pre-training. However, such internalized knowledge might be insufficient and incorrect, which could lead LLMs to generate factually wrong answers. Furthermore, fine-tuning LLMs to update their knowledge is expensive. To this end, we propose to augment the knowledge directly in the input of LLMs. Specifically, we first retrieve the relevant facts to the input question from the knowledge graph based on semantic similarities between the question and its associated facts. After that, we prepend the retrieved facts to the input question in the form of the prompt, which is then forwarded to LLMs to generate the answer. Our framework, Knowledge-Augmented language model PromptING (KAPING), requires no model training, thus completely zero-shot. We validate the performance of our KAPING framework on the knowledge graph question answering task, that aims to answer the user’s question based on facts over a knowledge graph, on which ours outperforms relevant zero-shot baselines by up to 48% in average, across multiple LLMs of various sizes.

pdf bib abs

Knowledge Graph-augmented Language Models for Complex Question Answering
Priyanka Sen | Sandeep Mavadia | Amir Saffari
Proceedings of the 1st Workshop on Natural Language Reasoning and Structured Explanations (NLRSE)

Large language models have shown impressive abilities to reason over input text, however, they are prone to hallucinations. On the other hand, end-to-end knowledge graph question answering (KGQA) models output responses grounded in facts, but they still struggle with complex reasoning, such as comparison or ordinal questions. In this paper, we propose a new method for complex question answering where we combine a knowledge graph retriever based on an end-to-end KGQA model with a language model that reasons over the retrieved facts to return an answer. We observe that augmenting language model prompts with retrieved KG facts improves performance over using a language model alone by an average of 83%. In particular, we see improvements on complex questions requiring count, intersection, or multi-hop reasoning operations.

pdf bib abs

Knowledge-Augmented Language Model Prompting for Zero-Shot Knowledge Graph Question Answering
Jinheon Baek | Alham Fikri Aji | Amir Saffari
Proceedings of the 1st Workshop on Natural Language Reasoning and Structured Explanations (NLRSE)

2022

pdf bib abs

CLASP: Few-Shot Cross-Lingual Data Augmentation for Semantic Parsing
Andy Rosenbaum | Saleh Soltan | Wael Hamza | Marco Damonte | Isabel Groves | Amir Saffari
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

A bottleneck to developing Semantic Parsing (SP) models is the need for a large volume of human-labeled training data. Given the complexity and cost of human annotation for SP, labeled data is often scarce, particularly in multilingual settings. Large Language Models (LLMs) excel at SP given only a few examples, however LLMs are unsuitable for runtime systems which require low latency. In this work, we propose CLASP, a simple method to improve low-resource SP for moderate-sized models: we generate synthetic data from AlexaTM 20B to augment the training set for a model 40x smaller (500M parameters). We evaluate on two datasets in low-resource settings: English PIZZA, containing either 348 or 16 real examples, and mTOP cross-lingual zero-shot, where training data is available only in English, and the model must generalize to four new languages. On both datasets, we show significant improvements over strong baseline methods.

pdf bib abs

Mintaka: A Complex, Natural, and Multilingual Dataset for End-to-End Question Answering
Priyanka Sen | Alham Fikri Aji | Amir Saffari
Proceedings of the 29th International Conference on Computational Linguistics

We introduce Mintaka, a complex, natural, and multilingual dataset designed for experimenting with end-to-end question-answering models. Mintaka is composed of 20,000 question-answer pairs collected in English, annotated with Wikidata entities, and translated into Arabic, French, German, Hindi, Italian, Japanese, Portuguese, and Spanish for a total of 180,000 samples. Mintaka includes 8 types of complex questions, including superlative, intersection, and multi-hop questions, which were naturally elicited from crowd workers. We run baselines over Mintaka, the best of which achieves 38% hits@1 in English and 31% hits@1 multilingually, showing that existing models have room for improvement. We release Mintaka at https://github.com/amazon-research/mintaka.

2021

pdf bib abs

End-to-End Entity Resolution and Question Answering Using Differentiable Knowledge Graphs
Amir Saffari | Armin Oliya | Priyanka Sen | Tom Ayoola
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Recently, end-to-end (E2E) trained models for question answering over knowledge graphs (KGQA) have delivered promising results using only a weakly supervised dataset. However, these models are trained and evaluated in a setting where hand-annotated question entities are supplied to the model, leaving the important and non-trivial task of entity resolution (ER) outside the scope of E2E learning. In this work, we extend the boundaries of E2E learning for KGQA to include the training of an ER component. Our model only needs the question text and the answer entities to train, and delivers a stand-alone QA model that does not require an additional ER component to be supplied during runtime. Our approach is fully differentiable, thanks to its reliance on a recent method for building differentiable KGs (Cohen et al., 2020). We evaluate our E2E trained model on two public datasets and show that it comes close to baseline models that use hand-annotated entities.

pdf bib abs

Expanding End-to-End Question Answering on Differentiable Knowledge Graphs with Intersection
Priyanka Sen | Armin Oliya | Amir Saffari
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

End-to-end question answering using a differentiable knowledge graph is a promising technique that requires only weak supervision, produces interpretable results, and is fully differentiable. Previous implementations of this technique (Cohen et al, 2020) have focused on single-entity questions using a relation following operation. In this paper, we propose a model that explicitly handles multiple-entity questions by implementing a new intersection operation, which identifies the shared elements between two sets of entities. We find that introducing intersection improves performance over a baseline model on two datasets, WebQuestionsSP (69.6% to 73.3% Hits@1) and ComplexWebQuestions (39.8% to 48.7% Hits@1), and in particular, improves performance on questions with multiple entities by over 14% on WebQuestionsSP and by 19% on ComplexWebQuestions.

2020

pdf bib abs

Have Your Text and Use It Too! End-to-End Neural Data-to-Text Generation with Semantic Fidelity
Hamza Harkous | Isabel Groves | Amir Saffari
Proceedings of the 28th International Conference on Computational Linguistics

End-to-end neural data-to-text (D2T) generation has recently emerged as an alternative to pipeline-based architectures. However, it has faced challenges generalizing to new domains and generating semantically consistent text. In this work, we present DataTuner, a neural, end-to-end data-to-text generation system that makes minimal assumptions about the data representation and target domain. We take a two-stage generation-reranking approach, combining a fine-tuned language model with a semantic fidelity classifier. Each component is learnt end-toe-nd without needing dataset-specific heuristics, entity delexicalization, or post-processing. We show that DataTuner achieves state of the art results on automated metrics across four major D2T datasets (LDC2017T10, WebNLG, ViGGO, and Cleaned E2E), with fluency assessed by human annotators as nearing or exceeding the human-written reference texts. Our generated text has better semantic fidelity than the state of the art on these datasets. We further demonstrate that our model-based semantic fidelity scorer is a better assessment tool compared to traditional heuristic-based measures of semantic accuracy.

pdf bib abs

What do Models Learn from Question Answering Datasets?
Priyanka Sen | Amir Saffari
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

While models have reached superhuman performance on popular question answering (QA) datasets such as SQuAD, they have yet to outperform humans on the task of question answering itself. In this paper, we investigate if models are learning reading comprehension from QA datasets by evaluating BERT-based models across five datasets. We evaluate models on their generalizability to out-of-domain examples, responses to missing or incorrect data, and ability to handle question variations. We find that no single dataset is robust to all of our experiments and identify shortcomings in both datasets and evaluation methods. Following our analysis, we make recommendations for building future QA datasets that better evaluate the task of question answering through reading comprehension. We also release code to convert QA datasets to a shared format for easier experimentation at https://github.com/amazon-research/qa-dataset-converter