2024
pdf
bib
abs
Successfully Guiding Humans with Imperfect Instructions by Highlighting Potential Errors and Suggesting Corrections
Lingjun Zhao
|
Khanh Xuan Nguyen
|
Hal Daumé Iii
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Language models will inevitably err in situations with which they are unfamiliar. However, by effectively communicating uncertainties, they can still guide humans toward making sound decisions in those contexts. We demonstrate this idea by developing HEAR, a system that can successfully guide humans in simulated residential environments despite generating potentially inaccurate instructions. Diverging from systems that provide users with only the instructions they generate, HEAR warns users of potential errors in its instructions and suggests corrections. This rich uncertainty information effectively prevents misguidance and reduces the search space for users. Evaluation with 80 users shows that HEAR achieves a 13% increase in success rate and a 29% reduction in final location error distance compared to only presenting instructions to users. Interestingly, we find that offering users possibilities to explore, HEAR motivates them to make more attempts at the task, ultimately leading to a higher success rate. To our best knowledge, this work is the first to show the practical benefits of uncertainty communication in a long-horizon sequential decision-making problem.
2023
pdf
bib
abs
Define, Evaluate, and Improve Task-Oriented Cognitive Capabilities for Instruction Generation Models
Lingjun Zhao
|
Khanh Nguyen
|
Hal Daumé III
Findings of the Association for Computational Linguistics: ACL 2023
Recent work studies the cognitive capabilities of language models through psychological tests designed for humans. While these studies are helpful for understanding the general capabilities of these models, there is no guarantee that a model possessing sufficient capabilities to pass those tests would actually use those capabilities in performing real-life tasks. In this work, we formulate task-oriented cognitive capabilities, which are human-like cognitive capabilities that language models leverage to perform tasks. These capabilities are (i) the ability to quickly generate good candidate utterances (the search capability) (ii) the ability to predict how a listener interprets those utterances and choose the most appropriate one (the pragmatic capability). We design an evaluation scheme for comparing these capabilities of a language model with those of a human. Applying this scheme to examine various models in a navigation instruction generation problem, we find that their pragmatic capability is severely lacking. This insight leads us to augment them with better models of the listener and obtain a significant boost of 11% in success rate in guiding real humans. Our work advocates for having a principled procedure for aligning language models with humans that involves (i) formulating task-oriented capabilities, (ii) devising a method to quantify their deficiency, and (iii) iteratively improving them.
pdf
bib
abs
Hallucination Detection for Grounded Instruction Generation
Lingjun Zhao
|
Khanh Nguyen
|
Hal Daumé III
Findings of the Association for Computational Linguistics: EMNLP 2023
We investigate the problem of generating instructions to guide humans to navigate in simulated residential environments. A major issue with current models is hallucination: they generate references to actions or objects that are inconsistent with what a human follower would perform or encounter along the described path. We develop a model that detects these hallucinated references by adopting a model pre-trained on a large corpus of image-text pairs, and fine-tuning it with a contrastive loss that separates correct instructions from instructions containing synthesized hallucinations. Our final model outperforms several baselines, including using word probability estimated by the instruction-generation model, and supervised models based on LSTM and Transformer.
2020
pdf
bib
abs
Towards Few-Shot Event Mention Retrieval: An Evaluation Framework and A Siamese Network Approach
Bonan Min
|
Yee Seng Chan
|
Lingjun Zhao
Proceedings of the Twelfth Language Resources and Evaluation Conference
Automatically analyzing events in a large amount of text is crucial for situation awareness and decision making. Previous approaches treat event extraction as “one size fits all” with an ontology defined a priori. The resulted extraction models are built just for extracting those types in the ontology. These approaches cannot be easily adapted to new event types nor new domains of interest. To accommodate personalized event-centric information needs, this paper introduces the few-shot Event Mention Retrieval (EMR) task: given a user-supplied query consisting of a handful of event mentions, return relevant event mentions found in a corpus. This formulation enables “query by example”, which drastically lowers the bar of specifying event-centric information needs. The retrieval setting also enables fuzzy search. We present an evaluation framework leveraging existing event datasets such as ACE. We also develop a Siamese Network approach, and show that it performs better than ad-hoc retrieval models in the few-shot EMR setting.
pdf
bib
abs
Cross-lingual Information Retrieval with BERT
Zhuolin Jiang
|
Amro El-Jaroudi
|
William Hartmann
|
Damianos Karakos
|
Lingjun Zhao
Proceedings of the workshop on Cross-Language Search and Summarization of Text and Speech (CLSSTS2020)
Multiple neural language models have been developed recently, e.g., BERT and XLNet, and achieved impressive results in various NLP tasks including sentence classification, question answering and document ranking. In this paper, we explore the use of the popular bidirectional language model, BERT, to model and learn the relevance between English queries and foreign-language documents in the task of cross-lingual information retrieval. A deep relevance matching model based on BERT is introduced and trained by finetuning a pretrained multilingual BERT model with weak supervision, using home-made CLIR training data derived from parallel corpora. Experimental results of the retrieval of Lithuanian documents against short English queries show that our model is effective and outperforms the competitive baseline approaches.
pdf
bib
abs
The 2019 BBN Cross-lingual Information Retrieval System
Le Zhang
|
Damianos Karakos
|
William Hartmann
|
Manaj Srivastava
|
Lee Tarlin
|
David Akodes
|
Sanjay Krishna Gouda
|
Numra Bathool
|
Lingjun Zhao
|
Zhuolin Jiang
|
Richard Schwartz
|
John Makhoul
Proceedings of the workshop on Cross-Language Search and Summarization of Text and Speech (CLSSTS2020)
In this paper, we describe a cross-lingual information retrieval (CLIR) system that, given a query in English, and a set of audio and text documents in a foreign language, can return a scored list of relevant documents, and present findings in a summary form in English. Foreign audio documents are first transcribed by a state-of-the-art pretrained multilingual speech recognition model that is finetuned to the target language. For text documents, we use multiple multilingual neural machine translation (MT) models to achieve good translation results, especially for low/medium resource languages. The processed documents and queries are then scored using a probabilistic CLIR model that makes use of the probability of translation from GIZA translation tables and scores from a Neural Network Lexical Translation Model (NNLTM). Additionally, advanced score normalization, combination, and thresholding schemes are employed to maximize the Average Query Weighted Value (AQWV) scores. The CLIR output, together with multiple translation renderings, are selected and translated into English snippets via a summarization model. Our turnkey system is language agnostic and can be quickly trained for a new low-resource language in few days.
2019
pdf
bib
abs
Weakly Supervised Attentional Model for Low Resource Ad-hoc Cross-lingual Information Retrieval
Lingjun Zhao
|
Rabih Zbib
|
Zhuolin Jiang
|
Damianos Karakos
|
Zhongqiang Huang
Proceedings of the 2nd Workshop on Deep Learning Approaches for Low-Resource NLP (DeepLo 2019)
We propose a weakly supervised neural model for Ad-hoc Cross-lingual Information Retrieval (CLIR) from low-resource languages. Low resource languages often lack relevance annotations for CLIR, and when available the training data usually has limited coverage for possible queries. In this paper, we design a model which does not require relevance annotations, instead it is trained on samples extracted from translation corpora as weak supervision. This model relies on an attention mechanism to learn spans in the foreign sentence that are relevant to the query. We report experiments on two low resource languages: Swahili and Tagalog, trained on less that 100k parallel sentences each. The proposed model achieves 19 MAP points improvement compared to using CNNs for feature extraction, 12 points improvement from machine translation-based CLIR, and up to 6 points improvement compared to probabilistic CLIR models.