James Allan

UMass Amherst

Also published as: J. Allan

Other people with similar names: James Allen (Rochester)


2023

pdf bib
Conditional Natural Language Inference
Youngwoo Kim | Razieh Rahimi | James Allan
Findings of the Association for Computational Linguistics: EMNLP 2023

To properly explain sentence pairs that provide contradictory (different) information for different conditions, we introduce the task of conditional natural language inference (Cond-NLI) and focus on automatically extracting contradictory aspects and their conditions from a sentence pair. Cond-NLI can help to provide a full spectrum of information, such as when there are multiple answers to a question each addressing a specific condition, or reviews with different opinions for different conditions. We show that widely-used feature-attribution explanation models are not suitable for finding conditions, especially when sentences are long and are written independently. We propose a simple yet effective model for the original NLI task that can successfully extract conditions while not requiring token-level annotations. Our model enhances the interpretability of the NLI task while maintaining comparable accuracy. To evaluate models for the Cond-NLI, we build and release a token-level annotated dataset BioClaim which contains potentially contradictory claims from the biomedical domain. Our experiments show that our proposed model outperforms the full cross-encoder and other baselines in extracting conditions. It also performs on-par with GPT-3 which has an order of magnitude more parameters and trained on a huge amount of data.

pdf bib
AutoTriggER: Label-Efficient and Robust Named Entity Recognition with Auxiliary Trigger Extraction
Dong-Ho Lee | Ravi Kiran Selvam | Sheikh Muhammad Sarwar | Bill Yuchen Lin | Fred Morstatter | Jay Pujara | Elizabeth Boschee | James Allan | Xiang Ren
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

Deep neural models for named entity recognition (NER) have shown impressive results in overcoming label scarcity and generalizing to unseen entities by leveraging distant supervision and auxiliary information such as explanations. However, the costs of acquiring such additional information are generally prohibitive. In this paper, we present a novel two-stage framework (AutoTriggER) to improve NER performance by automatically generating and leveraging “entity triggers” which are human-readable cues in the text that help guide the model to make better decisions. Our framework leverages post-hoc explanation to generate rationales and strengthens a model’s prior knowledge using an embedding interpolation technique. This approach allows models to exploit triggers to infer entity boundaries and types instead of solely memorizing the entity words themselves. Through experiments on three well-studied NER datasets, AutoTriggER shows strong label-efficiency, is capable of generalizing to unseen entities, and outperforms the RoBERTa-CRF baseline by nearly 0.5 F1 points on average.

2019

pdf bib
A Multi-Task Architecture on Relevance-based Neural Query Translation
Sheikh Muhammad Sarwar | Hamed Bonab | James Allan
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

We describe a multi-task learning approach to train a Neural Machine Translation (NMT) model with a Relevance-based Auxiliary Task (RAT) for search query translation. The translation process for Cross-lingual Information Retrieval (CLIR) task is usually treated as a black box and it is performed as an independent step. However, an NMT model trained on sentence-level parallel data is not aware of the vocabulary distribution of the retrieval corpus. We address this problem and propose a multi-task learning architecture that achieves 16% improvement over a strong baseline on Italian-English query-document dataset. We show using both quantitative and qualitative analysis that our model generates balanced and precise translations with the regularization effect it achieves from multi-task learning paradigm.

pdf bib
FEVER Breaker’s Run of Team NbAuzDrLqg
Youngwoo Kim | James Allan
Proceedings of the Second Workshop on Fact Extraction and VERification (FEVER)

We describe our submission for the Breaker phase of the second Fact Extraction and VERification (FEVER) Shared Task. Our adversarial data can be explained by two perspectives. First, we aimed at testing model’s ability to retrieve evidence, when appropriate query terms could not be easily generated from the claim. Second, we test model’s ability to precisely understand the implications of the texts, which we expect to be rare in FEVER 1.0 dataset. Overall, we suggested six types of adversarial attacks. The evaluation on the submitted systems showed that the systems were only able get both the evidence and label correct in 20% of the data. We also demonstrate our adversarial run analysis in the data development process.

2017

pdf bib
Improving Document Clustering by Removing Unnatural Language
Myungha Jang | Jinho D. Choi | James Allan
Proceedings of the 3rd Workshop on Noisy User-generated Text

Technical documents contain a fair amount of unnatural language, such as tables, formulas, and pseudo-code. Unnatural language can bean important factor of confusing existing NLP tools. This paper presents an effective method of distinguishing unnatural language from natural language, and evaluates the impact of un-natural language detection on NLP tasks such as document clustering. We view this problem as an information extraction task and build a multiclass classification model identifying unnatural language components into four categories. First, we create a new annotated corpus by collecting slides and papers in various for-mats, PPT, PDF, and HTML, where unnatural language components are annotated into four categories. We then explore features available from plain text to build a statistical model that can handle any format as long as it is converted into plain text. Our experiments show that re-moving unnatural language components gives an absolute improvement in document cluster-ing by up to 15%. Our corpus and tool are publicly available

2008

pdf bib
Proceedings of ACL-08: HLT
Johanna D. Moore | Simone Teufel | James Allan | Sadaoki Furui
Proceedings of ACL-08: HLT

pdf bib
Proceedings of ACL-08: HLT, Short Papers
Johanna D. Moore | Simone Teufel | James Allan | Sadaoki Furui
Proceedings of ACL-08: HLT, Short Papers

2007

pdf bib
Information Retrieval On Empty Fields
Victor Lavrenko | Xing Yi | James Allan
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference

pdf bib
A Case For Shorter Queries, and Helping Users Create Them
Giridhar Kumaran | James Allan
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference

pdf bib
Question Answering Using Integrated Information Retrieval and Information Extraction
Barry Schiffman | Kathleen McKeown | Ralph Grishman | James Allan
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference

pdf bib
Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Tutorial Abstracts
Marti Hearst | Gina-Anne Levow | James Allan
Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Tutorial Abstracts

2005

pdf bib
Using Names and Topics for New Event Detection
Giridhar Kumaran | James Allan
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

pdf bib
Matching Inconsistently Spelled Names in Automatic Speech Recognizer Output for Information Retrieval
Hema Raghavan | James Allan
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

2004

pdf bib
Cross-Document Coreference on a Large Scale Corpus
Chung Heong Gooi | James Allan
Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics: HLT-NAACL 2004

pdf bib
Using Soundex Codes for Indexing Names in ASR Documents
Hema Raghavan | James Allan
Proceedings of the Workshop on Interdisciplinary Approaches to Speech Indexing and Retrieval at HLT-NAACL 2004

2001

pdf bib
An Evaluation Corpus For Temporal Summarization
Vikash Khandelwal | Rahul Gupta | James Allan
Proceedings of the First International Conference on Human Language Technology Research

pdf bib
Monitoring the News: a TDT demonstration system
David Frey | Rahul Gupta | Vikas Khandelwal | Victor Lavrenko | Anton Leuski | James Allan
Proceedings of the First International Conference on Human Language Technology Research

1993

pdf bib
The SMART Information Retrieval Project
C. Buckley | G. Salton | J. Allan
Human Language Technology: Proceedings of a Workshop Held at Plainsboro, New Jersey, March 21-24, 1993