Changmao Li

2025

RAC: Efficient LLM Factuality Correction with Retrieval Augmentation
Changmao Li | Jeffrey Flanigan
Findings of the Association for Computational Linguistics: EMNLP 2025

Large Language Models (LLMs) exhibit impressive results across a wide range of natural language processing (NLP) tasks, yet they can often produce factually incorrect outputs. This paper introduces a simple but effective low-latency post-correction method, Retrieval Augmented Correction (RAC), aimed at enhancing the factual performance of LLMs without requiring additional fine-tuning. Our method is general and can be used with any instruction-tuned LLM, and has greatly reduced latency compared to prior approaches. RAC decomposes the LLM’s output into atomic facts and applies a fine-grained verification and correction process with retrieved content to verify and correct the LLM-generated output. Our extensive experiments show that RAC yields up to 30% improvements over the LLM baselines across three popular factuality evaluation datasets, validating its efficacy and robustness with and without the integration of Retrieval-Augmented Generation (RAG) across different LLMs. Notably, our method has reduced latency up to 40x and reduced token consumption up to 7x compared to previous state-of-the-art post-correction approaches with similar or better performance.

2022

pdf bib abs

Improving Neural Machine Translation with the Abstract Meaning Representation by Combining Graph and Sequence Transformers
Changmao Li | Jeffrey Flanigan
Proceedings of the 2nd Workshop on Deep Learning on Graphs for Natural Language Processing (DLG4NLP 2022)

Previous studies have shown that the Abstract Meaning Representation (AMR) can improve Neural Machine Translation (NMT). However, there has been little work investigating incorporating AMR graphs into Transformer models. In this work, we propose a novel encoder-decoder architecture which augments the Transformer model with a Heterogeneous Graph Transformer (Yao et al., 2020) which encodes source sentence AMR graphs. Experimental results demonstrate the proposed model outperforms the Transformer model and previous non-Transformer based models on two different language pairs in both the high resource setting and low resource setting. Our source code, training corpus and released models are available at https://github.com/jlab-nlp/amr-nmt.

2020

pdf bib abs

Transformers to Learn Hierarchical Contexts in Multiparty Dialogue for Span-based Question Answering
Changmao Li | Jinho D. Choi
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

We introduce a novel approach to transformers that learns hierarchical representations in multiparty dialogue. First, three language modeling tasks are used to pre-train the transformers, token- and utterance-level language modeling and utterance order prediction, that learn both token and utterance embeddings for better understanding in dialogue contexts. Then, multi-task learning between the utterance prediction and the token span prediction is applied to fine-tune for span-based question answering (QA). Our approach is evaluated on the FriendsQA dataset and shows improvements of 3.8% and 1.4% over the two state-of-the-art transformer models, BERT and RoBERTa, respectively.

pdf bib abs

Competence-Level Prediction and Resume & Job Description Matching Using Context-Aware Transformer Models
Changmao Li | Elaine Fisher | Rebecca Thomas | Steve Pittard | Vicki Hertzberg | Jinho D. Choi
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

This paper presents a comprehensive study on resume classification to reduce the time and labor needed to screen an overwhelming number of applications significantly, while improving the selection of suitable candidates. A total of 6,492 resumes are extracted from 24,933 job applications for 252 positions designated into four levels of experience for Clinical Research Coordinators (CRC). Each resume is manually annotated to its most appropriate CRC position by experts through several rounds of triple annotation to establish guidelines. As a result, a high Kappa score of 61% is achieved for inter-annotator agreement. Given this dataset, novel transformer-based classification models are developed for two tasks: the first task takes a resume and classifies it to a CRC level (T1), and the second task takes both a resume and a job description to apply and predicts if the application is suited to the job (T2). Our best models using section encoding and a multi-head attention decoding give results of 73.3% to T1 and 79.2% to T2. Our analysis shows that the prediction errors are mostly made among adjacent CRC levels, which are hard for even experts to distinguish, implying the practical value of our models in real HR platforms.

pdf bib abs

Transformer-based Context-aware Sarcasm Detection in Conversation Threads from Social Media
Xiangjue Dong | Changmao Li | Jinho D. Choi
Proceedings of the Second Workshop on Figurative Language Processing

We present a transformer-based sarcasm detection model that accounts for the context from the entire conversation thread for more robust predictions. Our model uses deep transformer layers to perform multi-head attentions among the target utterance and the relevant context in the thread. The context-aware models are evaluated on two datasets from social media, Twitter and Reddit, and show 3.1% and 7.0% improvements over their baselines. Our best models give the F1-scores of 79.0% and 75.0% for the Twitter and Reddit datasets respectively, becoming one of the highest performing systems among 36 participants in this shared task.

Co-authors

Steve Pittard 1

Rebecca Thomas 1

Venues

Fix author