Shuo Li


2024

Uncertainty in Language Models: Assessment through Rank-Calibration
Xinmeng Huang | Shuo Li | Mengxin Yu | Matteo Sesia | Hamed Hassani | Insup Lee | Osbert Bastani | Edgar Dobriban
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Language Models (LMs) have shown promising performance in natural language generation. However, as LMs often generate incorrect or hallucinated responses, it is crucial to correctly quantify their uncertainty in responding to given inputs. In addition to verbalized confidence elicited via prompting, many uncertainty measures (e.g., semantic entropy and affinity-graph-based measures) have been proposed. However, these measures can differ greatly, and it is unclear how to compare them, partly because they take values over different ranges (e.g., [0,∞) or [0,1]). In this work, we address this issue by developing a novel and practical framework, termed *Rank-Calibration*, to assess uncertainty and confidence measures for LMs. Our key tenet is that higher uncertainty (or lower confidence) should imply lower generation quality, on average. Rank-calibration quantifies deviations from this ideal relationship in a principled manner, without requiring ad hoc binary thresholding of the correctness score (e.g., ROUGE or METEOR). The broad applicability and the granular interpretability of our methods are demonstrated empirically.

TRAQ: Trustworthy Retrieval Augmented Question Answering via Conformal Prediction
Shuo Li | Sangdon Park | Insup Lee | Osbert Bastani
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

When applied to open-domain question answering, large language models (LLMs) frequently generate incorrect responses based on made-up facts, which are called hallucinations. Retrieval augmented generation (RAG) is a promising strategy to avoid hallucinations, but it does not provide guarantees on its correctness. To address this challenge, we propose the Trustworthy Retrieval Augmented Question Answering, or *TRAQ*, which provides the first end-to-end statistical correctness guarantee for RAG. TRAQ uses conformal prediction, a statistical technique for constructing prediction sets that are guaranteed to contain the semantically correct response with high probability. Additionally, TRAQ leverages Bayesian optimization to minimize the size of the constructed sets. In an extensive experimental evaluation, we demonstrate that TRAQ provides the desired correctness guarantee while reducing prediction set size by 16.2% on average compared to an ablation. The implementation is available: [https://github.com/shuoli90/TRAQ](https://github.com/shuoli90/TRAQ).
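As a rough illustration of the conformal prediction step mentioned in the abstract, the sketch below shows generic split conformal prediction. It is not the TRAQ implementation. The function names, the nonconformity scores, and the candidate answers are all hypothetical, and the sketch assumes that lower scores mean more plausible candidates and that calibration and test examples are exchangeable.

```python
import math

def conformal_threshold(cal_scores, alpha):
    """Pick a score threshold from held-out calibration scores.

    cal_scores: nonconformity scores of the *correct* answers on a
    calibration set (hypothetical inputs; lower means more plausible).
    alpha: target miscoverage rate, e.g. 0.1 for 90% coverage.
    """
    n = len(cal_scores)
    # Rank of the finite-sample-corrected quantile: ceil((n + 1) * (1 - alpha)).
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: include everything.
        return math.inf
    return sorted(cal_scores)[k - 1]

def prediction_set(candidate_scores, threshold):
    """Keep every candidate whose score does not exceed the threshold."""
    return [c for c, s in candidate_scores.items() if s <= threshold]

# Hypothetical usage with made-up scores.
threshold = conformal_threshold([1, 2, 3, 4, 5, 6, 7, 8, 9], alpha=0.5)
answers = prediction_set({"Paris": 2, "Lyon": 5, "Nice": 8}, threshold)
```

Under the exchangeability assumption, a set built this way contains the correct answer with probability at least 1 - alpha. Smaller sets are more informative, which is the quantity the abstract says TRAQ minimizes.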

2014

UM-Corpus: A Large English-Chinese Parallel Corpus for Statistical Machine Translation
Liang Tian | Derek F. Wong | Lidia S. Chao | Paulo Quaresma | Francisco Oliveira | Yi Lu | Shuo Li | Yiming Wang | Longyue Wang
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Parallel corpora are a valuable resource for cross-language information retrieval and data-driven natural language processing systems, especially for Statistical Machine Translation (SMT). However, most existing parallel corpora involving Chinese are restricted to in-house use, while others are domain-specific and limited in size. To a certain degree, this limits SMT research. This paper describes the acquisition of a large-scale, high-quality parallel corpus for English and Chinese. The corpus contains about 15 million English-Chinese (E-C) parallel sentences, of which more than 2 million sentences of training data and 5,000 test sentences are made publicly available. Unlike previous work, the corpus is designed to cover eight different domains, some of which are further divided into topics. The corpus is released to the research community through the NLP2CT website.

2013

Experiments with POS-based restructuring and alignment-based reordering for statistical machine translation
Shuo Li | Derek F. Wong | Lidia S. Chao
Proceedings of the Second Workshop on Hybrid Approaches to Translation

2012

A Joint Chinese Named Entity Recognition and Disambiguation System
Longyue Wang | Shuo Li | Derek F. Wong | Lidia S. Chao
Proceedings of the Second CIPS-SIGHAN Joint Conference on Chinese Language Processing