Rongjun Li


2024

Enhancing Semantic Consistency of Large Language Models through Model Editing: An Interpretability-Oriented Approach
Jingyuan Yang | Dapeng Chen | Yajing Sun | Rongjun Li | Zhiyong Feng | Wei Peng
Findings of the Association for Computational Linguistics: ACL 2024

A Large Language Model (LLM) tends to generate inconsistent and sometimes contradictory outputs when presented with a prompt that has equivalent semantics but is expressed differently from the original prompt. To achieve semantic consistency of an LLM, one of the key approaches is to finetune the model with prompt-output pairs with semantically equivalent meanings. Despite its effectiveness, a data-driven finetuning method incurs substantial computation costs in data preparation and model optimization. In this regime, an LLM is treated as a “black box”, restricting our ability to gain deeper insights into its internal mechanism. In this paper, we are motivated to enhance the semantic consistency of LLMs through a more interpretable method (i.e., model editing). We first identify the model components (i.e., attention heads) that have a key impact on the semantic consistency of an LLM. We subsequently inject biases into the output of these model components along the semantic-consistency activation direction. It is noteworthy that these modifications are cost-effective, without reliance on mass manipulations of the original model parameters. Through comprehensive experiments on the constructed NLU and open-source NLG datasets, our method demonstrates significant improvements in the semantic consistency and task performance of LLMs. Additionally, our method exhibits promising generalization capabilities by performing well on tasks beyond the primary tasks.
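
The bias-injection step described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the difference-of-means estimate of the direction, and the scaling factor `alpha` are all assumptions made for illustration, since the abstract does not say how the direction is computed or how the heads are selected.

```python
import numpy as np

def consistency_direction(acts_consistent, acts_inconsistent):
    """Assumed estimate: unit vector pointing from the mean 'inconsistent'
    activation toward the mean 'consistent' activation of one head."""
    diff = acts_consistent.mean(axis=0) - acts_inconsistent.mean(axis=0)
    return diff / np.linalg.norm(diff)

def inject_bias(head_outputs, selected_heads, directions, alpha=1.0):
    """Add alpha * direction to the output of each selected head.

    head_outputs: array of shape (num_heads, head_dim)
    selected_heads: indices of the heads to edit (hypothetical selection)
    directions: dict mapping head index -> unit vector of shape (head_dim,)
    """
    edited = head_outputs.copy()  # leave the original activations untouched
    for h in selected_heads:
        edited[h] += alpha * directions[h]
    return edited
```

Only the outputs of the selected heads are shifted, and no model weights change, which matches the abstract's point that the edit avoids large-scale manipulation of the original parameters.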

Contextual Modeling for Document-level ASR Error Correction
Jin Jiang | Xunjian Yin | Xiaojun Wan | Wei Peng | Rongjun Li | Jingyuan Yang | Yanquan Zhou
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Contextual information, including the sentences in the same document and in other documents of the dataset, plays a crucial role in improving the accuracy of document-level ASR Error Correction (AEC), yet most previous work ignores it. In this paper, we propose a context-aware method that utilizes a k-Nearest Neighbors (kNN) approach to enhance the AEC model by retrieving from a datastore containing contextual information. We conduct experiments on two English and two Chinese datasets, and the results demonstrate that our proposed model can effectively utilize contextual information to improve document-level AEC. Furthermore, the context information from the whole dataset provides even better results.
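
The retrieval step mentioned in the abstract above can be sketched as a plain nearest-neighbor lookup over a datastore of key vectors. This is a generic kNN illustration under stated assumptions, not the paper's method: the function name `knn_retrieve`, the use of Euclidean distance, and the key/value layout are hypothetical, and the abstract does not say how the keys are built or how retrieved neighbors are combined with the AEC model.

```python
import numpy as np

def knn_retrieve(query, keys, values, k=2):
    """Return the k datastore entries whose keys are closest to the query.

    query: vector of shape (dim,)
    keys: array of shape (num_entries, dim), one key per datastore entry
    values: list of stored items (e.g. context tokens), aligned with keys
    """
    dists = np.linalg.norm(keys - query, axis=1)  # Euclidean distance (assumed metric)
    nearest = np.argsort(dists)[:k]               # indices of the k smallest distances
    return [(values[i], float(dists[i])) for i in nearest]
```

In this picture, building the datastore from the current document or from the whole dataset changes only which keys and values are stored; the lookup itself stays the same.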

2023

New Datasets and Controllable Iterative Data Augmentation Method for Code-switching ASR Error Correction
Zhaohong Wan | Xiaojun Wan | Wei Peng | Rongjun Li
Findings of the Association for Computational Linguistics: EMNLP 2023

With the wide use of automatic speech recognition (ASR) systems, researchers pay more attention to the ASR error correction task to improve the quality of recognition results. In particular, ASR in bilingual or multilingual settings, namely code-switching ASR, has greater challenges and research value. In this paper, we first present code-switching ASR correction datasets obtained from solid ASR systems and automatic annotators. The datasets contain Chinese-English code-switching dialogues of bilingual speakers in Singapore, Malaysia, and Hong Kong. Based on this task, we propose a controllable iterative (CI) data augmentation method for improving the performance of mainstream ASR error correction systems. With a small amount of training data, our proposed method can iteratively produce abundant pseudo-parallel data from the monolingual corpus for Chinese-English code-switching ASR correction. Experimental results show that our method achieves the best performance compared with rule-based and back-translation-based data augmentation methods and the large language model ChatGPT.