Enbo Zhao
2025
🧜Siren’s Song in the AI Ocean: A Survey on Hallucination in Large Language Models
Yue Zhang | Yafu Li | Leyang Cui | Deng Cai | Lemao Liu | Tingchen Fu | Xinting Huang | Enbo Zhao | Yu Zhang | Yulong Chen | Longyue Wang | Anh Tuan Luu | Wei Bi | Freda Shi | Shuming Shi
Computational Linguistics, Volume 51, Issue 4 - December 2025
While large language models (LLMs) have demonstrated remarkable capabilities across a range of downstream tasks, a significant concern revolves around their propensity to exhibit hallucinations: LLMs occasionally generate content that diverges from the user input, contradicts previously generated context, or misaligns with established world knowledge. This phenomenon poses a substantial challenge to the reliability of LLMs in real-world scenarios. In this article, we survey recent efforts on the detection, explanation, and mitigation of hallucination, with an emphasis on the unique challenges posed by LLMs. We present taxonomies of the LLM hallucination phenomena and evaluation benchmarks, analyze existing approaches aiming at mitigating LLM hallucination, and discuss potential directions for future research.
2023
Effidit: An Assistant for Improving Writing Efficiency
Shuming Shi | Enbo Zhao | Wei Bi | Deng Cai | Leyang Cui | Xinting Huang | Haiyun Jiang | Duyu Tang | Kaiqiang Song | Longyue Wang | Chenyan Huang | Guoping Huang | Yan Wang | Piji Li
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)
Writing assistants are valuable tools that can help writers improve their writing skills. We introduce Effidit (Efficient and Intelligent Editing), a digital writing assistant that helps users write higher-quality text more efficiently through the use of Artificial Intelligence (AI) and Natural Language Processing (NLP) technologies. We significantly expand the capacities of a writing assistant by providing functions in three modules: text completion, hint recommendation, and writing refinement. Based on the above efforts, Effidit can efficiently assist users in creating their own text. Effidit has been deployed to several Tencent products and publicly released at https://effidit.qq.com/.
RobustGEC: Robust Grammatical Error Correction Against Subtle Context Perturbation
Yue Zhang | Leyang Cui | Enbo Zhao | Wei Bi | Shuming Shi
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Grammatical Error Correction (GEC) systems play a vital role in assisting people with their daily writing tasks. However, users may sometimes come across a GEC system that initially performs well but fails to correct errors when the inputs are slightly modified. To ensure an ideal user experience, a reliable GEC system should have the ability to provide consistent and accurate suggestions when encountering irrelevant context perturbations, which we refer to as context robustness. In this paper, we introduce RobustGEC, a benchmark designed to evaluate the context robustness of GEC systems. RobustGEC comprises 5,000 GEC cases, each with one original error-correct sentence pair and five variants carefully devised by human annotators. Utilizing RobustGEC, we reveal that state-of-the-art GEC systems still lack sufficient robustness against context perturbations. Moreover, we propose a simple yet effective method for mitigating this issue.
2022
“Is Whole Word Masking Always Better for Chinese BERT?”: Probing on Chinese Grammatical Error Correction
Yong Dai | Linyang Li | Cong Zhou | Zhangyin Feng | Enbo Zhao | Xipeng Qiu | Piji Li | Duyu Tang
Findings of the Association for Computational Linguistics: ACL 2022
Whole word masking (WWM), which masks all subwords corresponding to a word at once, makes a better English BERT model. For the Chinese language, however, there is no subword because each token is an atomic character. The meaning of a word in Chinese is different in that a word is a compositional unit consisting of multiple characters. Such difference motivates us to investigate whether WWM leads to better context understanding ability for Chinese BERT. To achieve this, we introduce two probing tasks related to grammatical error correction and ask pretrained models to revise or insert tokens in a masked language modeling manner. We construct a dataset including labels for 19,075 tokens in 10,448 sentences. We train three Chinese BERT models with standard character-level masking (CLM), WWM, and a combination of CLM and WWM, respectively. Our major findings are as follows: First, when one character needs to be inserted or replaced, the model trained with CLM performs the best. Second, when more than one character needs to be handled, WWM is the key to better performance. Finally, when being fine-tuned on sentence-level downstream tasks, models trained with different masking strategies perform comparably.
2021
TexSmart: A System for Enhanced Natural Language Understanding
Lemao Liu | Haisong Zhang | Haiyun Jiang | Yangming Li | Enbo Zhao | Kun Xu | Linfeng Song | Suncong Zheng | Botong Zhou | Dick Zhu | Xiao Feng | Tao Chen | Tao Yang | Dong Yu | Feng Zhang | ZhanHui Kang | Shuming Shi
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations
This paper introduces TexSmart, a text understanding system that supports fine-grained named entity recognition (NER) and enhanced semantic analysis functionalities. Compared to most previous publicly available text understanding systems and tools, TexSmart holds some unique features. First, the NER function of TexSmart supports over 1,000 entity types, while most other public tools typically support several to (at most) dozens of entity types. Second, TexSmart introduces new semantic analysis functions like semantic expansion and deep semantic representation, that are absent in most previous systems. Third, a spectrum of algorithms (from very fast algorithms to those that are relatively slow but more accurate) are implemented for one function in TexSmart, to fulfill the requirements of different academic and industrial applications. The adoption of unsupervised or weakly-supervised algorithms is especially emphasized, with the goal of easily updating our models to include fresh data with less human annotation efforts.
Co-authors
- Shuming Shi 4
- Leyang Cui 3
- Wei Bi 3
- Deng Cai 2
- Xinting Huang 2
- Haiyun Jiang 2
- Piji Li (李丕绩) 2
- Lemao Liu 2
- Duyu Tang 2
- Longyue Wang 2
- Yue Zhang 2
- Tao Chen 1
- Yulong Chen 1
- Yong Dai 1
- Xiao Feng 1
- Zhangyin Feng 1
- Tingchen Fu 1
- Chenyan Huang 1
- Guoping Huang 1
- Zhanhui Kang 1
- Yangming Li 1
- Yafu Li 1
- Linyang Li 1
- Xipeng Qiu (邱锡鹏) 1
- Freda Shi 1
- Kaiqiang Song 1
- Linfeng Song 1
- Luu Anh Tuan 1
- Yan Wang 1
- Kun Xu 1
- Tao Yang 1
- Dong Yu (于东) 1
- Haisong Zhang 1
- Feng Zhang 1
- Yu Zhang 1
- Suncong Zheng 1
- Botong Zhou 1
- Cong Zhou 1
- Dick Zhu 1