2024
Refining Corpora from a Model Calibration Perspective for Chinese Spelling Correction
Dingyao Yu | Yang An | Wei Ye | Xiongfeng Xiao | Shaoguang Mao | Tao Ge | Shikun Zhang
Findings of the Association for Computational Linguistics: ACL 2024
Chinese Spelling Correction (CSC) commonly lacks large-scale high-quality corpora, due to the labor-intensive labeling of spelling errors in real-life human writing or typing scenarios. Two data augmentation methods are widely adopted: (1) *Random Replacement* guided by confusion sets and (2) *OCR/ASR-based Generation* that simulates character misuse. However, both methods inevitably introduce noisy data (e.g., false spelling errors), potentially leading to over-correction. By carefully analyzing the two types of corpora, we find that although the latter achieves more robust generalization performance, the former yields better-calibrated CSC models. We then provide a theoretical analysis of this empirical observation, based on which we propose a corpus refining strategy: OCR/ASR-based data samples are fed into a well-calibrated CSC model trained on a random replacement-based corpus and then filtered based on prediction confidence. By training a simple BERT-based model on the refined OCR/ASR-based corpus, we achieve state-of-the-art performance on three widely-used benchmarks, while significantly alleviating over-correction (e.g., lowering false positive predictions).
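As an illustration of the refining strategy sketched above, here is a minimal Python sketch, assuming a Hugging Face-style BERT CSC model trained on a random-replacement corpus. The checkpoint path, the exact confidence statistic, and the threshold are illustrative assumptions, not the paper's released artifacts.

```python
# Minimal sketch: score OCR/ASR-based samples with a well-calibrated CSC
# model and keep only the confident ones. Checkpoint path, confidence
# statistic, and threshold are hypothetical, not released artifacts.
import torch
from transformers import BertForMaskedLM, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-chinese")
model = BertForMaskedLM.from_pretrained("path/to/random-replacement-csc")  # hypothetical
model.eval()

def refine(samples, threshold=0.9):
    """Keep OCR/ASR-based (source, target) pairs on which the
    well-calibrated model is sufficiently confident."""
    kept = []
    for src, tgt in samples:
        enc = tokenizer(src, return_tensors="pt", truncation=True)
        with torch.no_grad():
            probs = model(**enc).logits.softmax(dim=-1)[0]
        # Minimum per-character confidence of the model's own prediction,
        # skipping the [CLS]/[SEP] positions; noisy samples (e.g., false
        # spelling errors) tend to receive low confidence.
        conf = probs.max(dim=-1).values[1:-1].min().item()
        if conf >= threshold:
            kept.append((src, tgt))
    return kept
```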
CERD: A Comprehensive Chinese Rhetoric Dataset for Rhetorical Understanding and Generation in Essays
Nuowei Liu | Xinhao Chen | Hongyi Wu | Changzhi Sun | Man Lan | Yuanbin Wu | Xiaopeng Bai | Shaoguang Mao | Yan Xia
Findings of the Association for Computational Linguistics: EMNLP 2024
Towards Explainable Chinese Native Learner Essay Fluency Assessment: Dataset, Tasks, and Method
Xinshu Shen | Hongyi Wu | Yadong Zhang | Man Lan | Xiaopeng Bai | Shaoguang Mao | Yuanbin Wu | Xinlin Zhuang | Li Cai
Findings of the Association for Computational Linguistics: EMNLP 2024
Grammatical Error Correction (GEC) is a crucial technique in Automated Essay Scoring (AES) for evaluating the fluency of essays. However, in Chinese, existing GEC datasets often fail to consider the importance of specific grammatical error types within compositional scenarios, lack data collected from native Chinese speakers, and largely overlook cross-sentence grammatical errors. Furthermore, measuring the overall fluency of an essay is often neglected. To address these issues, we present CEFA (Chinese Essay Fluency Assessment), an extensive corpus derived from essays authored by native Chinese-speaking primary and secondary school students, annotated with essay fluency scores along with both coarse- and fine-grained grammatical error types and corrections. Experiments with various benchmark models on CEFA substantiate the challenge of our dataset. Our findings further highlight the significance of fine-grained annotations in fluency assessment and the mutually beneficial relationship between error types and corrections.
Unleashing the Emergent Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration
Zhenhailong Wang | Shaoguang Mao | Wenshan Wu | Tao Ge | Furu Wei | Heng Ji
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Human intelligence thrives on cognitive synergy, where collaboration among different minds yields superior outcomes compared to isolated individuals. In this work, we propose Solo Performance Prompting (SPP), which transforms a single LLM into a cognitive synergist by engaging in multi-turn self-collaboration with multiple personas. A cognitive synergist is an intelligent agent that collaboratively combines multiple minds’ strengths and knowledge to enhance problem-solving in complex tasks. By dynamically identifying and simulating different personas based on task inputs, SPP unleashes the potential of cognitive synergy in LLMs. Our in-depth analysis shows that assigning multiple fine-grained personas in LLMs improves problem-solving abilities compared to using a single or fixed number of personas. We evaluate SPP on three challenging tasks: Trivia Creative Writing, Codenames Collaborative, and Logic Grid Puzzle, encompassing both knowledge-intensive and reasoning-intensive types. Unlike previous works such as Chain-of-Thought, which solely enhance the reasoning abilities of LLMs, experimental results demonstrate that SPP effectively reduces factual hallucination while maintaining strong reasoning capabilities. Additionally, comparative experiments show that cognitive synergy only emerges in GPT-4 and does not appear in less capable models such as GPT-3.5-turbo and Llama2-13b-chat, which draws an interesting analogy to human development. Code, data, and prompts can be found at: https://github.com/MikeWangWZHL/Solo-Performance-Prompting.git.
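The multi-turn self-collaboration loop can be pictured with a short, hypothetical sketch; `chat` stands in for any chat-completion API, and the prompts paraphrase the idea rather than reproduce the authors' exact templates (those are in the linked repository).

```python
# Hypothetical sketch of the SPP loop: one LLM proposes personas, speaks
# in turn as each, then synthesizes a single answer. `chat` is a stand-in
# for a chat-completion API; prompts are simplified paraphrases.
def chat(messages: list[dict]) -> str:
    raise NotImplementedError  # wire up your LLM API of choice here

def solo_performance_prompting(task: str) -> str:
    history = [{"role": "user",
                "content": f"Task: {task}\nIdentify a few personas whose "
                           "expertise this task needs, one per line."}]
    personas = chat(history).splitlines()
    # Multi-turn self-collaboration: the same model speaks in turn as
    # each persona, contributing knowledge or critiquing the draft.
    for persona in personas:
        history.append({"role": "user",
                        "content": f"As {persona}, contribute your expertise "
                                   "or point out problems with the current draft."})
        history.append({"role": "assistant", "content": chat(history)})
    history.append({"role": "user",
                    "content": "Synthesize all personas' contributions into "
                               "one final answer to the task."})
    return chat(history)
```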
Low-code LLM: Graphical User Interface over Large Language Models
Yuzhe Cai | Shaoguang Mao | Wenshan Wu | Zehua Wang | Yaobo Liang | Tao Ge | Chenfei Wu | WangYou WangYou | Ting Song | Yan Xia | Nan Duan | Furu Wei
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: System Demonstrations)
Utilizing Large Language Models (LLMs) for complex tasks is challenging, often involving a time-consuming and uncontrollable prompt engineering process. This paper introduces a novel human-LLM interaction framework, Low-code LLM. It incorporates six types of simple low-code visual programming interactions to achieve more controllable and stable responses. Through visual interaction with a graphical user interface, users can incorporate their ideas into the process without writing trivial prompts. The proposed Low-code LLM framework consists of a Planning LLM that designs a structured planning workflow for complex tasks, which can be correspondingly edited and confirmed by users through low-code visual programming operations, and an Executing LLM that generates responses following the user-confirmed workflow. We highlight three advantages of the low-code LLM: user-friendly interaction, controllable generation, and wide applicability. We demonstrate its benefits using four typical applications. By introducing this framework, we aim to bridge the gap between humans and LLMs, enabling more effective and efficient utilization of LLMs for complex tasks. The code, prompts, and experimental details are available at https://github.com/moymix/TaskMatrix/tree/main/LowCodeLLM. A system demonstration video can be found at https://www.youtube.com/watch?v=jb2C1vaeO3E.
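The two-stage design can be summarized in a brief, hypothetical sketch: `llm` is a stand-in completion function and `user_edit` plays the role of the graphical low-code editor, so none of these names come from the released code.

```python
# Hypothetical sketch of the Low-code LLM flow: a Planning LLM drafts a
# structured workflow, the user confirms or edits it, and an Executing
# LLM follows the confirmed workflow. Names are illustrative only.
from typing import Callable

def llm(prompt: str) -> str:
    raise NotImplementedError  # any text-completion backend

def low_code_llm(task: str, user_edit: Callable[[str], str]) -> str:
    # Stage 1: the Planning LLM drafts a structured, step-by-step workflow.
    workflow = llm("Break this task into a numbered workflow of steps, "
                   f"with optional sub-steps and jump conditions:\n{task}")
    # The user confirms or edits the workflow; in the demo system this is
    # done through six kinds of low-code visual programming operations.
    confirmed = user_edit(workflow)
    # Stage 2: the Executing LLM generates the response under the workflow.
    return llm(f"Task: {task}\nFollow this confirmed workflow step by step "
               f"and produce the final result:\n{confirmed}")
```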
2023
Smart Word Suggestions for Writing Assistance
Chenshuo Wang | Shaoguang Mao | Tao Ge | Wenshan Wu | Xun Wang | Yan Xia | Jonathan Tien | Dongyan Zhao
Findings of the Association for Computational Linguistics: ACL 2023
Enhancing word usage is a desired feature for writing assistance. To further advance research in this area, this paper introduces the “Smart Word Suggestions” (SWS) task and benchmark. Unlike previous work, SWS emphasizes end-to-end evaluation and presents a more realistic writing assistance scenario. The task involves identifying words or phrases that require improvement and providing substitution suggestions. The benchmark includes human-labeled data for testing, a large distantly supervised dataset for training, and a framework for evaluation. The test data include 1,000 sentences written by English learners, accompanied by over 16,000 substitution suggestions annotated by 10 native speakers. The training dataset comprises over 3.7 million sentences and 12.7 million suggestions generated through rules. Our experiments with seven baselines demonstrate that SWS is a challenging task. Based on experimental analysis, we suggest potential directions for future research on SWS. The dataset and related code will be available for research purposes.
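For concreteness, a hypothetical SWS-style instance and a toy end-to-end score might look like the following; the field names and matching rule are illustrative assumptions, not the benchmark's actual schema or evaluation framework.

```python
# Hypothetical SWS-style instance plus a toy end-to-end score. Field
# names and the matching rule are illustrative assumptions only.
example = {
    "sentence": "The film left a deep impression on me.",
    "suggestions": [
        {"target": "deep", "substitutions": ["lasting", "profound"]},
    ],
}

def end_to_end_score(pred: dict, gold: dict) -> float:
    """Count a prediction only if both the detected word and at least one
    proposed substitution agree with the human annotation."""
    gold_map = {s["target"]: set(s["substitutions"]) for s in gold["suggestions"]}
    hits = sum(1 for s in pred["suggestions"]
               if set(s["substitutions"]) & gold_map.get(s["target"], set()))
    return hits / max(len(gold["suggestions"]), 1)
```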
A Multi-Task Dataset for Assessing Discourse Coherence in Chinese Essays: Structure, Theme, and Logic Analysis
Hongyi Wu | Xinshu Shen | Man Lan | Shaoguang Mao | Xiaopeng Bai | Yuanbin Wu
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
This paper introduces the Chinese Essay Discourse Coherence Corpus (CEDCC), a multi-task dataset for assessing discourse coherence. Existing research tends to focus on isolated dimensions of discourse coherence, a gap which the CEDCC addresses by integrating coherence grading, topical continuity, and discourse relations. This approach, alongside detailed annotations, captures the subtleties of real-world texts and stimulates progress in Chinese discourse coherence analysis. Our contributions include the development of the CEDCC, the establishment of baselines for further research, and the demonstration of the impact of coherence on discourse relation recognition and automated essay scoring. The dataset and related code are available at https://github.com/cubenlp/CEDCC_corpus.
Overview of CCL23-Eval Task 8: Chinese Essay Fluency Evaluation (CEFE) Task
Xinshu Shen | Hongyi Wu | Xiaopeng Bai | Yuanbin Wu | Aimin Zhou | Shaoguang Mao | Tao Ge | Yan Xia
Proceedings of the 22nd Chinese National Conference on Computational Linguistics (Volume 3: Evaluations)
This paper provides a comprehensive review of CCL23-Eval Task 8, i.e., the Chinese Essay Fluency Evaluation (CEFE) task. The primary aim of this task is to systematically identify the types of fine-grained grammatical errors that affect the readability and coherence of essays written by Chinese primary and secondary school students, and then to suggest suitable corrections to enhance the fluency of their written expression. The task consists of three distinct tracks: (1) coarse-grained and fine-grained error identification; (2) character-level error identification and correction; (3) error sentence rewriting. In the end, we received 44 completed registration forms, leading to a total of 130 submissions from 11 dedicated participating teams. We present the results of all participants and our analysis of these results. Both the dataset and the evaluation tool used in this task are available.