Xiaopeng Bai
2025
Towards Comprehensive Argument Analysis in Education: Dataset, Tasks, and Method
Yupei Ren | Xinyi Zhou | Ning Zhang | Shangqing Zhao | Man Lan | Xiaopeng Bai
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Argument mining has garnered increasing attention over the years, with the recent advancement of Large Language Models (LLMs) further propelling this trend. However, current argument relations remain relatively simplistic and foundational, struggling to capture the full scope of argument information. To address this limitation, we propose a systematic framework comprising 14 fine-grained relation types from the perspectives of vertical argument relations and horizontal discourse relations, thereby capturing the intricate interplay between argument components for a thorough understanding of argument structure. On this basis, we conducted extensive experiments on three tasks: argument component prediction, relation prediction, and automated essay grading. Additionally, we explored the impact of writing quality on argument component prediction and relation prediction, as well as the connections between discourse relations and argumentative features. The findings highlight the importance of fine-grained argumentative annotations for argumentative writing assessment and encourage multi-dimensional argument analysis.
Overview of CCL25-Eval Task6: Chinese Essay Rhetoric Recognition Evaluation (CERRE)
Yujiang Lu | Nuowei Liu | Yupei Ren | Yicheng Zhu | Man Lan | Xiaopeng Bai | Mofan Xu | Qingyu Liao
Proceedings of the 24th China National Conference on Computational Linguistics (CCL 2025)
"Literary grace in Chinese composition writing is a hallmark of linguistic sophistication, often realized through various rhetorical devices. The automatic identification and analysis of rhetorical devices in essays play a crucial role in educational NLP applications, particularly for assessing writing proficiency and facilitating pedagogical interventions. Although prior research has predominantly focused on coarse-grained recognition of limited rhetorical devices at sentence level, these approaches prove inadequate for handling complex rhetorical structures and emerging educational demands. In this paper, we present the CCL25-Eval Task6: Chinese EssayRhetoric Recognition Evaluation (CERRE), a novel framework comprising three distinct evaluation tracks at the document level: (1) Fine-grained Form-level Categories Recognition, (2)Fine-grained Content-level Categories Recognition, and (3) Rhetorical Component Extraction.The evaluation has attracted 29 registered participating teams, with 8 teams submitting valid system outputs. In particular, two participating systems demonstrated superior performance by exceeding the baseline metrics in complete evaluation criteria."
2024
Towards Explainable Chinese Native Learner Essay Fluency Assessment: Dataset, Tasks, and Method
Xinshu Shen | Hongyi Wu | Yadong Zhang | Man Lan | Xiaopeng Bai | Shaoguang Mao | Yuanbin Wu | Xinlin Zhuang | Li Cai
Findings of the Association for Computational Linguistics: EMNLP 2024
Grammatical Error Correction (GEC) is a crucial technique in Automated Essay Scoring (AES) for evaluating the fluency of essays. However, in Chinese, existing GEC datasets often fail to consider the importance of specific grammatical error types within compositional scenarios, lack research on data collected from native Chinese speakers, and largely overlook cross-sentence grammatical errors. Furthermore, the measurement of the overall fluency of an essay is often overlooked. To address these issues, we present CEFA (Chinese Essay Fluency Assessment), an extensive corpus that is derived from essays authored by native Chinese-speaking primary and secondary students and encapsulates essay fluency scores along with both coarse and fine-grained grammatical error types and corrections. Experiments employing various benchmark models on CEFA substantiate the challenge of our dataset. Our findings further highlight the significance of fine-grained annotations in fluency assessment and the mutually beneficial relationship between error types and corrections.
CERD: A Comprehensive Chinese Rhetoric Dataset for Rhetorical Understanding and Generation in Essays
Nuowei Liu | Xinhao Chen | Hongyi Wu | Changzhi Sun | Man Lan | Yuanbin Wu | Xiaopeng Bai | Shaoguang Mao | Yan Xia
Findings of the Association for Computational Linguistics: EMNLP 2024
Chinese Essay Rhetoric Recognition and Understanding (CERRU)
Nuowei Liu | Xinhao Chen | Yupei Ren | Man Lan | Xiaopeng Bai | Yuanbin Wu | Shaoguang Mao | Yan Xia
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 3: Evaluations)
Rhetoric is fundamental to the reading comprehension and writing skills of primary and middle school students. However, current work recognizes either single coarse-grained categories or fine-grained categories independently. In this paper, we propose the CCL24-Eval Task6: Chinese Essay Rhetoric Recognition and Understanding (CERRU), consisting of 3 tracks: (1) Fine-grained Form-level Categories Recognition, (2) Fine-grained Content-level Categories Recognition, and (3) Rhetorical Component Extraction. A total of 32 teams registered to participate in CERRU and 9 teams submitted evaluation results, with 7 of these teams achieving an overall score that surpassed the baseline.
CEAMC: Corpus and Empirical Study of Argument Analysis in Education via LLMs
Yupei Ren | Hongyi Wu | Zhaoguang Long | Shangqing Zhao | Xinyi Zhou | Zheqin Yin | Xinlin Zhuang | Xiaopeng Bai | Man Lan
Findings of the Association for Computational Linguistics: EMNLP 2024
This paper introduces the Chinese Essay Argument Mining Corpus (CEAMC), a manually annotated dataset designed for argument component classification on multiple levels of granularity. Existing argument component types in education remain simplistic and isolated, failing to encapsulate the complete argument information. Originating from authentic examination settings, CEAMC categorizes argument components into 4 coarse-grained and 10 fine-grained delineations, surpassing previous simple representations to capture the subtle nuances of argumentation in the real world, thus meeting the needs of complex and diverse argumentative scenarios. Our contributions include the development of CEAMC, the establishment of baselines for further research, and a thorough exploration of the performance of Large Language Models (LLMs) on CEAMC. The results indicate that our CEAMC can serve as a challenging benchmark for the development of argument analysis in education.
Chinese Essay Fluency Evaluation (CEFE) Task
Xinlin Zhuang | Xinshu Shen | Hongyi Wu | Man Lan | Xiaopeng Bai | Yuanbin Wu | Aimin Zhou | Shaoguang Mao
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 3: Evaluations)
This paper presents a detailed review of Task 7 in the CCL24-Eval: the second Chinese Essay Fluency Evaluation (CEFE). The task aims to identify fine-grained grammatical errors that impair readability and coherence in essays authored by Chinese primary and secondary school students, evaluate the essays’ fluency levels, and recommend corrections to improve their written fluency. The evaluation comprises three tracks: (1) Coarse-grained and fine-grained error identification; (2) Error sentence rewriting; and (3) Essay Fluency Level Recognition. We garnered 29 completed registrations, resulting in 180 submissions from 10 dedicated teams. The paper discusses the submissions and analyzes the results from all participating teams.
2023
Overview of CCL23-Eval Task 8: Chinese Essay Fluency Evaluation (CEFE) Task
Xinshu Shen | Hongyi Wu | Xiaopeng Bai | Yuanbin Wu | Aimin Zhou | Shaoguang Mao | Tao Ge | Yan Xia
Proceedings of the 22nd Chinese National Conference on Computational Linguistics (Volume 3: Evaluations)
This paper provides a comprehensive review of the CCL23-Eval Task 8, i.e., Chinese Essay Fluency Evaluation (CEFE). The primary aim of this task is to systematically identify the types of grammatical fine-grained errors that affect the readability and coherence of essays written by Chinese primary and secondary school students, and then to suggest suitable corrections to enhance the fluidity of their written expression. This task consists of three distinct tracks: (1) Coarse-grained and fine-grained error identification; (2) Character-level error identification and correction; (3) Error sentence rewriting. In the end, we received 44 completed registration forms, leading to a total of 130 submissions from 11 dedicated participating teams. We present the results of all participants and our analysis of these results. Both the dataset and evaluation tool used in this task are available.
A Multi-Task Dataset for Assessing Discourse Coherence in Chinese Essays: Structure, Theme, and Logic Analysis
Hongyi Wu | Xinshu Shen | Man Lan | Shaoguang Mao | Xiaopeng Bai | Yuanbin Wu
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
This paper introduces the Chinese Essay Discourse Coherence Corpus (CEDCC), a multi-task dataset for assessing discourse coherence. Existing research tends to focus on isolated dimensions of discourse coherence, a gap which the CEDCC addresses by integrating coherence grading, topical continuity, and discourse relations. This approach, alongside detailed annotations, captures the subtleties of real-world texts and stimulates progress in Chinese discourse coherence analysis. Our contributions include the development of the CEDCC, the establishment of baselines for further research, and the demonstration of the impact of coherence on discourse relation recognition and automated essay scoring. The dataset and related code are available at https://github.com/cubenlp/CEDCC_corpus.
2017
N-gram Model for Chinese Grammatical Error Diagnosis
Jianbo Zhao | Hao Liu | Zuyi Bao | Xiaopeng Bai | Si Li | Zhiqing Lin
Proceedings of the 4th Workshop on Natural Language Processing Techniques for Educational Applications (NLPTEA 2017)
Detection and correction of Chinese grammatical errors are two of the major challenges in automatic Chinese grammatical error diagnosis. This paper presents an N-gram model for the automatic detection and correction of Chinese grammatical errors in the NLPTEA 2017 task. The experimental results show that the proposed method performs well at correcting Chinese grammatical errors.
2015
Chinese CogBank: Where to See the Cognitive Features of Chinese Words
Bin Li | Xiaopeng Bai | Siqi Yin | Jie Xu
Proceedings of the Third Workshop on Metaphor in NLP
2012
Building a Chinese Lexical Taxonomy
Xiaopeng Bai | Nianwen Xue
Proceedings of the Second CIPS-SIGHAN Joint Conference on Chinese Language Processing
2010
Co-authors
- Man Lan 8
- Shaoguang Mao 6
- Hongyi Wu 6
- Yuanbin Wu 6
- Yupei Ren 4
- Xinshu Shen 4
- Nuowei Liu 3
- Yan Xia 3
- Xinlin Zhuang 3
- Xinhao Chen 2
- Shangqing Zhao 2
- Aimin Zhou 2
- Xinyi Zhou 2
- Zuyi Bao 1
- Li Cai 1
- Tao Ge 1
- Si Li 1
- Bin Li 1
- Qingyu Liao 1
- Zhiqing Lin 1
- Hao Liu 1
- Zhaoguang Long 1
- Yujiang Lu 1
- Changzhi Sun 1
- Hui Wang 1
- Mofan Xu 1
- Jie Xu 1
- Nianwen Xue 1
- Zheqin Yin 1
- Siqi Yin 1
- Ning Zhang 1
- Yadong Zhang 1
- Jianbo Zhao 1
- Yicheng Zhu 1