Xiaopeng Bai


2024

CERD: A Comprehensive Chinese Rhetoric Dataset for Rhetorical Understanding and Generation in Essays
Nuowei Liu | Xinhao Chen | Hongyi Wu | Changzhi Sun | Man Lan | Yuanbin Wu | Xiaopeng Bai | Shaoguang Mao | Yan Xia
Findings of the Association for Computational Linguistics: EMNLP 2024

CEAMC: Corpus and Empirical Study of Argument Analysis in Education via LLMs
Yupei Ren | Hongyi Wu | Zhaoguang Long | Shangqing Zhao | Xinyi Zhou | Zheqin Yin | Xinlin Zhuang | Xiaopeng Bai | Man Lan
Findings of the Association for Computational Linguistics: EMNLP 2024

This paper introduces the Chinese Essay Argument Mining Corpus (CEAMC), a manually annotated dataset designed for argument component classification on multiple levels of granularity. Existing argument component types in education remain simplistic and isolated, failing to encapsulate the complete argument information. Originating from authentic examination settings, CEAMC categorizes argument components into 4 coarse-grained and 10 fine-grained delineations, surpassing previous simple representations to capture the subtle nuances of argumentation in the real world, thus meeting the needs of complex and diverse argumentative scenarios. Our contributions include the development of CEAMC, the establishment of baselines for further research, and a thorough exploration of the performance of Large Language Models (LLMs) on CEAMC. The results indicate that our CEAMC can serve as a challenging benchmark for the development of argument analysis in education.

Towards Explainable Chinese Native Learner Essay Fluency Assessment: Dataset, Tasks, and Method
Xinshu Shen | Hongyi Wu | Yadong Zhang | Man Lan | Xiaopeng Bai | Shaoguang Mao | Yuanbin Wu | Xinlin Zhuang | Li Cai
Findings of the Association for Computational Linguistics: EMNLP 2024

Grammatical Error Correction (GEC) is a crucial technique in Automated Essay Scoring (AES) for evaluating the fluency of essays. However, in Chinese, existing GEC datasets often fail to consider the importance of specific grammatical error types within compositional scenarios, lack research on data collected from native Chinese speakers, and largely overlook cross-sentence grammatical errors. Furthermore, the measurement of the overall fluency of an essay is often overlooked. To address these issues, we present CEFA (Chinese Essay Fluency Assessment), an extensive corpus that is derived from essays authored by native Chinese-speaking primary and secondary students and encapsulates essay fluency scores along with both coarse and fine-grained grammatical error types and corrections. Experiments employing various benchmark models on CEFA substantiate the challenge of our dataset. Our findings further highlight the significance of fine-grained annotations in fluency assessment and the mutually beneficial relationship between error types and corrections.

2023

A Multi-Task Dataset for Assessing Discourse Coherence in Chinese Essays: Structure, Theme, and Logic Analysis
Hongyi Wu | Xinshu Shen | Man Lan | Shaoguang Mao | Xiaopeng Bai | Yuanbin Wu
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

This paper introduces the Chinese Essay Discourse Coherence Corpus (CEDCC), a multi-task dataset for assessing discourse coherence. Existing research tends to focus on isolated dimensions of discourse coherence, a gap which the CEDCC addresses by integrating coherence grading, topical continuity, and discourse relations. This approach, alongside detailed annotations, captures the subtleties of real-world texts and stimulates progress in Chinese discourse coherence analysis. Our contributions include the development of the CEDCC, the establishment of baselines for further research, and the demonstration of the impact of coherence on discourse relation recognition and automated essay scoring. The dataset and related code are available at https://github.com/cubenlp/CEDCC_corpus.

Overview of CCL23-Eval Task 8: Chinese Essay Fluency Evaluation (CEFE) Task
Xinshu Shen | Hongyi Wu | Xiaopeng Bai | Yuanbin Wu | Aimin Zhou | Shaoguang Mao | Tao Ge | Yan Xia
Proceedings of the 22nd Chinese National Conference on Computational Linguistics (Volume 3: Evaluations)

This paper provides a comprehensive review of the CCL23-Eval Task 8, i.e., Chinese Essay Fluency Evaluation (CEFE). The primary aim of this task is to systematically identify the types of grammatical fine-grained errors that affect the readability and coherence of essays written by Chinese primary and secondary school students, and then to suggest suitable corrections to enhance the fluidity of their written expression. This task consists of three distinct tracks: (1) Coarse-grained and fine-grained error identification; (2) Character-level error identification and correction; (3) Error sentence rewriting. In the end, we received 44 completed registration forms, leading to a total of 130 submissions from 11 dedicated participating teams. We present the results of all participants and our analysis of these results. Both the dataset and evaluation tool used in this task are available.

2017

N-gram Model for Chinese Grammatical Error Diagnosis
Jianbo Zhao | Hao Liu | Zuyi Bao | Xiaopeng Bai | Si Li | Zhiqing Lin
Proceedings of the 4th Workshop on Natural Language Processing Techniques for Educational Applications (NLPTEA 2017)

Detection and correction of Chinese grammatical errors have been two of the major challenges for Chinese automatic grammatical error diagnosis. This paper presents an N-gram model for automatic detection and correction of Chinese grammatical errors in the NLPTEA 2017 task. The experimental results show that the proposed method is good at correcting Chinese grammatical errors.

2015

Chinese CogBank: Where to See the Cognitive Features of Chinese Words
Bin Li | Xiaopeng Bai | Siqi Yin | Jie Xu
Proceedings of the Third Workshop on Metaphor in NLP

2012

Building a Chinese Lexical Taxonomy
Xiaopeng Bai | Nianwen Xue
Proceedings of the Second CIPS-SIGHAN Joint Conference on Chinese Language Processing

2010

Lexical Semantics-Syntactic Model for Defining and Subcategorizing Attribute Noun Class
Xiaopeng Bai | Hui Wang
Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation