2024
pdf
bib
abs
Are U a Joke Master? Pun Generation via Multi-Stage Curriculum Learning towards a Humor LLM
Yang Chen
|
Chong Yang
|
Tu Hu
|
Xinhao Chen
|
Man Lan
|
Li Cai
|
Xinlin Zhuang
|
Xuan Lin
|
Xin Lu
|
Aimin Zhou
Findings of the Association for Computational Linguistics: ACL 2024
Although large language models (LLMs) acquire extensive world knowledge and some reasoning abilities, their proficiency in generating humorous sentences remains a challenge. Previous research has demonstrated that the humor generation capabilities of ChatGPT are confined to producing merely 25 unique jokes. In this work, we concentrate on endowing LLMs with the ability of generating puns, a particular category of humor by preference learning method. We propose a multi-stage curriculum preference learning framework to optimize both pun structure preferences and humor preferences. Specifically, we improve the Direct Preference Optimization (DPO) algorithm to address the challenge of multi-objective alignment problem. Besides, to facilitate further advancement in this field, we collect a Chinese Pun (ChinesePun) dataset, containing 2.1k puns and corresponding annotations. Experimental results on both Chinese and English benchmark datasets demonstrate that our method significantly outperforms all the baseline models.
pdf
bib
abs
TOREE: Evaluating Topic Relevance of Student Essays for Chinese Primary and Middle School Education
Xinlin Zhuang
|
Hongyi Wu
|
Xinshu Shen
|
Peimin Yu
|
Gaowei Yi
|
Xinhao Chen
|
Tu Hu
|
Yang Chen
|
Yupei Ren
|
Yadong Zhang
|
Youqi Song
|
Binxuan Liu
|
Man Lan
Findings of the Association for Computational Linguistics: ACL 2024
Topic relevance of an essay demands that the composition adheres to a clear theme and aligns well with the essay prompt requirements, a critical aspect of essay quality evaluation. However, existing research of Automatic Essay Scoring (AES) for Chinese essays has overlooked topic relevance and lacks detailed feedback, while Automatic Essay Comment Generation (AECG) faces much complexity and difficulty. Additionally, current Large Language Models, including GPT-4, often make incorrect judgments and provide overly impractical feedback when evaluating topic relevance. This paper introduces TOREE (Topic Relevance Evaluation), a comprehensive dataset developed to assess topic relevance in Chinese primary and middle school students’ essays, which is beneficial for AES, AECG and other applications. Moreover, our proposed two-step method utilizes TOREE through a combination of Supervised Fine-tuning and Preference Learning. Experimental results demonstrate that TOREE is of high quality, and our method significantly enhances models’ performance on two designed tasks for topic relevance evaluation, improving both automatic and human evaluations across four diverse LLMs.
pdf
bib
abs
CEAMC: Corpus and Empirical Study of Argument Analysis in Education via LLMs
Yupei Ren
|
Hongyi Wu
|
Zhaoguang Long
|
Shangqing Zhao
|
Xinyi Zhou
|
Zheqin Yin
|
Xinlin Zhuang
|
Xiaopeng Bai
|
Man Lan
Findings of the Association for Computational Linguistics: EMNLP 2024
This paper introduces the Chinese Essay Argument Mining Corpus (CEAMC), a manually annotated dataset designed for argument component classification on multiple levels of granularity. Existing argument component types in education remain simplistic and isolated, failing to encapsulate the complete argument information. Originating from authentic examination settings, CEAMC categorizes argument components into 4 coarse-grained and 10 fine-grained delineations, surpassing previous simple representations to capture the subtle nuances of argumentation in the real world, thus meeting the needs of complex and diverse argumentative scenarios. Our contributions include the development of CEAMC, the establishment of baselines for further research, and a thorough exploration of the performance of Large Language Models (LLMs) on CEAMC. The results indicate that our CEAMC can serve as a challenging benchmark for the development of argument analysis in education.
pdf
bib
abs
Towards Explainable Chinese Native Learner Essay Fluency Assessment: Dataset, Tasks, and Method
Xinshu Shen
|
Hongyi Wu
|
Yadong Zhang
|
Man Lan
|
Xiaopeng Bai
|
Shaoguang Mao
|
Yuanbin Wu
|
Xinlin Zhuang
|
Li Cai
Findings of the Association for Computational Linguistics: EMNLP 2024
Grammatical Error Correction (GEC) is a crucial technique in Automated Essay Scoring (AES) for evaluating the fluency of essays. However, in Chinese, existing GEC datasets often fail to consider the importance of specific grammatical error types within compositional scenarios, lack research on data collected from native Chinese speakers, and largely overlook cross-sentence grammatical errors. Furthermore, the measurement of the overall fluency of an essay is often overlooked. To address these issues, we present CEFA (Chinese Essay Fluency Assessment), an extensive corpus that is derived from essays authored by native Chinese-speaking primary and secondary students and encapsulates essay fluency scores along with both coarse and fine-grained grammatical error types and corrections. Experiments employing various benchmark models on CEFA substantiate the challenge of our dataset. Our findings further highlight the significance of fine-grained annotations in fluency assessment and the mutually beneficial relationship between error types and corrections