Peng Cao
2025
A Novel Chinese-Idiom Automatic Error Correction Method Based on the Hidden Markov Model
Rongbin Zhang | Anlu Gui | Peng Cao | Lingfeng Wu | Feng Huang | Jiahui Li
Proceedings of the 37th Conference on Computational Linguistics and Speech Processing (ROCLING 2025)
Spelling errors in Chinese idioms frequently arise from various types of misspellings and optical character recognition (OCR) errors in daily learning and usage. Automatic error correction for Chinese idioms is therefore an important natural language processing task, as it helps improve both the quality of Chinese texts and language learning. Existing methods, such as edit-distance and custom-dictionary approaches, suffer from limited error correction capability, low computational efficiency, and weak flexibility. To address these limitations, this paper proposes a novel automatic error correction method for Chinese idioms based on the hidden Markov model (HMM). Specifically, the generation process of idiom spelling errors is modeled with an HMM, transforming the correction problem into a matching task between erroneous idioms and legitimate idioms. By constructing a legitimate idiom table and a Chinese character confusion set, a prototype idiom correction system was developed and its performance tested. Experimental results demonstrate that, compared with existing methods, the proposed model is simpler, has fewer parameters and lower computational complexity, and exhibits stronger error correction capability and parameter robustness. It can correct diverse types of idiom errors more flexibly, showing high potential application value.
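The abstract does not spell out the model's concrete form, but the matching step it describes can be illustrated with a minimal sketch: hidden states are the characters of a candidate legitimate idiom, observations are the characters actually written, and a confusion set drives the emission probabilities. The idiom table, confusion set, probability values, and function names below are illustrative assumptions, not the paper's actual resources.

```python
# Minimal sketch of the HMM-style matching idea, assuming a fixed-length
# idiom, a small legitimate-idiom table, and a toy confusion set.
# All data and parameter values here are illustrative, not the paper's.

import math

LEGITIMATE_IDIOMS = ["一诺千金", "画蛇添足", "守株待兔"]  # toy idiom table

# Confusion set: characters often mistaken for the key character
CONFUSION = {
    "诺": {"若", "喏"},
    "添": {"填", "舔"},
}

SAME_CHAR_PROB = 0.9   # emission prob. of observing the correct character
CONFUSED_PROB = 0.08   # mass spread over a character's confusion set
OTHER_PROB = 1e-4      # any unrelated character (smoothing)

def emission_logprob(hidden_char: str, observed_char: str) -> float:
    """Log P(observed | hidden): the emission model for one position."""
    if observed_char == hidden_char:
        return math.log(SAME_CHAR_PROB)
    confusables = CONFUSION.get(hidden_char, set())
    if observed_char in confusables:
        return math.log(CONFUSED_PROB / len(confusables))
    return math.log(OTHER_PROB)

def correct_idiom(observed: str) -> str:
    """Return the legitimate idiom that best explains the observed string."""
    def score(candidate: str) -> float:
        if len(candidate) != len(observed):
            return float("-inf")
        return sum(emission_logprob(h, o) for h, o in zip(candidate, observed))

    return max(LEGITIMATE_IDIOMS, key=score)

print(correct_idiom("一若千金"))  # -> 一诺千金
```

In this simplified view, transitions inside a fixed candidate idiom are deterministic, so the Viterbi search collapses to a sum of per-position emission scores over each candidate, which suggests how such a model can remain small, with few parameters and low computational cost.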
2024
Reusing Transferable Weight Increments for Low-resource Style Generation
Chunzhen Jin | Eliot Huang | Heng Chang | Yaqi Wang | Peng Cao | Osmar Zaiane
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Text style transfer (TST) is crucial in natural language processing, aiming to endow text with a new style without altering its meaning. In real-world scenarios, not all styles have abundant resources. This work introduces TWIST (reusing Transferable Weight Increments for Style Text generation), a novel framework that mitigates data scarcity by exploiting the style features carried in weight increments to transfer low-resource styles effectively. During target style learning, we derive knowledge from a specially designed weight pool and use it to initialize the parameters for the unseen style. To enhance the effectiveness of merging, the target style weight increments are merged from multiple source style weight increments through their singular vectors. Considering the diversity of styles, we also design a multi-key memory network that attends to task- and instance-level information simultaneously to retrieve the most relevant weight increments. Results on multiple style transfer datasets show that TWIST performs strongly across different backbones and is particularly effective in low-resource scenarios.
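The merging step the abstract describes can be sketched in a few lines: weight increments from several source styles are combined according to relevance weights (which in the paper come from the multi-key memory network) and reduced through their singular vectors. Everything below, including the function name, the specific weighted-average-then-SVD rule, the rank truncation, and the random data, is a hedged illustration under those assumptions, not TWIST's actual procedure.

```python
# Illustrative sketch: merge source-style weight increments through
# singular vectors to initialize a target-style increment. The exact
# merging rule and the memory network are the paper's; this is a guess
# at the general shape of such a step, for intuition only.

import numpy as np

def merge_weight_increments(deltas, weights, rank=8):
    """Combine source-style weight increments into an initial target increment.

    deltas  : list of (d_out, d_in) weight-increment matrices, one per source style
    weights : relevance weights per source style (e.g. from a memory network)
    rank    : number of singular vectors kept in the merged increment
    """
    weights = np.asarray(weights, dtype=np.float64)
    weights = weights / weights.sum()                     # normalize relevance

    merged = sum(w * d for w, d in zip(weights, deltas))  # weighted combination

    # Keep only the top singular directions so the target style starts from
    # a low-rank, transferable increment rather than raw averaged noise.
    U, S, Vt = np.linalg.svd(merged, full_matrices=False)
    return (U[:, :rank] * S[:rank]) @ Vt[:rank]

rng = np.random.default_rng(0)
source_deltas = [rng.normal(size=(64, 64)) for _ in range(3)]
init_delta = merge_weight_increments(source_deltas, weights=[0.5, 0.3, 0.2])
print(init_delta.shape)  # (64, 64), with rank at most 8
```

Truncating to the leading singular vectors is one natural way to keep only the dominant, shared directions of the combined increments, which matches the abstract's emphasis on transferable style features rather than style-specific noise.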