Pseudo-label Data Construction Method and Syntax-enhanced Model for Chinese Semantic Error Recognition

Hongyan Wu, Nankai Lin, Shengyi Jiang, Lianxi Wang, Aimin Yang


Abstract
Chinese Semantic Error Recognition (CSER) has long been a weak link in Chinese language processing because of the complexity and obscurity of Chinese semantics. Existing research has increasingly focused on leveraging pre-trained models to perform CSER. Although some researchers have attempted to integrate syntax information into pre-trained language models, doing so requires training the models from scratch, which is time-consuming and laborious. Furthermore, although datasets for CSER exist, their limited size impairs model performance. To address the difficulty posed by a limited sample set and the need to annotate samples with semantic-level errors, we propose a Pseudo-label Data Construction method for CSER (PDC-CSER), which generates pseudo-labels for augmented samples based on perplexity and on a model, respectively. This overcomes the difficulty of constructing pseudo-label data containing semantic-level errors and ensures the quality of the pseudo-labels. Moreover, we propose a CSER method with a Dependency Syntactic Attention mechanism (CSER-DSA) that explicitly infuses dependency syntactic information only in the fine-tuning stage, achieving robust performance while substantially reducing computing power and time costs. Results demonstrate that the pseudo-label technique PDC-CSER and the semantic error recognition method CSER-DSA surpass existing models.
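The abstract does not say how perplexity is turned into a pseudo-label. As a minimal sketch only, assuming per-token natural-log probabilities are already available from some language model, the following shows the standard perplexity formula and one hypothetical thresholding rule. The function names, the threshold value, and the rule itself are illustrative assumptions, not the procedure used in PDC-CSER.

```python
import math
from typing import List


def perplexity(token_logprobs: List[float]) -> float:
    """Standard perplexity: exp of the negative mean natural-log probability per token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def pseudo_label(token_logprobs: List[float], threshold: float) -> int:
    """Hypothetical rule (an assumption, not from the paper):
    return 1 (suspected semantic error) if perplexity exceeds the threshold, else 0."""
    return 1 if perplexity(token_logprobs) > threshold else 0


# Toy inputs: every token has probability 0.5 in the first sentence, 0.1 in the second.
fluent_sentence = [math.log(0.5)] * 4
unusual_sentence = [math.log(0.1)] * 4

print(perplexity(fluent_sentence))
print(pseudo_label(fluent_sentence, threshold=5.0))
print(pseudo_label(unusual_sentence, threshold=5.0))
```

In practice the threshold would have to be tuned on held-out data, and a sentence with high perplexity is not necessarily semantically wrong, which may be one reason the abstract pairs perplexity-based labels with model-based ones.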
Anthology ID:
2025.coling-main.361
Volume:
Proceedings of the 31st International Conference on Computational Linguistics
Month:
January
Year:
2025
Address:
Abu Dhabi, UAE
Editors:
Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue:
COLING
Publisher:
Association for Computational Linguistics
Pages:
5391–5402
URL:
https://aclanthology.org/2025.coling-main.361/
Cite (ACL):
Hongyan Wu, Nankai Lin, Shengyi Jiang, Lianxi Wang, and Aimin Yang. 2025. Pseudo-label Data Construction Method and Syntax-enhanced Model for Chinese Semantic Error Recognition. In Proceedings of the 31st International Conference on Computational Linguistics, pages 5391–5402, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
Pseudo-label Data Construction Method and Syntax-enhanced Model for Chinese Semantic Error Recognition (Wu et al., COLING 2025)
PDF:
https://aclanthology.org/2025.coling-main.361.pdf