Lian Chen

2026

Ontology-oriented lexico-semantic modeling and neural classification of Chinese chéngyǔ: A culture-aware NLP approach
Lian Chen
Proceedings of the 4th Workshop on Cross-Cultural Considerations in NLP (C3NLP 2026)

This paper proposes a semi-automatic lexico-semantic modeling framework for Chinese chéngyǔ containing body-part and animal lexemes. The framework combines manual semantic annotation, lightweight RDF/OWL formalization and semantic classification to investigate whether lexical mediators such as 心 xīn “heart/mind”, 口 kǒu “mouth” or 马 mǎ “horse” are sufficient to predict an idiom's semantic interpretation. Using 440 annotated chéngyǔ normalized into 18 semantic categories, we compare three classification approaches: a rule-based keyword baseline, logistic regression over character n-gram TF-IDF features, and BERT-base-chinese. The results show that lexical mediators cannot be directly equated with semantic categories and that the TF-IDF model achieves the best overall performance, suggesting that lightweight character-level representations remain robust for very short idioms in low-resource settings. The study contributes an interpretable, RDF/OWL-compatible resource for culture-aware modeling of Chinese idioms.
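To make the idea of a rule-based keyword baseline concrete, here is a minimal sketch. It assumes that each mediator character maps to exactly one category and that the first mediator found (scanning left to right) decides the label. The rule table, the category names (EMOTION, SPEECH, SUCCESS, HONESTY, LEADERSHIP, OTHER), the toy gold labels and the helper names `keyword_baseline` and `accuracy` are all hypothetical placeholders. The abstract does not list the paper's actual rules or its 18 categories.

```python
"""Sketch of a rule-based keyword baseline for chéngyǔ classification.

Assumption: each lexical mediator character maps to exactly one category.
The rules and labels below are hypothetical placeholders, not the paper's.
"""

# Hypothetical mediator -> category rules (illustrative labels only).
KEYWORD_RULES: dict[str, str] = {
    "心": "EMOTION",  # xīn "heart/mind"
    "口": "SPEECH",   # kǒu "mouth"
    "马": "SUCCESS",  # mǎ "horse"
}
FALLBACK = "OTHER"  # returned when no mediator character occurs

# Toy gold labels (hypothetical, not drawn from the 440 annotated idioms).
TOY_GOLD: dict[str, str] = {
    "心花怒放": "EMOTION",     # "be elated"
    "口若悬河": "SPEECH",      # "speak eloquently"
    "马到成功": "SUCCESS",     # "win immediate success"
    "心口如一": "HONESTY",     # "say what one thinks"
    "一马当先": "LEADERSHIP",  # "take the lead"
}


def keyword_baseline(idiom: str, rules: dict[str, str] = KEYWORD_RULES) -> str:
    """Return the category of the first mediator character, scanning left to right."""
    for char in idiom:
        if char in rules:
            return rules[char]
    return FALLBACK


def accuracy(gold: dict[str, str]) -> float:
    """Fraction of idioms whose baseline prediction equals the gold category."""
    if not gold:
        return 0.0
    hits = sum(keyword_baseline(idiom) == label for idiom, label in gold.items())
    return hits / len(gold)


if __name__ == "__main__":
    for idiom, label in TOY_GOLD.items():
        print(idiom, "predicted:", keyword_baseline(idiom), "gold:", label)
    print("accuracy:", accuracy(TOY_GOLD))  # prints "accuracy: 0.6"
```

On these made-up labels, the baseline misclassifies 心口如一 and 一马当先 because the mediator character alone does not fix the idiomatic meaning. This is only a toy illustration of why mediators and categories need not coincide. It is not a reproduction of the paper's experiments.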

