Character Set Construction for Chinese Language Learning

Chak Yan Yeung, John Lee


Abstract
To promote efficient learning of Chinese characters, pedagogical materials may present not only a single character, but a set of characters that are related in meaning and in written form. This paper investigates automatic construction of these character sets. The proposed model represents a character as averaged word vectors of common words containing the character. It then identifies sets of characters with high semantic similarity through clustering. Human evaluation shows that this representation outperforms direct use of character embeddings, and that the resulting character sets capture distinct semantic ranges.
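The representation and grouping described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the three-dimensional vectors in `WORD_VECTORS` are made-up toy values, the function names are hypothetical, and the greedy threshold grouping stands in for whatever clustering algorithm the paper actually uses, which the abstract does not name.

```python
from math import sqrt

# Hypothetical toy word vectors (made-up values, for illustration only).
# A real system would load pretrained embeddings for common words.
WORD_VECTORS = {
    "河流": [0.9, 0.1, 0.0],
    "河岸": [0.8, 0.2, 0.0],
    "湖泊": [0.85, 0.15, 0.0],
    "湖水": [0.9, 0.05, 0.05],
    "吃饭": [0.0, 0.9, 0.1],
    "小吃": [0.1, 0.8, 0.1],
    "喝水": [0.3, 0.7, 0.0],
    "喝茶": [0.0, 0.9, 0.1],
}


def char_vector(char, word_vectors=WORD_VECTORS):
    """Represent a character as the average vector of the words containing it."""
    rows = [vec for word, vec in word_vectors.items() if char in word]
    if not rows:
        raise KeyError(f"no word contains {char!r}")
    return [sum(col) / len(rows) for col in zip(*rows)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def cluster_characters(chars, threshold=0.8):
    """Greedy grouping (an assumed stand-in for the paper's clustering):
    a character joins the first set whose members are all similar to it."""
    vectors = {c: char_vector(c) for c in chars}
    clusters = []
    for c in chars:
        for cluster in clusters:
            if all(cosine(vectors[c], vectors[m]) >= threshold for m in cluster):
                cluster.append(c)
                break
        else:
            # No existing set is similar enough, so start a new one.
            clusters.append([c])
    return clusters


print(cluster_characters(["河", "湖", "吃", "喝"]))
```

With these toy vectors, the sketch groups 河 with 湖 and 吃 with 喝. Averaging over whole words lets each character inherit the meaning of the words it appears in, which is the idea the abstract contrasts with using character embeddings directly.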
Anthology ID:
2021.bea-1.6
Volume:
Proceedings of the 16th Workshop on Innovative Use of NLP for Building Educational Applications
Month:
April
Year:
2021
Address:
Online
Editors:
Jill Burstein, Andrea Horbach, Ekaterina Kochmar, Ronja Laarmann-Quante, Claudia Leacock, Nitin Madnani, Ildikó Pilán, Helen Yannakoudakis, Torsten Zesch
Venue:
BEA
SIG:
SIGEDU
Publisher:
Association for Computational Linguistics
Pages:
59–63
URL:
https://aclanthology.org/2021.bea-1.6
Cite (ACL):
Chak Yan Yeung and John Lee. 2021. Character Set Construction for Chinese Language Learning. In Proceedings of the 16th Workshop on Innovative Use of NLP for Building Educational Applications, pages 59–63, Online. Association for Computational Linguistics.
Cite (Informal):
Character Set Construction for Chinese Language Learning (Yeung & Lee, BEA 2021)
PDF:
https://aclanthology.org/2021.bea-1.6.pdf