Choosing between Long and Short Word Forms in Mandarin

Lin Li, Kees van Deemter, Denis Paperno, Jingyu Fan


Abstract
Between 80% and 90% of all Chinese words have both a long and a short form, such as 老虎/虎 (lao-hu/hu, tiger) (Duanmu, 2013). Consequently, the choice between long and short forms is a key problem for lexical choice in NLP and NLG. Following earlier work on abbreviations in English (Mahowald et al., 2013), we bring a probabilistic perspective to this question, using both a behavioral and a corpus-based approach. We hypothesized that Mandarin speakers are more likely to choose the short form in a supportive context than in a neutral context. Consistent with this prediction, our findings revealed that the predictability of the context affects speakers' choice between long and short forms.
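The hypothesis above compares two conditional probabilities: the probability of choosing the short form given a supportive context, and the same probability given a neutral context. The sketch below assumes binary forced-choice responses (long vs. short) counted per context condition and illustrates the comparison with a one-sided two-proportion z-test. The test, the function names (`short_form_rate`, `two_proportion_z`, `one_sided_p_value`) and the counts in the usage example are hypothetical illustrations, not the analysis or data reported in the paper.

```python
import math


def short_form_rate(short_count: int, total: int) -> float:
    """Proportion of trials in which the short form was chosen."""
    return short_count / total


def two_proportion_z(short_a: int, n_a: int, short_b: int, n_b: int) -> float:
    """z statistic for H1: rate in condition A > rate in condition B.

    Uses the pooled proportion to estimate the standard error under H0
    (equal short-form rates in both conditions).
    """
    p_a = short_form_rate(short_a, n_a)
    p_b = short_form_rate(short_b, n_b)
    pooled = (short_a + short_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se


def one_sided_p_value(z: float) -> float:
    """Upper-tail probability of a standard normal variable exceeding z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


if __name__ == "__main__":
    # Hypothetical counts, NOT data from the paper:
    # condition A = supportive context, condition B = neutral context.
    z = two_proportion_z(60, 100, 40, 100)
    print(f"z = {z:.3f}, one-sided p = {one_sided_p_value(z):.4f}")
```

A mixed-effects logistic regression with participant and item effects would usually be preferred for behavioral data of this kind; the z-test is used here only because it makes the probability comparison explicit in a few lines.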
Anthology ID: W19-8605
Volume: Proceedings of the 12th International Conference on Natural Language Generation
Month: October–November
Year: 2019
Address: Tokyo, Japan
Editors: Kees van Deemter, Chenghua Lin, Hiroya Takamura
Venue: INLG
SIG: SIGGEN
Publisher: Association for Computational Linguistics
Pages: 34–39
URL: https://aclanthology.org/W19-8605
DOI: 10.18653/v1/W19-8605
Cite (ACL): Lin Li, Kees van Deemter, Denis Paperno, and Jingyu Fan. 2019. Choosing between Long and Short Word Forms in Mandarin. In Proceedings of the 12th International Conference on Natural Language Generation, pages 34–39, Tokyo, Japan. Association for Computational Linguistics.
Cite (Informal): Choosing between Long and Short Word Forms in Mandarin (Li et al., INLG 2019)
PDF: https://aclanthology.org/W19-8605.pdf