CTAP for Chinese: A Linguistic Complexity Feature Automatic Calculation Platform

Yue Cui, Junhui Zhu, Liner Yang, Xuezhi Fang, Xiaobin Chen, Yujie Wang, Erhong Yang


Abstract
The construct of linguistic complexity has been widely used in language learning research, and several text analysis tools have been created to analyze it automatically. However, the indexes supported by existing Chinese text analysis tools are limited and differ from tool to tool, because each was built for a different research purpose. CTAP is an open-source tool for extracting linguistic complexity measures that is designed to serve a wide range of research purposes. Although it was originally developed for English, the Unstructured Information Management Architecture (UIMA) framework it uses allows components for other languages to be integrated. In this study, we integrate a Chinese component into CTAP, describe the index set it incorporates, and compare it with three linguistic complexity tools for Chinese. The index set comprises 196 linguistic complexity indexes at four levels: character, word, sentence, and discourse. So far, CTAP supports automatic calculation of complexity features for four languages, with the aim of helping linguists without an NLP background study linguistic complexity.
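To make the idea of character-level and sentence-level indexes more concrete, the Python sketch below computes a few simple measures of that kind for a Chinese text. The Han character range, the sentence delimiters, and the index names are illustrative assumptions. They are not CTAP's implementation or its 196 index definitions. Word-level indexes would additionally require a Chinese word segmenter, which this sketch omits.

```python
import re

# Assumption: sentences end with full-width or ASCII end punctuation.
SENTENCE_END = re.compile(r"[。！？!?]+")
# Assumption: only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF)
# counts as a Han character; extension blocks are ignored.
HAN_CHAR = re.compile(r"[\u4e00-\u9fff]")


def simple_indexes(text: str) -> dict:
    """Compute a few hypothetical character- and sentence-level indexes."""
    chars = HAN_CHAR.findall(text)
    # Keep only segments that contain at least one Han character.
    sentences = [s for s in SENTENCE_END.split(text) if HAN_CHAR.search(s)]
    n_chars = len(chars)
    n_types = len(set(chars))
    n_sents = len(sentences)
    return {
        "char_tokens": n_chars,  # number of Han character tokens
        "char_types": n_types,  # number of distinct Han characters
        "char_ttr": n_types / n_chars if n_chars else 0.0,  # type-token ratio
        "sentences": n_sents,
        # Mean sentence length in Han characters (punctuation excluded).
        "mean_sentence_len_chars": n_chars / n_sents if n_sents else 0.0,
    }


if __name__ == "__main__":
    print(simple_indexes("我喜欢学习中文。中文很有意思！"))
```

For the two-sentence example, the sketch counts 13 character tokens, 11 character types, and a mean sentence length of 6.5 characters. A real platform would need much richer preprocessing, such as segmentation, POS tagging, and parsing, for its word-, sentence-, and discourse-level indexes.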
Anthology ID:
2022.lrec-1.592
Volume:
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
5525–5538
URL:
https://aclanthology.org/2022.lrec-1.592
Cite (ACL):
Yue Cui, Junhui Zhu, Liner Yang, Xuezhi Fang, Xiaobin Chen, Yujie Wang, and Erhong Yang. 2022. CTAP for Chinese: A Linguistic Complexity Feature Automatic Calculation Platform. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 5525–5538, Marseille, France. European Language Resources Association.
Cite (Informal):
CTAP for Chinese: A Linguistic Complexity Feature Automatic Calculation Platform (Cui et al., LREC 2022)
PDF:
https://aclanthology.org/2022.lrec-1.592.pdf