The RST Spanish-Chinese Treebank

Shuyuan Cao, Iria da Cunha, Mikel Iruskieta


Abstract
Discourse analysis is necessary for various Natural Language Processing (NLP) tasks. Because Spanish and Chinese are two of the most widely spoken languages in the world, discourse analysis across this language pair is important for NLP research. This paper presents the first open Spanish-Chinese parallel corpus annotated with discourse information, with Rhetorical Structure Theory (RST) as its theoretical framework. We have evaluated and harmonized each part of the annotation to obtain a high-quality annotated corpus. The corpus is publicly available.
Anthology ID:
W18-4917
Volume:
Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018)
Month:
August
Year:
2018
Address:
Santa Fe, New Mexico, USA
Editors:
Agata Savary, Carlos Ramisch, Jena D. Hwang, Nathan Schneider, Melanie Andresen, Sameer Pradhan, Miriam R. L. Petruck
Venues:
LAW | MWE
SIGs:
SIGLEX | SIGANN
Publisher:
Association for Computational Linguistics
Pages:
156–166
URL:
https://aclanthology.org/W18-4917
Cite (ACL):
Shuyuan Cao, Iria da Cunha, and Mikel Iruskieta. 2018. The RST Spanish-Chinese Treebank. In Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018), pages 156–166, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
Cite (Informal):
The RST Spanish-Chinese Treebank (Cao et al., LAW-MWE 2018)
PDF:
https://aclanthology.org/W18-4917.pdf