EusDisParser: improving an under-resourced discourse parser with cross-lingual data

Mikel Iruskieta, Chloé Braud


Abstract
Developing discourse parsers that annotate the relational discourse structure of a text is crucial for many downstream tasks. However, most existing work focuses on English and assumes a fairly large dataset. Discourse data have been annotated for Basque, but training a system on these data is challenging because the corpus is very small. In this paper, we create the first RST-based demonstrator for Basque, and we investigate the use of data in another language to improve the performance of a Basque discourse parser. More precisely, we build a monolingual system using the small set of data available, and we investigate the use of multilingual word embeddings to train a system for Basque using data annotated for another language. We found that our approach of building a system limited to the small set of data available for Basque yielded an improvement over previous approaches that rely on much more data annotated in other languages. At best, we obtain an F1 of 34.78 for the full discourse structure. More data annotation is necessary to improve the results obtained with these techniques. We also describe which relations match the gold standard, in order to understand these results.
Anthology ID: W19-2709
Volume: Proceedings of the Workshop on Discourse Relation Parsing and Treebanking 2019
Month: June
Year: 2019
Address: Minneapolis, MN
Editors: Amir Zeldes, Debopam Das, Erick Maziero Galani, Juliano Desiderato Antonio, Mikel Iruskieta
Venue: NAACL
Publisher: Association for Computational Linguistics
Pages: 62–71
URL: https://aclanthology.org/W19-2709
DOI: 10.18653/v1/W19-2709
Cite (ACL): Mikel Iruskieta and Chloé Braud. 2019. EusDisParser: improving an under-resourced discourse parser with cross-lingual data. In Proceedings of the Workshop on Discourse Relation Parsing and Treebanking 2019, pages 62–71, Minneapolis, MN. Association for Computational Linguistics.
Cite (Informal): EusDisParser: improving an under-resourced discourse parser with cross-lingual data (Iruskieta & Braud, NAACL 2019)
PDF: https://aclanthology.org/W19-2709.pdf