Bilingual Rhetorical Structure Parsing with Large Parallel Annotations

Elena Chistova


Abstract
Discourse parsing is a crucial task in natural language processing that aims to reveal the higher-level relations in a text. Despite growing interest in cross-lingual discourse parsing, challenges persist due to limited parallel data and inconsistencies in how Rhetorical Structure Theory (RST) is applied across languages and corpora. To address this, we introduce a parallel Russian annotation for the large and diverse English GUM RST corpus. Leveraging recent advances, our end-to-end RST parser achieves state-of-the-art results on both English and Russian corpora. It is effective in both monolingual and bilingual settings and transfers successfully even with limited second-language annotation. To the best of our knowledge, this work is the first to evaluate the potential of cross-lingual end-to-end RST parsing on a manually annotated parallel corpus.
Anthology ID:
2024.findings-acl.577
Volume:
Findings of the Association for Computational Linguistics ACL 2024
Month:
August
Year:
2024
Address:
Bangkok, Thailand and virtual meeting
Editors:
Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
9689–9706
URL:
https://aclanthology.org/2024.findings-acl.577
Cite (ACL):
Elena Chistova. 2024. Bilingual Rhetorical Structure Parsing with Large Parallel Annotations. In Findings of the Association for Computational Linguistics ACL 2024, pages 9689–9706, Bangkok, Thailand and virtual meeting. Association for Computational Linguistics.
Cite (Informal):
Bilingual Rhetorical Structure Parsing with Large Parallel Annotations (Chistova, Findings 2024)
PDF:
https://aclanthology.org/2024.findings-acl.577.pdf