Bridging Discourse Treebanks with a Unified Rhetorical Structure Parser

Elena Chistova


Abstract
We introduce UniRST, the first unified RST-style discourse parser capable of handling 18 treebanks in 11 languages without modifying their relation inventories. To overcome inventory incompatibilities, we propose and evaluate two training strategies: Multi-Head, which assigns a separate relation classification layer to each inventory, and Masked-Union, which enables shared-parameter training through selective label masking. We first benchmark mono-treebank parsing with a simple yet effective augmentation technique for low-resource settings. We then train a unified model and show that (1) the parameter-efficient Masked-Union approach is also the strongest, and (2) UniRST outperforms 16 of 18 mono-treebank baselines, demonstrating the advantages of single-model, multilingual, end-to-end discourse parsing across diverse resources.
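The Masked-Union idea described in the abstract, a single classifier over the union of all relation labels whose logits are masked down to each treebank's own inventory, can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the label names, treebank identifiers, and inventories below are invented for demonstration.

```python
import math

# Toy union label space across all treebanks (illustrative only).
UNION_LABELS = ["Elaboration", "Contrast", "Cause", "Joint", "Attribution"]

# Each treebank keeps its original inventory, a subset of the union.
INVENTORIES = {
    "toy-en": {"Elaboration", "Contrast", "Joint"},
    "toy-ru": {"Elaboration", "Cause", "Attribution"},
}

def masked_softmax(logits, treebank):
    """Softmax over shared logits, with labels outside the treebank's
    inventory masked to -inf so they receive zero probability."""
    allowed = INVENTORIES[treebank]
    masked = [z if lbl in allowed else float("-inf")
              for z, lbl in zip(logits, UNION_LABELS)]
    m = max(masked)
    exps = [math.exp(z - m) if z != float("-inf") else 0.0 for z in masked]
    total = sum(exps)
    return [e / total for e in exps]

# One shared set of logits; the mask restricts predictions per treebank.
logits = [2.0, 1.0, 3.0, 0.5, -1.0]
probs = masked_softmax(logits, "toy-en")
```

Because the mask only zeroes out invalid labels at the output layer, all encoder and classifier parameters stay shared across treebanks, which is what makes the approach parameter-efficient relative to one head per inventory.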
Anthology ID:
2025.codi-1.17
Volume:
Proceedings of the 6th Workshop on Computational Approaches to Discourse, Context and Document-Level Inferences (CODI 2025)
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Michael Strube, Chloe Braud, Christian Hardmeier, Junyi Jessy Li, Sharid Loaiciga, Amir Zeldes, Chuyuan Li
Venues:
CODI | WS
Publisher:
Association for Computational Linguistics
Pages:
197–208
URL:
https://aclanthology.org/2025.codi-1.17/
Cite (ACL):
Elena Chistova. 2025. Bridging Discourse Treebanks with a Unified Rhetorical Structure Parser. In Proceedings of the 6th Workshop on Computational Approaches to Discourse, Context and Document-Level Inferences (CODI 2025), pages 197–208, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Bridging Discourse Treebanks with a Unified Rhetorical Structure Parser (Chistova, CODI 2025)
PDF:
https://aclanthology.org/2025.codi-1.17.pdf