Where Frameworks (Dis)agree: A Study of Discourse Segmentation

Maciej Ogrodniczuk; Anna Latusek; Karolina Saputa; Alina Wróblewska; Daniel Ziembicki; Bartosz Żuk; Martyna Lewandowska; Adam Okrasiński; Paulina Rosalska; Anna Śliwicka; Aleksandra Tomaszewska; Sebastian Żurowski

doi:10.18653/v1/2025.codi-1.16

Abstract

This study addresses the fundamental task of discourse unit detection, the critical initial step in discourse parsing. We analyze how various discourse frameworks conceptualize and structure discourse units, with a focus on their underlying taxonomies and theoretical assumptions. While approaches to discourse segmentation vary considerably, the extent to which these conceptual divergences influence practical implementations remains insufficiently studied. To address this gap, we investigate similarities and differences in segmentation across several English datasets, segmented and annotated according to distinct discourse frameworks, using simple rule-based heuristics. We evaluate the effectiveness of the rules against gold-standard segmentation, while also checking their variability and cross-framework generalizability. Additionally, we manually compare a sample of rule-based segmentation outputs against benchmark segmentation, identifying points of convergence and divergence. Our findings indicate that discourse frameworks align strongly at the level of segmentation: particular clauses consistently serve as the primary boundaries of discourse units. Discrepancies arise mainly in the treatment of other structures, such as adpositional phrases, appositions, interjections, and parenthesised text segments, which are inconsistently marked as separate discourse units across formalisms.

Anthology ID: 2025.codi-1.16
Volume: Proceedings of the 6th Workshop on Computational Approaches to Discourse, Context and Document-Level Inferences (CODI 2025)
Month: November
Year: 2025
Address: Suzhou, China
Editors: Michael Strube, Chloe Braud, Christian Hardmeier, Junyi Jessy Li, Sharid Loaiciga, Amir Zeldes, Chuyuan Li
Venues: CODI | WS
Publisher: Association for Computational Linguistics
Pages: 182–196
URL: https://aclanthology.org/2025.codi-1.16/
DOI: 10.18653/v1/2025.codi-1.16
Cite (ACL): Maciej Ogrodniczuk, Anna Latusek, Karolina Saputa, Alina Wróblewska, Daniel Ziembicki, Bartosz Żuk, Martyna Lewandowska, Adam Okrasiński, Paulina Rosalska, Anna Śliwicka, Aleksandra Tomaszewska, and Sebastian Żurowski. 2025. Where Frameworks (Dis)agree: A Study of Discourse Segmentation. In Proceedings of the 6th Workshop on Computational Approaches to Discourse, Context and Document-Level Inferences (CODI 2025), pages 182–196, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): Where Frameworks (Dis)agree: A Study of Discourse Segmentation (Ogrodniczuk et al., CODI 2025)
PDF: https://aclanthology.org/2025.codi-1.16.pdf
