@inproceedings{yung-etal-2025-synthetic,
title = "Synthetic Data Augmentation for Cross-domain Implicit Discourse Relation Recognition",
author = "Yung, Frances and
Suresh, Varsha and
Reza, Zaynab and
Ahmad, Mansoor and
Demberg, Vera",
editor = "B{\'e}chet, Fr{\'e}d{\'e}ric and
Lef{\`e}vre, Fabrice and
Asher, Nicholas and
Kim, Seokhwan and
Merlin, Teva",
booktitle = "Proceedings of the 26th Annual Meeting of the Special Interest Group on Discourse and Dialogue",
month = aug,
year = "2025",
address = "Avignon, France",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.sigdial-1.13/",
pages = "172--182",
abstract = "Implicit discourse relation recognition (IDRR) {--} the task of identifying the implicit coherence relation between two text spans {--} requires deep semantic understanding. Recent studies have shown that zero-/few-shot approaches significantly lag behind supervised models. However, LLMs may be useful for synthetic data augmentation, where LLMs generate a second argument following a specified coherence relation. We applied this approach in a cross-domain setting, generating discourse continuations using unlabelled target-domain data to adapt a base model which was trained on source-domain labelled data. Evaluations conducted on a large-scale test set revealed that different variations of the approach did not result in any significant improvements. We conclude that LLMs often fail to generate useful samples for IDRR, and emphasize the importance of considering both statistical significance and comparability when evaluating IDRR models."
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="yung-etal-2025-synthetic">
<titleInfo>
<title>Synthetic Data Augmentation for Cross-domain Implicit Discourse Relation Recognition</title>
</titleInfo>
<name type="personal">
<namePart type="given">Frances</namePart>
<namePart type="family">Yung</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Varsha</namePart>
<namePart type="family">Suresh</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Zaynab</namePart>
<namePart type="family">Reza</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Mansoor</namePart>
<namePart type="family">Ahmad</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Vera</namePart>
<namePart type="family">Demberg</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-08</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 26th Annual Meeting of the Special Interest Group on Discourse and Dialogue</title>
</titleInfo>
<name type="personal">
<namePart type="given">Frédéric</namePart>
<namePart type="family">Béchet</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Fabrice</namePart>
<namePart type="family">Lefèvre</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Nicholas</namePart>
<namePart type="family">Asher</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Seokhwan</namePart>
<namePart type="family">Kim</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Teva</namePart>
<namePart type="family">Merlin</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Avignon, France</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
</relatedItem>
<abstract>Implicit discourse relation recognition (IDRR) – the task of identifying the implicit coherence relation between two text spans – requires deep semantic understanding. Recent studies have shown that zero-/few-shot approaches significantly lag behind supervised models. However, LLMs may be useful for synthetic data augmentation, where LLMs generate a second argument following a specified coherence relation. We applied this approach in a cross-domain setting, generating discourse continuations using unlabelled target-domain data to adapt a base model which was trained on source-domain labelled data. Evaluations conducted on a large-scale test set revealed that different variations of the approach did not result in any significant improvements. We conclude that LLMs often fail to generate useful samples for IDRR, and emphasize the importance of considering both statistical significance and comparability when evaluating IDRR models.</abstract>
<identifier type="citekey">yung-etal-2025-synthetic</identifier>
<location>
<url>https://aclanthology.org/2025.sigdial-1.13/</url>
</location>
<part>
<date>2025-08</date>
<extent unit="page">
<start>172</start>
<end>182</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Synthetic Data Augmentation for Cross-domain Implicit Discourse Relation Recognition
%A Yung, Frances
%A Suresh, Varsha
%A Reza, Zaynab
%A Ahmad, Mansoor
%A Demberg, Vera
%Y Béchet, Frédéric
%Y Lefèvre, Fabrice
%Y Asher, Nicholas
%Y Kim, Seokhwan
%Y Merlin, Teva
%S Proceedings of the 26th Annual Meeting of the Special Interest Group on Discourse and Dialogue
%D 2025
%8 August
%I Association for Computational Linguistics
%C Avignon, France
%F yung-etal-2025-synthetic
%X Implicit discourse relation recognition (IDRR) – the task of identifying the implicit coherence relation between two text spans – requires deep semantic understanding. Recent studies have shown that zero-/few-shot approaches significantly lag behind supervised models. However, LLMs may be useful for synthetic data augmentation, where LLMs generate a second argument following a specified coherence relation. We applied this approach in a cross-domain setting, generating discourse continuations using unlabelled target-domain data to adapt a base model which was trained on source-domain labelled data. Evaluations conducted on a large-scale test set revealed that different variations of the approach did not result in any significant improvements. We conclude that LLMs often fail to generate useful samples for IDRR, and emphasize the importance of considering both statistical significance and comparability when evaluating IDRR models.
%U https://aclanthology.org/2025.sigdial-1.13/
%P 172-182
Markdown (Informal)
[Synthetic Data Augmentation for Cross-domain Implicit Discourse Relation Recognition](https://aclanthology.org/2025.sigdial-1.13/) (Yung et al., SIGDIAL 2025)
ACL
Frances Yung, Varsha Suresh, Zaynab Reza, Mansoor Ahmad, and Vera Demberg. 2025. Synthetic Data Augmentation for Cross-domain Implicit Discourse Relation Recognition. In Proceedings of the 26th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 172–182, Avignon, France. Association for Computational Linguistics.