Extending Multi-Text Sentence Fusion Resources via Pyramid Annotations

Daniela Brook Weiss, Paul Roit, Ori Ernst, Ido Dagan


Abstract
NLP models that process multiple texts often struggle to recognize corresponding and salient information that is phrased differently across texts, and to consolidate the resulting redundancies. To facilitate research on such challenges, the sentence fusion task was proposed, yet previous datasets for this task were very limited in their size and scope. In this paper, we revisit and substantially extend previous dataset creation efforts. With careful modifications, relabeling, and complementary data sources, we were able to more than triple the size of a notable earlier dataset. Moreover, we show that our extended version uses more representative texts for multi-document tasks and provides a more diverse training set, which substantially improves model performance.
Anthology ID:
2022.naacl-main.135
Volume:
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month:
July
Year:
2022
Address:
Seattle, United States
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
1854–1860
URL:
https://aclanthology.org/2022.naacl-main.135
DOI:
10.18653/v1/2022.naacl-main.135
Cite (ACL):
Daniela Brook Weiss, Paul Roit, Ori Ernst, and Ido Dagan. 2022. Extending Multi-Text Sentence Fusion Resources via Pyramid Annotations. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1854–1860, Seattle, United States. Association for Computational Linguistics.
Cite (Informal):
Extending Multi-Text Sentence Fusion Resources via Pyramid Annotations (Brook Weiss et al., NAACL 2022)
PDF:
https://aclanthology.org/2022.naacl-main.135.pdf
Code:
danielabweiss/extending-sentence-fusion-resources