Task-Oriented Paraphrase Analytics

Marcel Gohsen, Matthias Hagen, Martin Potthast, Benno Stein


Abstract
Since paraphrasing is an ill-defined task, the term “paraphrasing” covers text transformation tasks with different characteristics. Consequently, existing paraphrasing studies have applied quite different (explicit and implicit) criteria as to when a pair of texts is to be considered a paraphrase, all of which amount to postulating a certain level of semantic or lexical similarity. In this paper, we conduct a literature review and propose a taxonomy to organize the 25 identified paraphrasing (sub-)tasks. Using classifiers trained to identify the tasks that a given paraphrasing instance fits, we find that the distributions of task-specific instances in the known paraphrase corpora vary substantially. This means that the use of these corpora, without the respective paraphrase conditions being clearly defined (which is the normal case), must lead to incomparable and misleading results.
Anthology ID:
2024.lrec-main.1360
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
15640–15654
URL:
https://aclanthology.org/2024.lrec-main.1360
Cite (ACL):
Marcel Gohsen, Matthias Hagen, Martin Potthast, and Benno Stein. 2024. Task-Oriented Paraphrase Analytics. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 15640–15654, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Task-Oriented Paraphrase Analytics (Gohsen et al., LREC-COLING 2024)
PDF:
https://aclanthology.org/2024.lrec-main.1360.pdf