A Method for Human-Interpretable Paraphrasticality Prediction

Maria Moritz, Johannes Hellrich, Sven Büchel


Abstract
The detection of reused text is important in a wide range of disciplines. However, even as research in the field of plagiarism detection is constantly improving, heavily modified or paraphrased text is still challenging for current methodologies. For historical texts, these problems are even more severe, since text sources were often subject to stronger and more frequent modifications. Despite the need for tools to automate text criticism, e.g., tracing modifications in historical text, algorithmic support is still limited. While current techniques can tell if and how frequently a text has been modified, very little work has been done on determining the degree and kind of paraphrastic modification, even though such information is of substantial interest to scholars. We present a human-interpretable, feature-based method to measure paraphrastic modification. Evaluating our technique on three data sets, we find that our approach performs competitively with text similarity scores borrowed from machine translation evaluation, which are much harder to interpret.
Anthology ID:
W18-4513
Volume:
Proceedings of the Second Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature
Month:
August
Year:
2018
Address:
Santa Fe, New Mexico
Editors:
Beatrice Alex, Stefania Degaetano-Ortlieb, Anna Feldman, Anna Kazantseva, Nils Reiter, Stan Szpakowicz
Venue:
LaTeCH
Publisher:
Association for Computational Linguistics
Pages:
113–118
URL:
https://aclanthology.org/W18-4513/
Cite (ACL):
Maria Moritz, Johannes Hellrich, and Sven Büchel. 2018. A Method for Human-Interpretable Paraphrasticality Prediction. In Proceedings of the Second Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, pages 113–118, Santa Fe, New Mexico. Association for Computational Linguistics.
Cite (Informal):
A Method for Human-Interpretable Paraphrasticality Prediction (Moritz et al., LaTeCH 2018)
PDF:
https://aclanthology.org/W18-4513.pdf