Improved Evaluation Framework for Complex Plagiarism Detection

Anton Belyy, Marina Dubova, Dmitry Nekrasov


Abstract
Plagiarism is a major issue in science and education. Complex plagiarism, such as plagiarism of ideas, is hard to detect, and therefore it is especially important to track improvement of methods correctly. In this paper, we study the performance of plagdet, the main measure for plagiarism detection, on manually paraphrased datasets (such as PAN Summary). We reveal its fallibility under certain conditions and propose an evaluation framework with normalization of inner terms, which is resilient to dataset imbalance. We conclude with the experimental justification of the proposed measure. The implementation of the new framework is made publicly available as a GitHub repository.
Anthology ID:
P18-2026
Volume:
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Month:
July
Year:
2018
Address:
Melbourne, Australia
Editors:
Iryna Gurevych, Yusuke Miyao
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
157–162
URL:
https://aclanthology.org/P18-2026
DOI:
10.18653/v1/P18-2026
Cite (ACL):
Anton Belyy, Marina Dubova, and Dmitry Nekrasov. 2018. Improved Evaluation Framework for Complex Plagiarism Detection. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 157–162, Melbourne, Australia. Association for Computational Linguistics.
Cite (Informal):
Improved Evaluation Framework for Complex Plagiarism Detection (Belyy et al., ACL 2018)
PDF:
https://aclanthology.org/P18-2026.pdf
Poster:
P18-2026.Poster.pdf
Code:
AVBelyy/normplagdet