Deep Investigation of Cross-Language Plagiarism Detection Methods

Jérémy Ferrero, Laurent Besacier, Didier Schwab, Frédéric Agnès


Abstract
This paper is a deep investigation of cross-language plagiarism detection methods on a recently introduced open dataset, which contains parallel and comparable collections of documents with multiple characteristics (different genres, languages and sizes of texts). We investigate cross-language plagiarism detection methods for 6 language pairs on 2 granularities of text units in order to draw robust conclusions on the best methods while deeply analyzing correlations across document styles and languages.
Anthology ID:
W17-2502
Volume:
Proceedings of the 10th Workshop on Building and Using Comparable Corpora
Month:
August
Year:
2017
Address:
Vancouver, Canada
Editors:
Serge Sharoff, Pierre Zweigenbaum, Reinhard Rapp
Venue:
BUCC
Publisher:
Association for Computational Linguistics
Pages:
6–15
URL:
https://aclanthology.org/W17-2502
DOI:
10.18653/v1/W17-2502
Cite (ACL):
Jérémy Ferrero, Laurent Besacier, Didier Schwab, and Frédéric Agnès. 2017. Deep Investigation of Cross-Language Plagiarism Detection Methods. In Proceedings of the 10th Workshop on Building and Using Comparable Corpora, pages 6–15, Vancouver, Canada. Association for Computational Linguistics.
Cite (Informal):
Deep Investigation of Cross-Language Plagiarism Detection Methods (Ferrero et al., BUCC 2017)
PDF:
https://aclanthology.org/W17-2502.pdf
Presentation:
 W17-2502.Presentation.pdf
Code:
 FerreroJeremy/Cross-Language-Dataset