Modeling Noise in Paraphrase Detection

Teemu Vahtola, Eetu Sjöblom, Jörg Tiedemann, Mathias Creutz


Abstract
Noisy labels in training data present a challenging issue in classification tasks, misleading a model towards incorrect decisions during training. In this paper, we propose the use of a linear noise model to augment pre-trained language models to account for label noise in fine-tuning. We test our approach in a paraphrase detection task with various levels of noise and five different languages. Our experiments demonstrate the effectiveness of the additional noise model in making the training procedures more robust and stable. Furthermore, we show that this model can be applied without further knowledge about annotation confidence and reliability of individual training examples and we analyse our results in light of data selection and sampling strategies.
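The abstract describes augmenting a pre-trained classifier with a linear noise model during fine-tuning. One common way to realise such a layer is a learned noise transition matrix T, where row i gives P(observed label j | true label i). That choice is assumed here and is not taken from the paper. During fine-tuning, the model's "clean" prediction is multiplied by T and the loss is computed against the observed (possibly noisy) label. At test time only the clean prediction is used. The sketch below is plain Python for illustration. All function names and the near-identity initialisation (`diag=5.0`) are hypothetical. In practice the noise logits would be trained jointly with the encoder in a deep-learning framework.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def transition_matrix(noise_logits):
    """Row-normalise square logits so that row i is P(noisy = j | clean = i)."""
    return [softmax(row) for row in noise_logits]


def init_noise_logits(num_classes, diag=5.0):
    """Hypothetical initialisation close to the identity, i.e. 'assume little noise'."""
    return [[diag if i == j else 0.0 for j in range(num_classes)]
            for i in range(num_classes)]


def noisy_distribution(clean_probs, transition):
    """Linear noise model: p_noisy[j] = sum_i p_clean[i] * T[i][j]."""
    k = len(clean_probs)
    return [sum(clean_probs[i] * transition[i][j] for i in range(k))
            for j in range(k)]


def noisy_nll(clean_logits, noise_logits, observed_label):
    """Training loss: cross-entropy of the observed label under the noise-adapted prediction."""
    p_clean = softmax(clean_logits)
    p_noisy = noisy_distribution(p_clean, transition_matrix(noise_logits))
    return -math.log(p_noisy[observed_label])


if __name__ == "__main__":
    # Hypothetical classifier scores for (not paraphrase, paraphrase).
    logits = [0.0, 2.0]
    noise = init_noise_logits(2)
    print(noisy_distribution(softmax(logits), transition_matrix(noise)))
    # Loss for an observed label that disagrees with the clean prediction.
    print(noisy_nll(logits, noise, observed_label=0))
```

Because some probability mass can be attributed to label flips through T, a training example whose label contradicts a confident clean prediction is penalised less than under plain cross-entropy. This is one way a noise layer can make fine-tuning more robust to mislabelled pairs.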
Anthology ID: 2022.lrec-1.461
Volume: Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month: June
Year: 2022
Address: Marseille, France
Venue: LREC
Publisher: European Language Resources Association
Pages: 4324–4332
URL: https://aclanthology.org/2022.lrec-1.461
Cite (ACL): Teemu Vahtola, Eetu Sjöblom, Jörg Tiedemann, and Mathias Creutz. 2022. Modeling Noise in Paraphrase Detection. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 4324–4332, Marseille, France. European Language Resources Association.
Cite (Informal): Modeling Noise in Paraphrase Detection (Vahtola et al., LREC 2022)
PDF: https://aclanthology.org/2022.lrec-1.461.pdf
Data: Opusparcus