CLULab-UofA at SemEval-2024 Task 8: Detecting Machine-Generated Text Using Triplet-Loss-Trained Text Similarity and Text Classification

Mohammadhossein Rezaei; Yeaeun Kwon; Reza Sanayei; Abhyuday Singh; Steven Bethard

doi:10.18653/v1/2024.semeval-1.215

Abstract

Detecting machine-generated text is a critical task in the era of large language models. In this paper, we present our systems for SemEval-2024 Task 8, which focuses on multi-class classification to distinguish human-written text from text generated by five state-of-the-art large language models. We propose three different systems: unsupervised text similarity, triplet-loss-trained text similarity, and text classification. We show that the triplet-loss-trained text similarity system outperforms the other systems, achieving 80% accuracy on the test set and surpassing the baseline model for this subtask. Additionally, our text classification system, which takes into account sentence paraphrases generated by the candidate models, also outperforms the unsupervised text similarity system, achieving 74% accuracy.
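As an illustration of the triplet-loss objective named in the abstract, the following is a minimal sketch and not the authors' implementation. The abstract does not state the distance function, the margin, or the encoder, so the cosine distance, the margin of 0.5, and the toy two-dimensional vectors standing in for text embeddings are all assumptions made for this example.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def triplet_loss(anchor, positive, negative, margin=0.5):
    """Hinge-style triplet loss (margin value is an assumption).

    The loss is zero once the anchor is closer to the positive than to
    the negative by at least `margin`; otherwise it is positive.
    """
    d_pos = cosine_distance(anchor, positive)
    d_neg = cosine_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Toy embeddings (hypothetical): the anchor and positive would come from the
# same source (e.g. the same generator), the negative from a different one.
anchor = [1.0, 0.0]
same_source = [1.0, 0.0]
other_source = [0.0, 1.0]

well_separated = triplet_loss(anchor, same_source, other_source)
badly_separated = triplet_loss(anchor, other_source, same_source)
```

In this toy case the first triplet already satisfies the margin and incurs no loss, while the second, with positive and negative swapped, is penalized. Training with such an objective pushes embeddings of texts from the same source together and those from different sources apart.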

Anthology ID: 2024.semeval-1.215
Volume: Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
Month: June
Year: 2024
Address: Mexico City, Mexico
Editors: Atul Kr. Ojha, A. Seza Doğruöz, Harish Tayyar Madabushi, Giovanni Da San Martino, Sara Rosenthal, Aiala Rosá
Venue: SemEval
SIG: SIGLEX
Publisher: Association for Computational Linguistics
Pages: 1498–1504
URL: https://aclanthology.org/2024.semeval-1.215/
DOI: 10.18653/v1/2024.semeval-1.215
Cite (ACL): Mohammadhossein Rezaei, Yeaeun Kwon, Reza Sanayei, Abhyuday Singh, and Steven Bethard. 2024. CLULab-UofA at SemEval-2024 Task 8: Detecting Machine-Generated Text Using Triplet-Loss-Trained Text Similarity and Text Classification. In Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024), pages 1498–1504, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal): CLULab-UofA at SemEval-2024 Task 8: Detecting Machine-Generated Text Using Triplet-Loss-Trained Text Similarity and Text Classification (Rezaei et al., SemEval 2024)
PDF: https://aclanthology.org/2024.semeval-1.215.pdf
Supplementary material: 2024.semeval-1.215.SupplementaryMaterial.txt
Video: https://aclanthology.org/2024.semeval-1.215.mp4
