Like a Good Nearest Neighbor: Practical Content Moderation and Text Classification

Luke Bates, Iryna Gurevych


Abstract
Few-shot text classification systems have impressive capabilities but are infeasible to deploy and use reliably due to their dependence on prompting and billion-parameter language models. SetFit (Tunstall, 2022) is a recent, practical approach that fine-tunes a Sentence Transformer under a contrastive learning paradigm and achieves similar results to more unwieldy systems. Inexpensive text classification is important for addressing the problem of domain drift in all classification tasks, and especially in detecting harmful content, which plagues social media platforms. Here, we propose Like a Good Nearest Neighbor (LaGoNN), a modification to SetFit that introduces no learnable parameters but alters input text with information from its nearest neighbor, for example, the label and text, in the training data, making novel data appear similar to an instance on which the model was optimized. LaGoNN is effective at flagging undesirable content and text classification, and improves SetFit’s performance. To demonstrate LaGoNN’s value, we conduct a thorough study of text classification systems in the context of content moderation under four label distributions, and in general and multilingual classification settings.
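The input modification described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the bag-of-words `embed` function stands in for a Sentence Transformer encoder, and the helper names (`embed`, `cosine`, `nearest_neighbor`, `augment`), the `[SEP]` separator, the field order, and the toy training data are all hypothetical choices made for illustration. The sketch only shows the general idea of finding a new text's nearest neighbor in the training data and appending that neighbor's label and text to the input.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words embedding (assumption: stand-in for a Sentence Transformer)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbor(text, train):
    """Return the (text, label) training pair most similar to `text`."""
    query = embed(text)
    return max(train, key=lambda pair: cosine(query, embed(pair[0])))


def augment(text, train, sep=" [SEP] "):
    """Append the nearest neighbor's label and text to the input.

    The separator and field order are illustrative assumptions,
    not the format used in the paper.
    """
    nn_text, nn_label = nearest_neighbor(text, train)
    return f"{text}{sep}{nn_label}{sep}{nn_text}"


# Hypothetical labeled training data for a content moderation task.
train = [
    ("you are a wonderful person", "harmless"),
    ("you are a worthless idiot", "harmful"),
    ("the weather is lovely today", "harmless"),
]

augmented = augment("what a worthless idiot you are", train)
print(augmented)
```

In this sketch the augmented string, rather than the raw text, would be passed to the classifier. Because the modification happens purely at the input level, it adds no learnable parameters, which is consistent with the abstract's description.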
Anthology ID:
2024.eacl-long.17
Volume:
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
March
Year:
2024
Address:
St. Julian’s, Malta
Editors:
Yvette Graham, Matthew Purver
Venue:
EACL
Publisher:
Association for Computational Linguistics
Pages:
276–297
URL:
https://aclanthology.org/2024.eacl-long.17
Cite (ACL):
Luke Bates and Iryna Gurevych. 2024. Like a Good Nearest Neighbor: Practical Content Moderation and Text Classification. In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), pages 276–297, St. Julian’s, Malta. Association for Computational Linguistics.
Cite (Informal):
Like a Good Nearest Neighbor: Practical Content Moderation and Text Classification (Bates & Gurevych, EACL 2024)
PDF:
https://aclanthology.org/2024.eacl-long.17.pdf
Software:
 2024.eacl-long.17.software.zip
Note:
 2024.eacl-long.17.note.zip
Video:
 https://aclanthology.org/2024.eacl-long.17.mp4