Sergey Berezin
2023
No offence, Bert - I insult only humans! Multilingual sentence-level attack on toxicity detection networks
Sergey Berezin | Reza Farahbakhsh | Noel Crespi
Findings of the Association for Computational Linguistics: EMNLP 2023
We introduce a simple yet efficient sentence-level attack on black-box toxicity detector models. By adding several positive words or sentences to the end of a hateful message, we are able to change the prediction of a neural network and pass the toxicity detection system check. This approach is shown to work in seven languages from three different language families. We also describe a defence mechanism against this attack and discuss its limitations.
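As a rough illustration of the attack shape described in the abstract, here is a minimal sketch against a toy scorer. Everything in it is an assumption made for illustration: `TOY_WEIGHTS`, `toy_toxicity_score`, and `append_positive` are hypothetical names, and the averaging lexicon scorer merely stands in for the black-box neural detectors the paper targets. The dilution effect it shows is one possible intuition, not a mechanism claimed in the abstract.

```python
from typing import Dict, List

# Hypothetical toy lexicon: positive weights mean "toxic", negative mean "friendly".
# This is NOT one of the paper's detectors, only a stand-in for demonstration.
TOY_WEIGHTS: Dict[str, float] = {
    "stupid": 1.0,
    "awful": 0.9,
    "love": -0.8,
    "wonderful": -0.9,
    "kind": -0.7,
}


def toy_toxicity_score(message: str) -> float:
    """Average the per-token weights; unknown tokens count as 0."""
    tokens = message.lower().split()
    if not tokens:
        return 0.0
    return sum(TOY_WEIGHTS.get(token, 0.0) for token in tokens) / len(tokens)


def append_positive(message: str, positive_words: List[str]) -> str:
    """Sentence-level perturbation: add positive words to the end of the message."""
    return message + " " + " ".join(positive_words)


original = "you are stupid and awful"
attacked = append_positive(original, ["love", "wonderful", "kind"])

print(toy_toxicity_score(original))
print(toy_toxicity_score(attacked))
```

With this toy scorer the appended words pull the average score down while the original insult is left untouched, which is the property the attack relies on. A real detector would need to be queried as a black box in place of `toy_toxicity_score`.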
2022
Named Entity Inclusion in Abstractive Text Summarization
Sergey Berezin | Tatiana Batura
Proceedings of the Third Workshop on Scholarly Document Processing
We address named entity omission, a drawback of many current abstractive text summarizers. We suggest a custom pretraining objective to enhance the model’s attention to the named entities in a text. First, the named entity recognition model RoBERTa is trained to determine named entities in the text. This model is then used to mask named entities in the text, and the BART model is trained to reconstruct them. Next, the BART model is fine-tuned on the summarization task. Our experiments showed that this pretraining approach drastically improves named entity inclusion precision and recall metrics.
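The masking step of the pretraining objective described in the abstract can be sketched as follows. This is a minimal sketch under stated assumptions: `mask_entities` is a hypothetical helper, the entity spans are assumed to arrive as character offsets from some NER model (the paper uses RoBERTa, which is not loaded here), and `<mask>` is assumed as the mask token. The paper's actual masking scheme may differ.

```python
from typing import List, Tuple


def mask_entities(
    text: str,
    spans: List[Tuple[int, int]],
    mask_token: str = "<mask>",
) -> Tuple[str, List[str]]:
    """Replace each (start, end) character span with mask_token.

    Returns the masked text and the list of entity strings that were hidden.
    The spans are assumed to come from an NER model and must not overlap.
    """
    pieces: List[str] = []
    targets: List[str] = []
    cursor = 0
    for start, end in sorted(spans):
        if start < cursor:
            raise ValueError("entity spans must not overlap")
        pieces.append(text[cursor:start])  # text before the entity, kept as-is
        pieces.append(mask_token)          # the entity itself is hidden
        targets.append(text[start:end])
        cursor = end
    pieces.append(text[cursor:])           # trailing text after the last entity
    return "".join(pieces), targets


# Hand-written spans stand in for NER output in this example.
text = "Marie Curie worked in Paris."
spans = [(0, 11), (22, 27)]
masked, targets = mask_entities(text, spans)
print(masked)
print(targets)
```

Each `(masked, text)` pair could then serve as one input/target example for training a sequence-to-sequence model to reconstruct the hidden entities, before fine-tuning on summarization.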