Caption Enriched Samples for Improving Hateful Memes Detection

Efrat Blaier, Itzik Malkiel, Lior Wolf


Abstract
The recently introduced hateful memes challenge demonstrates the difficulty of determining whether a meme is hateful or not. Specifically, neither unimodal language models nor multimodal vision-language models reach human-level performance. Motivated by the need to model the contrast between the image content and the overlaid text, we suggest applying an off-the-shelf image captioning tool to capture the former. We demonstrate that incorporating such automatic captions during fine-tuning improves the results of various unimodal and multimodal models. Moreover, in the unimodal case, continuing the pre-training of language models on augmented and original caption pairs is highly beneficial to classification accuracy.
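
The core idea described in the abstract can be sketched as follows: an off-the-shelf captioning model describes the meme image, and the resulting caption is paired with the overlaid text before the sample is passed to a classifier for fine-tuning. The snippet below is a minimal, hypothetical illustration using the Hugging Face transformers library; the specific captioner, classifier checkpoint, and input pairing are assumptions for illustration, not the authors' exact pipeline.

```python
# Minimal sketch (not the paper's exact setup): enrich a meme sample with an
# automatically generated image caption before classifying the text.
# Model names below are illustrative assumptions.
import torch
from transformers import (
    pipeline,
    AutoTokenizer,
    AutoModelForSequenceClassification,
)

# 1) Off-the-shelf image captioner (any image-to-text model could be used).
captioner = pipeline("image-to-text", model="nlpconnect/vit-gpt2-image-captioning")

# 2) Unimodal language model for hateful / not-hateful classification.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

def enrich_and_classify(image_path: str, overlaid_text: str) -> torch.Tensor:
    """Concatenate the overlaid meme text with an automatic image caption
    and return the classifier logits (hateful vs. not hateful)."""
    caption = captioner(image_path)[0]["generated_text"]
    # Feed the two texts as a sentence pair so the model can contrast
    # the image content (caption) with the overlaid text.
    inputs = tokenizer(
        overlaid_text,
        caption,
        truncation=True,
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**inputs).logits
    return logits

# Usage example (fine-tuning on the Hateful Memes dataset would follow the
# standard sequence-classification training loop over such enriched inputs):
# logits = enrich_and_classify("meme.png", "text overlaid on the meme")
```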
Anthology ID:
2021.emnlp-main.738
Volume:
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2021
Address:
Online and Punta Cana, Dominican Republic
Editors:
Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
9350–9358
URL:
https://aclanthology.org/2021.emnlp-main.738
DOI:
10.18653/v1/2021.emnlp-main.738
Cite (ACL):
Efrat Blaier, Itzik Malkiel, and Lior Wolf. 2021. Caption Enriched Samples for Improving Hateful Memes Detection. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 9350–9358, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Caption Enriched Samples for Improving Hateful Memes Detection (Blaier et al., EMNLP 2021)
PDF:
https://aclanthology.org/2021.emnlp-main.738.pdf
Software:
 2021.emnlp-main.738.Software.zip
Video:
 https://aclanthology.org/2021.emnlp-main.738.mp4
Code:
 efrat-safanov/caption-enriched-samples-research