Synonym relations affect object detection learned on vision-language data

Giacomo Nebbia, Adriana Kovashka


Abstract
We analyze whether object detectors trained on vision-language data learn effective visual representations for synonyms. Because many current vision-language models accept user-provided textual input, they need feature representations that are robust to changes in how that input is phrased, in particular to changes in the synonyms used to refer to objects. We therefore investigate how to make the performance of such detectors less dependent on which synonym is used to refer to an object, and we propose two approaches to this end: data augmentation by back-translation and class embedding enrichment. We show the promise of these approaches, improving performance on synonyms from mAP@0.5 = 33.87% to 37.93%.
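To make the two proposed approaches concrete, here are two illustrative sketches. Neither is the authors' implementation: the translation models, pivot language, text encoder, and synonym sets below are assumptions chosen for illustration, since this listing does not describe the paper's pipeline.

First, a minimal back-translation augmentation sketch, assuming MarianMT with German as the pivot language. Round-tripping a caption through another language often swaps in synonyms, producing paraphrased training text:

```python
# Back-translation augmentation sketch (English -> German -> English).
# The pivot language and translation models are assumptions for illustration.
from transformers import MarianMTModel, MarianTokenizer

def load(name):
    return MarianTokenizer.from_pretrained(name), MarianMTModel.from_pretrained(name)

en_de_tok, en_de = load("Helsinki-NLP/opus-mt-en-de")
de_en_tok, de_en = load("Helsinki-NLP/opus-mt-de-en")

def translate(texts, tok, model):
    batch = tok(texts, return_tensors="pt", padding=True)
    out = model.generate(**batch)
    return tok.batch_decode(out, skip_special_tokens=True)

def back_translate(captions):
    """Round-trip captions through German to obtain paraphrases."""
    return translate(translate(captions, en_de_tok, en_de), de_en_tok, de_en)

print(back_translate(["a couch next to a television"]))
```

Second, a minimal class-embedding enrichment sketch, assuming "enrichment" means averaging the text embeddings of a class name and its synonyms so that the class representation no longer depends on a single surface form. CLIP is used here purely as a stand-in text encoder, and the synonym sets are hypothetical (they could, for example, be drawn from WordNet synsets):

```python
# Class-embedding enrichment sketch: replace each class name's text
# embedding with the mean embedding over the name and its synonyms.
# Encoder choice and synonym sets are assumptions for illustration.
import torch
from transformers import CLIPModel, CLIPTokenizer

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

synonyms = {
    "couch": ["couch", "sofa", "settee"],
    "tv": ["tv", "television", "telly"],
}

def enriched_class_embedding(names):
    """Average the unit-normalized text embeddings of a class name and its synonyms."""
    inputs = tokenizer(names, padding=True, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_text_features(**inputs)      # (n_names, dim)
    feats = feats / feats.norm(dim=-1, keepdim=True)   # unit-normalize each embedding
    return feats.mean(dim=0)                           # (dim,)

class_embeddings = {c: enriched_class_embedding(s) for c, s in synonyms.items()}
```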
Anthology ID: 2024.findings-naacl.239
Volume: Findings of the Association for Computational Linguistics: NAACL 2024
Month: June
Year: 2024
Address: Mexico City, Mexico
Editors: Kevin Duh, Helena Gomez, Steven Bethard
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 3770–3776
URL: https://aclanthology.org/2024.findings-naacl.239
DOI: 10.18653/v1/2024.findings-naacl.239
Cite (ACL): Giacomo Nebbia and Adriana Kovashka. 2024. Synonym relations affect object detection learned on vision-language data. In Findings of the Association for Computational Linguistics: NAACL 2024, pages 3770–3776, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal): Synonym relations affect object detection learned on vision-language data (Nebbia & Kovashka, Findings 2024)
PDF: https://aclanthology.org/2024.findings-naacl.239.pdf