CAT ManyNames: A New Dataset for Object Naming in Catalan

Mar Domínguez Orfila, Maite Melero Nogués, Gemma Boleda Torrent


Abstract
Object Naming is an important task within the field of Language and Vision that consists of generating a correct and appropriate name for an object given an image. The ManyNames dataset consists of real-world images annotated by humans with multiple labels per object instead of just one. In this work, we describe the adaptation of this dataset (originally in English) to Catalan by (i) machine-translating the English labels and (ii) collecting human annotations for a subset of the original corpus, and we compare both resources. Analyses reveal divergences in lexical variation between the two sets, pointing to potential problems with directly translated resources, particularly when there is no recourse to a proper context, which in this case is conveyed by the image. The analysis also points to the impact of cultural factors on the naming task, which should be accounted for in future cross-lingual naming tasks.
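The abstract compares the lexical variation of the machine-translated and the human-collected Catalan label sets. As a minimal sketch, assuming each object comes with a list of name responses, the two sets could be compared with simple per-object measures: the number of distinct names, the share of the most frequent name, and the entropy of the name distribution. The function `name_stats`, the toy dictionary, and all responses below are hypothetical illustrations, not the authors' method, measures, or data.

```python
from collections import Counter
import math


def name_stats(responses):
    """Return (distinct names, share of the most frequent name, entropy in bits)."""
    counts = Counter(responses)
    total = sum(counts.values())
    top_share = counts.most_common(1)[0][1] / total
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return len(counts), top_share, entropy


# Hypothetical English responses for one object (illustration only, not dataset content).
en_responses = ["dog"] * 6 + ["animal", "puppy"]

# Hypothetical word-by-word dictionary standing in for label-level machine translation:
# each label is translated in isolation, without access to the image.
toy_dictionary = {"dog": "gos", "animal": "animal", "puppy": "cadell"}
mt_ca = [toy_dictionary[name] for name in en_responses]

# Hypothetical Catalan responses collected directly from annotators viewing the image.
human_ca = ["gos"] * 4 + ["ca"] * 3 + ["animal"]

for label, responses in [("MT", mt_ca), ("human", human_ca)]:
    n_types, top_share, entropy = name_stats(responses)
    print(f"{label}: {n_types} names, top share {top_share:.2f}, entropy {entropy:.3f} bits")
```

Because each label is translated in isolation, the translated set can only mirror the English distribution, whereas directly collected responses may spread over Catalan synonyms that the English data never distinguishes.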
Anthology ID: 2022.cogalex-1.4
Volume: Proceedings of the Workshop on Cognitive Aspects of the Lexicon
Month: November
Year: 2022
Address: Taipei, Taiwan
Editors: Michael Zock, Emmanuele Chersoni, Yu-Yin Hsu, Enrico Santus
Venue: CogALex
SIG: SIGLEX
Publisher: Association for Computational Linguistics
Pages: 31–36
URL: https://aclanthology.org/2022.cogalex-1.4
DOI: 10.18653/v1/2022.cogalex-1.4
Cite (ACL): Mar Domínguez Orfila, Maite Melero Nogués, and Gemma Boleda Torrent. 2022. CAT ManyNames: A New Dataset for Object Naming in Catalan. In Proceedings of the Workshop on Cognitive Aspects of the Lexicon, pages 31–36, Taipei, Taiwan. Association for Computational Linguistics.
Cite (Informal): CAT ManyNames: A New Dataset for Object Naming in Catalan (Domínguez Orfila et al., CogALex 2022)
PDF: https://aclanthology.org/2022.cogalex-1.4.pdf
Dataset: 2022.cogalex-1.4.Dataset.tsv