Examining the Effects of Language-and-Vision Data Augmentation for Generation of Descriptions of Human Faces

Nikolai Ilinykh, Rafal Černiavski, Eva Elžbieta Sventickaitė, Viktorija Buzaitė, Simon Dobnik


Abstract
We investigate how different augmentation techniques applied to both textual and visual representations affect the performance of a face description generation model. Specifically, we provide the model with original images, sketches of faces, facial composites, or distorted images. On the language side, we experiment with different methods of augmenting the original dataset with paraphrased captions, which are semantically equivalent to the original ones but differ in form. We also examine whether augmenting the dataset with descriptions from a different domain (e.g., captions of real-world images) affects the performance of the models. We train models on different combinations of visual and linguistic features and perform both (i) automatic evaluation of generated captions and (ii) an examination of how useful different visual features are for the task of facial feature classification. Our results show that although original images encode the best representation for the task, a model trained on sketches can still perform relatively well. We also observe that augmenting the dataset with descriptions from a different domain can boost the performance of the model. We conclude that face description generation systems are more susceptible to language than to vision data augmentation. Overall, we demonstrate that face caption generation models display a strong imbalance in their utilisation of the language and vision modalities, indicating a lack of proper information fusion. We also describe the ethical implications of our study and argue that future work on human face description generation should create better, more representative datasets.
Anthology ID:
2022.pvlam-1.5
Volume:
Proceedings of the 2nd Workshop on People in Vision, Language, and the Mind
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Patrizia Paggio, Albert Gatt, Marc Tanti
Venue:
PVLAM
Publisher:
European Language Resources Association
Pages:
26–40
URL:
https://aclanthology.org/2022.pvlam-1.5
Cite (ACL):
Nikolai Ilinykh, Rafal Černiavski, Eva Elžbieta Sventickaitė, Viktorija Buzaitė, and Simon Dobnik. 2022. Examining the Effects of Language-and-Vision Data Augmentation for Generation of Descriptions of Human Faces. In Proceedings of the 2nd Workshop on People in Vision, Language, and the Mind, pages 26–40, Marseille, France. European Language Resources Association.
Cite (Informal):
Examining the Effects of Language-and-Vision Data Augmentation for Generation of Descriptions of Human Faces (Ilinykh et al., PVLAM 2022)
PDF:
https://aclanthology.org/2022.pvlam-1.5.pdf
Data
CelebA-HQ