Stella Frank


2021

Vision-and-Language or Vision-for-Language? On Cross-Modal Influence in Multimodal Transformers
Stella Frank | Emanuele Bugliarello | Desmond Elliott
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Pretrained vision-and-language BERTs aim to learn representations that combine information from both modalities. We propose a diagnostic method based on cross-modal input ablation to assess the extent to which these models actually integrate cross-modal information. This method involves ablating inputs from one modality, either entirely or selectively based on cross-modal grounding alignments, and evaluating the model's prediction performance on the other modality. Model performance is measured by modality-specific tasks that mirror the model pretraining objectives (e.g. masked language modelling for text). Models that have learned to construct cross-modal representations using both modalities are expected to perform worse when inputs are missing from a modality. We find that recently proposed models have much greater relative difficulty predicting text when visual information is ablated than predicting visual object categories when text is ablated, indicating that these models are not symmetrically cross-modal.

Can Language Models Encode Perceptual Structure Without Grounding? A Case Study in Color
Mostafa Abdou | Artur Kulmizev | Daniel Hershcovich | Stella Frank | Ellie Pavlick | Anders Søgaard
Proceedings of the 25th Conference on Computational Natural Language Learning

Pretrained language models have been shown to encode relational information, such as the relations between entities or concepts in knowledge bases, e.g. (Paris, Capital, France). However, simple relations of this type can often be recovered heuristically, and the extent to which models implicitly reflect topological structure that is grounded in the world, such as perceptual structure, is unknown. To explore this question, we conduct a thorough case study on color. Namely, we employ a dataset of monolexemic color terms and color chips represented in CIELAB, a color space with a perceptually meaningful distance metric. Using two methods of evaluating the structural alignment of colors in this space with text-derived color term representations, we find significant correspondence. Analyzing the differences in alignment across the color spectrum, we find that warmer colors are, on average, better aligned to the perceptual color space than cooler ones, suggesting an intriguing connection to findings from recent work on efficient communication in color naming. Further analysis suggests that differences in alignment are, in part, mediated by collocationality and differences in syntactic usage, posing questions as to the relationship between color perception and usage and context.

2020

CompGuessWhat?!: A Multi-task Evaluation Framework for Grounded Language Learning
Alessandro Suglia | Ioannis Konstas | Andrea Vanzo | Emanuele Bastianelli | Desmond Elliott | Stella Frank | Oliver Lemon
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Approaches to Grounded Language Learning are commonly focused on a single task-based final performance measure, which may not depend on desirable properties of the learned hidden representations, such as their ability to predict object attributes or generalize to unseen situations. To remedy this, we present GroLLA, an evaluation framework for Grounded Language Learning with Attributes based on three sub-tasks: 1) Goal-oriented evaluation; 2) Object attribute prediction evaluation; and 3) Zero-shot evaluation. We also propose a new dataset, CompGuessWhat?!, as an instance of this framework for evaluating the quality of learned neural representations, in particular with respect to attribute grounding. To this end, we extend the original GuessWhat?! dataset by including a semantic layer on top of the perceptual one. Specifically, we enrich the VisualGenome scene graphs associated with the GuessWhat?! images with several attributes from resources such as VISA and ImSitu. We then compare several hidden state representations from current state-of-the-art approaches to Grounded Language Learning. By using diagnostic classifiers, we show that current models’ learned representations are not expressive enough to encode object attributes (average F1 of 44.27). In addition, they learn neither strategies nor representations that are robust enough to perform well when novel scenes or objects are involved in gameplay (zero-shot best accuracy 50.06%).

2018

Findings of the Third Shared Task on Multimodal Machine Translation
Loïc Barrault | Fethi Bougares | Lucia Specia | Chiraag Lala | Desmond Elliott | Stella Frank
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

We present the results from the third shared task on multimodal machine translation. In this task, a source sentence in English is supplemented by an image, and participating systems are required to translate the sentence into German, French or Czech. The image can be used in addition to (or instead of) the source sentence. This year the task was extended with a third target language (Czech) and a new test set. In addition, a variant of this task was introduced with its own test set, in which the source sentence is given in multiple languages (English, French and German) and participating systems are required to generate a translation in Czech. Seven teams submitted 45 different systems to the two variants of the task. Compared to last year, the performance of the multimodal submissions improved, but text-only systems remain competitive.

2017

Findings of the Second Shared Task on Multimodal Machine Translation and Multilingual Image Description
Desmond Elliott | Stella Frank | Loïc Barrault | Fethi Bougares | Lucia Specia
Proceedings of the Second Conference on Machine Translation

2016

The QT21/HimL Combined Machine Translation System
Jan-Thorsten Peter | Tamer Alkhouli | Hermann Ney | Matthias Huck | Fabienne Braune | Alexander Fraser | Aleš Tamchyna | Ondřej Bojar | Barry Haddow | Rico Sennrich | Frédéric Blain | Lucia Specia | Jan Niehues | Alex Waibel | Alexandre Allauzen | Lauriane Aufrant | Franck Burlot | Elena Knyazeva | Thomas Lavergne | François Yvon | Mārcis Pinnis | Stella Frank
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

ILLC-UvA Adaptation System (Scorpio) at WMT’16 IT-DOMAIN Task
Hoang Cuong | Stella Frank | Khalil Sima’an
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

A Shared Task on Multimodal Machine Translation and Crosslingual Image Description
Lucia Specia | Stella Frank | Khalil Sima’an | Desmond Elliott
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

DCU-UvA Multimodal MT System Report
Iacer Calixto | Desmond Elliott | Stella Frank
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

Multi30K: Multilingual English-German Image Descriptions
Desmond Elliott | Stella Frank | Khalil Sima’an | Lucia Specia
Proceedings of the 5th Workshop on Vision and Language

2015

Splitting Compounds by Semantic Analogy
Joachim Daiber | Lautaro Quiroz | Roger Wechsler | Stella Frank
Proceedings of the 1st Deep Machine Translation Workshop

2014

Weak semantic context helps phonetic learning in a model of infant language acquisition
Stella Frank | Naomi H. Feldman | Sharon Goldwater
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Learning the hyperparameters to learn morphology
Stella Frank
Proceedings of the 5th Workshop on Cognitive Aspects of Computational Language Learning (CogACLL)

2013

Exploring the Utility of Joint Morphological and Syntactic Learning from Child-directed Speech
Stella Frank | Frank Keller | Sharon Goldwater
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

2010

Using Sentence Type Information for Syntactic Category Acquisition
Stella Frank | Sharon Goldwater | Frank Keller
Proceedings of the 2010 Workshop on Cognitive Modeling and Computational Linguistics