Edoardo Ponti


pdf bib
Composable Sparse Fine-Tuning for Cross-Lingual Transfer
Alan Ansell | Edoardo Ponti | Anna Korhonen | Ivan Vulić
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Fine-tuning the entire set of parameters of a large pretrained model has become the mainstream approach for transfer learning. To increase its efficiency and prevent catastrophic forgetting and interference, techniques like adapters and sparse fine-tuning have been developed. Adapters are modular, as they can be combined to adapt a model towards different facets of knowledge (e.g., dedicated language and/or task adapters). Sparse fine-tuning is expressive, as it controls the behavior of all model components. In this work, we introduce a new fine-tuning method with both these desirable properties. In particular, we learn sparse, real-valued masks based on a simple variant of the Lottery Ticket Hypothesis. Task-specific masks are obtained from annotated data in a source language, and language-specific masks from masked language modeling in a target language. Both these masks can then be composed with the pretrained model. Unlike adapter-based fine-tuning, this method neither increases the number of parameters at inference time nor alters the original model architecture. Most importantly, it outperforms adapters in zero-shot cross-lingual transfer by a large margin in a series of multilingual benchmarks, including Universal Dependencies, MasakhaNER, and AmericasNLI. Based on an in-depth analysis, we additionally find that sparsity is crucial to prevent both 1) interference between the fine-tunings to be composed and 2) overfitting. We release the code and models at https://github.com/cambridgeltl/composable-sft.

pdf bib
Image Retrieval from Contextual Descriptions
Benno Krojer | Vaibhav Adlakha | Vibhav Vineet | Yash Goyal | Edoardo Ponti | Siva Reddy
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

The ability to integrate context, including perceptual and temporal cues, plays a pivotal role in grounding the meaning of a linguistic utterance. In order to measure to what extent current vision-and-language models master this ability, we devise a new multimodal challenge, Image Retrieval from Contextual Descriptions (ImageCoDe). In particular, models are tasked with retrieving the correct image from a set of 10 minimally contrastive candidates based on a contextual description.As such, each description contains only the details that help distinguish between images.Because of this, descriptions tend to be complex in terms of syntax and discourse and require drawing pragmatic inferences. Images are sourced from both static pictures and video frames.We benchmark several state-of-the-art models, including both cross-encoders such as ViLBERT and bi-encoders such as CLIP, on ImageCoDe.Our results reveal that these models dramatically lag behind human performance: the best variant achieves an accuracy of 20.9 on video frames and 59.4 on static pictures, compared with 90.8 in humans.Furthermore, we experiment with new model variants that are better equipped to incorporate visual and temporal context into their representations, which achieve modest gains. Our hope is that ImageCoDE will foster progress in grounded language understanding by encouraging models to focus on fine-grained visual differences.

pdf bib
Natural Language Processing for Multilingual Task-Oriented Dialogue
Evgeniia Razumovskaia | Goran Glavaš | Olga Majewska | Edoardo Ponti | Ivan Vulić
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts

Recent advances in deep learning have also enabled fast progress in the research of task-oriented dialogue (ToD) systems. However, the majority of ToD systems are developed for English and merely a handful of other widely spoken languages, e.g., Chinese and German. This hugely limits the global reach and, consequently, transformative socioeconomic potential of such systems. In this tutorial, we will thus discuss and demonstrate the importance of (building) multilingual ToD systems, and then provide a systematic overview of current research gaps, challenges and initiatives related to multilingual ToD systems, with a particular focus on their connections to current research and challenges in multilingual and low-resource NLP. The tutorial will aim to provide answers or shed new light to the following questions: a) Why are multilingual dialogue systems so hard to build: what makes multilinguality for dialogue more challenging than for other NLP applications and tasks? b) What are the best existing methods and datasets for multilingual and cross-lingual (task-oriented) dialog systems? How are (multilingual) ToD systems usually evaluated? c) What are the promising future directions for multilingual ToD research: where can one draw inspiration from related NLP areas and tasks?


pdf bib
Language Modeling for Morphologically Rich Languages: Character-Aware Modeling for Word-Level Prediction
Daniela Gerz | Ivan Vulić | Edoardo Ponti | Jason Naradowsky | Roi Reichart | Anna Korhonen
Transactions of the Association for Computational Linguistics, Volume 6

Neural architectures are prominent in the construction of language models (LMs). However, word-level prediction is typically agnostic of subword-level information (characters and character sequences) and operates over a closed vocabulary, consisting of a limited word set. Indeed, while subword-aware models boost performance across a variety of NLP tasks, previous work did not evaluate the ability of these models to assist next-word prediction in language modeling tasks. Such subword-level informed models should be particularly effective for morphologically-rich languages (MRLs) that exhibit high type-to-token ratios. In this work, we present a large-scale LM study on 50 typologically diverse languages covering a wide variety of morphological systems, and offer new LM benchmarks to the community, while considering subword-level information. The main technical contribution of our work is a novel method for injecting subword-level information into semantic word vectors, integrated into the neural language modeling training, to facilitate word-level prediction. We conduct experiments in the LM setting where the number of infrequent words is large, and demonstrate strong perplexity gains across our 50 languages, especially for morphologically-rich languages. Our code and data sets are publicly available.