AM2iCo: Evaluating Word Meaning in Context across Low-Resource Languages with Adversarial Examples

Qianchu Liu, Edoardo Maria Ponti, Diana McCarthy, Ivan Vulić, Anna Korhonen


Abstract
Capturing word meaning in context and distinguishing between correspondences and variations across languages is key to building successful multilingual and cross-lingual text representation models. However, existing multilingual evaluation datasets that evaluate lexical semantics “in-context” have various limitations. In particular, 1) their language coverage is restricted to high-resource languages and skewed in favor of only a few language families and areas, 2) their design makes the task solvable via superficial cues, which results in artificially inflated (and sometimes super-human) performance of pretrained encoders, and 3) they offer no support for cross-lingual evaluation. In order to address these gaps, we present AM2iCo (Adversarial and Multilingual Meaning in Context), a wide-coverage cross-lingual and multilingual evaluation set; it aims to faithfully assess the ability of state-of-the-art (SotA) representation models to understand the identity of word meaning in cross-lingual contexts for 14 language pairs. We conduct a series of experiments in a wide range of setups and demonstrate the challenging nature of AM2iCo. The results reveal that current SotA pretrained encoders substantially lag behind human performance, and the largest gaps are observed for low-resource languages and languages dissimilar to English.
Anthology ID:
2021.emnlp-main.571
Volume:
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2021
Address:
Online and Punta Cana, Dominican Republic
Editors:
Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
7151–7162
URL:
https://aclanthology.org/2021.emnlp-main.571
DOI:
10.18653/v1/2021.emnlp-main.571
Cite (ACL):
Qianchu Liu, Edoardo Maria Ponti, Diana McCarthy, Ivan Vulić, and Anna Korhonen. 2021. AM2iCo: Evaluating Word Meaning in Context across Low-Resource Languages with Adversarial Examples. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 7151–7162, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
AM2iCo: Evaluating Word Meaning in Context across Low-Resource Languages with Adversarial Examples (Liu et al., EMNLP 2021)
PDF:
https://aclanthology.org/2021.emnlp-main.571.pdf
Video:
https://aclanthology.org/2021.emnlp-main.571.mp4
Code
cambridgeltl/AM2iCo
Data
AM2iCo, WiC, Word Sense Disambiguation: a Unified Evaluation Framework and Empirical Comparison, XL-WiC