Manual Clustering and Spatial Arrangement of Verbs for Multilingual Evaluation and Typology Analysis

Olga Majewska, Ivan Vulić, Diana McCarthy, Anna Korhonen


Abstract
We present the first evaluation of the applicability of a spatial arrangement method (SpAM) to a typologically diverse language sample, and its potential to produce semantic evaluation resources to support multilingual NLP, with a focus on verb semantics. We demonstrate SpAM’s utility in allowing for quick bottom-up creation of large-scale evaluation datasets that balance cross-lingual alignment with language specificity. Starting from a shared sample of 825 English verbs, translated into Chinese, Japanese, Finnish, Polish, and Italian, we apply a two-phase annotation process which produces (i) semantic verb classes and (ii) fine-grained similarity scores for nearly 130 thousand verb pairs. We use the two types of verb data to (a) examine cross-lingual similarities and variation, and (b) evaluate the capacity of static and contextualised representation models to accurately reflect verb semantics, contrasting the performance of large language-specific pretrained models with that of their multilingual equivalent on semantic clustering and lexical similarity, across different domains of verb meaning. We release the data from both phases as a large-scale multilingual resource, comprising 85 verb classes and nearly 130 thousand pairwise similarity scores, offering a wealth of possibilities for further evaluation and research on multilingual verb semantics.
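As a rough illustration of the lexical similarity evaluation mentioned in the abstract, the sketch below turns 2D placements of verbs into pairwise similarity scores and compares them with model scores. It is a minimal sketch under stated assumptions, not the authors' pipeline: it assumes that distances between placed items are read as dissimilarities, that scores are rescaled by the largest distance, and that agreement is measured with Spearman's rank correlation. The verbs, coordinates, model scores, and all function names are hypothetical.

```python
import math
from itertools import combinations


def pairwise_distances(coords):
    """Euclidean distance for every unordered pair of placed verbs."""
    return {
        (a, b): math.dist(coords[a], coords[b])
        for a, b in combinations(sorted(coords), 2)
    }


def distances_to_similarity(dists):
    """Rescale distances to similarities in [0, 1] (assumed normalisation)."""
    max_d = max(dists.values())
    return {pair: 1.0 - d / max_d for pair, d in dists.items()}


def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical placements of three verbs in a 2D arena.
coords = {"run": (0.0, 0.0), "walk": (1.0, 0.0), "think": (5.0, 5.0)}
human = distances_to_similarity(pairwise_distances(coords))

# Hypothetical model scores (e.g. cosine similarities) for the same pairs.
model = {("run", "think"): 0.1, ("run", "walk"): 0.9, ("think", "walk"): 0.2}

pairs = sorted(human)
rho = spearman([human[p] for p in pairs], [model[p] for p in pairs])
```

Here `rho` summarises how well the model's ranking of verb pairs agrees with the ranking implied by the placements; rank correlation is used so that the two score scales do not need to match.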
Anthology ID:
2020.coling-main.423
Volume:
Proceedings of the 28th International Conference on Computational Linguistics
Month:
December
Year:
2020
Address:
Barcelona, Spain (Online)
Editors:
Donia Scott, Nuria Bel, Chengqing Zong
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
4810–4824
URL:
https://aclanthology.org/2020.coling-main.423
DOI:
10.18653/v1/2020.coling-main.423
Cite (ACL):
Olga Majewska, Ivan Vulić, Diana McCarthy, and Anna Korhonen. 2020. Manual Clustering and Spatial Arrangement of Verbs for Multilingual Evaluation and Typology Analysis. In Proceedings of the 28th International Conference on Computational Linguistics, pages 4810–4824, Barcelona, Spain (Online). International Committee on Computational Linguistics.
Cite (Informal):
Manual Clustering and Spatial Arrangement of Verbs for Multilingual Evaluation and Typology Analysis (Majewska et al., COLING 2020)
PDF:
https://aclanthology.org/2020.coling-main.423.pdf
Code:
om304/multi-spa-verb