Spatial Multi-Arrangement for Clustering and Multi-way Similarity Dataset Construction

Olga Majewska, Diana McCarthy, Jasper van den Bosch, Nikolaus Kriegeskorte, Ivan Vulić, Anna Korhonen


Abstract
We present a novel methodology for fast bottom-up creation of large-scale semantic similarity resources to support the development and evaluation of NLP systems. Our work targets verb similarity, but the methodology is equally applicable to other parts of speech. Our approach circumvents the bottleneck of slow and expensive manual development of lexical resources by leveraging the semantic intuitions of native speakers and adapting a spatial multi-arrangement approach from cognitive neuroscience, previously used only with visual stimuli, to lexical stimuli. Crucially, our approach obtains judgments of word similarity in the context of a set of related words, rather than for word pairs in isolation. We also handle lexical ambiguity as a natural consequence of a two-phase process in which verbs are placed in broad semantic classes prior to the fine-grained spatial similarity judgments. Our proposed design produces a large-scale verb resource comprising 17 relatedness-based classes and a verb similarity dataset containing similarity scores for 29,721 unique verb pairs and 825 target verbs, which we release with this paper.
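To make the idea of judging similarity "in the context of a set of related words" concrete, the following is a minimal sketch of how a single spatial arrangement could be turned into pairwise scores. It is an illustration only, not the authors' pipeline: the function name, the example verbs, their coordinates, and the choice of Euclidean distance scaled by the largest distance are all assumptions made for this example.

```python
import math
from itertools import combinations


def pairwise_dissimilarities(positions):
    """Map each unordered word pair to its Euclidean distance, scaled to [0, 1].

    `positions` maps a word to the (x, y) point where an annotator placed it.
    Scaling by the largest distance is an assumption of this sketch.
    """
    raw = {
        (a, b): math.dist(positions[a], positions[b])
        for a, b in combinations(sorted(positions), 2)
    }
    longest = max(raw.values(), default=0.0)
    if longest == 0:
        # All words were placed on the same point (or there are no pairs).
        return {pair: 0.0 for pair in raw}
    return {pair: d / longest for pair, d in raw.items()}


# Hypothetical arrangement of four verbs in a 2D arena.
arena = {
    "run": (0.0, 0.0),
    "jog": (0.0, 1.0),
    "sprint": (1.0, 0.0),
    "sleep": (3.0, 4.0),
}
scores = pairwise_dissimilarities(arena)
for pair, score in sorted(scores.items(), key=lambda item: item[1]):
    print(pair, round(score, 3))
```

One placement of n words yields n(n-1)/2 pair scores at once (six for the four verbs here), and each score depends on where the other words in the set were placed, which is what distinguishes this kind of judgment from rating word pairs in isolation.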
Anthology ID:
2020.lrec-1.705
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
5749–5758
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.705
Cite (ACL):
Olga Majewska, Diana McCarthy, Jasper van den Bosch, Nikolaus Kriegeskorte, Ivan Vulić, and Anna Korhonen. 2020. Spatial Multi-Arrangement for Clustering and Multi-way Similarity Dataset Construction. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 5749–5758, Marseille, France. European Language Resources Association.
Cite (Informal):
Spatial Multi-Arrangement for Clustering and Multi-way Similarity Dataset Construction (Majewska et al., LREC 2020)
PDF:
https://aclanthology.org/2020.lrec-1.705.pdf