Extracting meaning by idiomaticity: Description of the HSemID system at CogALex VI (2020)

Jean-Pierre Colson


Abstract
The HSemID system, submitted to the CogALex VI Shared Task, is a hybrid system relying mainly on metric clusters measured in large web corpora, complemented by a vector space model using cosine similarity to detect semantic associations. Although the system reached rather weak results for the subcategories of synonyms, antonyms and hypernyms, with some differences from one language to another, it is able to measure general semantic associations (as being random or non-random) with an F1 score close to 0.80. The results strongly suggest that idiomatic constructions play a fundamental role in semantic associations. Further experiments are necessary in order to fine-tune the model to the subcategories of synonyms, antonyms and hypernyms, and to explain surprising differences across languages.
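The abstract mentions a vector space model that uses cosine similarity to detect semantic associations. As a rough illustration only, the sketch below computes cosine similarity between small word vectors. The function name `cosine_similarity`, the `toy_vectors` values and the zero-vector convention are assumptions made for this example; they are not taken from the HSemID system, whose vectors and settings are not described here.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Convention assumed for this sketch: a zero vector is similar to nothing.
        return 0.0
    return dot / (norm_u * norm_v)


# Toy vectors invented for illustration; real models use many more dimensions.
toy_vectors = {
    "hot": [0.9, 0.1, 0.3],
    "warm": [0.8, 0.2, 0.4],
    "table": [0.1, 0.9, 0.0],
}

related = cosine_similarity(toy_vectors["hot"], toy_vectors["warm"])
unrelated = cosine_similarity(toy_vectors["hot"], toy_vectors["table"])
print(f"hot/warm: {related:.3f}  hot/table: {unrelated:.3f}")
```

In this toy setting the pair with similar vectors scores higher than the pair with dissimilar ones, which is the property a cosine-based association measure relies on.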
Anthology ID:
2020.cogalex-1.6
Volume:
Proceedings of the Workshop on the Cognitive Aspects of the Lexicon
Month:
December
Year:
2020
Address:
Online
Editors:
Michael Zock, Emmanuele Chersoni, Alessandro Lenci, Enrico Santus
Venue:
CogALex
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Pages:
54–58
URL:
https://aclanthology.org/2020.cogalex-1.6
Cite (ACL):
Jean-Pierre Colson. 2020. Extracting meaning by idiomaticity: Description of the HSemID system at CogALex VI (2020). In Proceedings of the Workshop on the Cognitive Aspects of the Lexicon, pages 54–58, Online. Association for Computational Linguistics.
Cite (Informal):
Extracting meaning by idiomaticity: Description of the HSemID system at CogALex VI (2020) (Colson, CogALex 2020)
PDF:
https://aclanthology.org/2020.cogalex-1.6.pdf