DiaWUG: A Dataset for Diatopic Lexical Semantic Variation in Spanish

Gioia Baldissin, Dominik Schlechtweg, Sabine Schulte im Walde


Abstract
We provide a novel dataset – DiaWUG – with judgements on diatopic lexical semantic variation for six Spanish variants in Europe and Latin America. In contrast to most previous meaning-based resources and studies on semantic diatopic variation, we collect annotations on semantic relatedness for Spanish target words in their contexts from both a semasiological perspective (i.e., exploring the meanings of a word given its form, thus including polysemy) and an onomasiological perspective (i.e., exploring identical meanings of words with different forms, thus including synonymy). In addition, our novel dataset exploits and extends the existing framework DURel for annotating word senses in context (Erk et al., 2013; Schlechtweg et al., 2018) and the framework-embedded Word Usage Graphs (WUGs) – which up to now have mainly been used for semasiological tasks and resources – in order to distinguish, visualize and interpret lexical semantic variation of contextualized words in Spanish from these two perspectives, i.e., semasiological and onomasiological language variation.
Anthology ID:
2022.lrec-1.278
Volume:
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
2601–2609
URL:
https://aclanthology.org/2022.lrec-1.278
Cite (ACL):
Gioia Baldissin, Dominik Schlechtweg, and Sabine Schulte im Walde. 2022. DiaWUG: A Dataset for Diatopic Lexical Semantic Variation in Spanish. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 2601–2609, Marseille, France. European Language Resources Association.
Cite (Informal):
DiaWUG: A Dataset for Diatopic Lexical Semantic Variation in Spanish (Baldissin et al., LREC 2022)
PDF:
https://aclanthology.org/2022.lrec-1.278.pdf