Jordi Bernad
2024
Building MUSCLE, a Dataset for MUltilingual Semantic Classification of Links between Entities
Lucia Pitarch | Carlos Bobed Lisbona | David Abián | Jorge Gracia | Jordi Bernad
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Lucia Pitarch | Carlos Bobed Lisbona | David Abián | Jorge Gracia | Jordi Bernad
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
In this paper we introduce MUSCLE, a dataset for MUltilingual lexico-Semantic Classification of Links between Entities. The MUSCLE dataset was designed to train and evaluate Lexical Relation Classification (LRC) systems with 27K pairs of universal concepts selected from Wikidata, a large and highly multilingual factual Knowledge Graph (KG). Each pair of concepts includes its lexical forms in 25 languages and is labeled with up to five possible lexico-semantic relations between the concepts: hypernymy, hyponymy, meronymy, holonymy, and antonymy. Inspired by Semantic Map theory, the dataset bridges lexical and conceptual semantics, is more challenging and robust than previous datasets for LRC, avoids lexical memorization, is domain-balanced across entities, and enables enrichment and hierarchical information retrieval.
MultiLexBATS: Multilingual Dataset of Lexical Semantic Relations
Dagmar Gromann | Hugo Goncalo Oliveira | Lucia Pitarch | Elena-Simona Apostol | Jordi Bernad | Eliot Bytyçi | Chiara Cantone | Sara Carvalho | Francesca Frontini | Radovan Garabik | Jorge Gracia | Letizia Granata | Fahad Khan | Timotej Knez | Penny Labropoulou | Chaya Liebeskind | Maria Pia Di Buono | Ana Ostroški Anić | Sigita Rackevičienė | Ricardo Rodrigues | Gilles Sérasset | Linas Selmistraitis | Mahammadou Sidibé | Purificação Silvano | Blerina Spahiu | Enriketa Sogutlu | Ranka Stanković | Ciprian-Octavian Truică | Giedre Valunaite Oleskeviciene | Slavko Zitnik | Katerina Zdravkova
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Dagmar Gromann | Hugo Goncalo Oliveira | Lucia Pitarch | Elena-Simona Apostol | Jordi Bernad | Eliot Bytyçi | Chiara Cantone | Sara Carvalho | Francesca Frontini | Radovan Garabik | Jorge Gracia | Letizia Granata | Fahad Khan | Timotej Knez | Penny Labropoulou | Chaya Liebeskind | Maria Pia Di Buono | Ana Ostroški Anić | Sigita Rackevičienė | Ricardo Rodrigues | Gilles Sérasset | Linas Selmistraitis | Mahammadou Sidibé | Purificação Silvano | Blerina Spahiu | Enriketa Sogutlu | Ranka Stanković | Ciprian-Octavian Truică | Giedre Valunaite Oleskeviciene | Slavko Zitnik | Katerina Zdravkova
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Understanding the relation between the meanings of words is an important part of comprehending natural language. Prior work has either focused on analysing lexical semantic relations in word embeddings or probing pretrained language models (PLMs), with some exceptions. Given the rarity of highly multilingual benchmarks, it is unclear to what extent PLMs capture relational knowledge and are able to transfer it across languages. To start addressing this question, we propose MultiLexBATS, a multilingual parallel dataset of lexical semantic relations adapted from BATS in 15 languages including low-resource languages, such as Bambara, Lithuanian, and Albanian. As experiment on cross-lingual transfer of relational knowledge, we test the PLMs’ ability to (1) capture analogies across languages, and (2) predict translation targets. We find considerable differences across relation types and languages with a clear preference for hypernymy and antonymy as well as romance languages.
2023
Search
Fix author
Co-authors
- Jorge Gracia 3
- Lucía Pitarch 3
- David Abián 1
- Ana Ostroški Anić 1
- Elena-Simona Apostol 1
- Carlos Bobed Lisbona 1
- Eliot Bytyçi 1
- Chiara Cantone 1
- Sara Carvalho 1
- Maria Pia Di Buono 1
- Francesca Frontini 1
- Radovan Garabík 1
- Hugo Gonçalo Oliveira 1
- Letizia Granata 1
- Dagmar Gromann 1
- Fahad Khan 1
- Timotej Knez 1
- Penny Labropoulou 1
- Chaya Liebeskind 1
- Sigita Rackevičienė 1
- Ricardo Rodrigues 1
- Linas Selmistraitis 1
- Mahammadou Sidibé 1
- Purificação Silvano 1
- Enriketa Sogutlu 1
- Blerina Spahiu 1
- Ranka Stanković 1
- Gilles Sérasset 1
- Ciprian-Octavian Truică 1
- Giedrė Valūnaitė-Oleškevičienė 1
- Katerina Zdravkova 1
- Slavko Žitnik 1