Annotating the French Wiktionary with supersenses for large scale lexical analysis: a use case to assess form-meaning relationships within the nominal lexicon

Nicolas Angleraud, Lucie Barque, Marie Candito


Abstract
Many languages lack broad-coverage, semantically annotated lexical resources, which limits empirical research on lexical semantics for these languages. In this paper, we report on how we automatically enriched the French Wiktionary with general semantic classes, known as supersenses, using a limited amount of manually annotated data. We trained a classifier combining sense definition classification and sense exemplar classification. The resulting resource, with an evaluated supersense accuracy of nearly 85% (92% for hypersenses), is used in a case study illustrating how such a semantically enriched resource can be leveraged to empirically test linguistic hypotheses about the lexicon on a large scale.
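The abstract only states that definition classification and exemplar classification are combined, and that accuracy is reported at two granularities. The sketch below illustrates one plausible way to do this and is not the authors' method: it assumes each classifier outputs a probability distribution over supersenses, mixes the definition distribution with the mean exemplar distribution using a hypothetical `weight`, and scores predictions at the supersense level and after mapping to coarser hypersenses. The label inventory, the `HYPERSENSE_OF` mapping, and the function names `combine`, `predict`, and `accuracy` are all hypothetical.

```python
"""Hypothetical sketch: combine a definition classifier and an exemplar
classifier into one supersense prediction per dictionary sense, then score
at supersense and coarser hypersense granularity. Labels, the weighting and
the averaging scheme are illustrative assumptions, not the paper's method."""

from statistics import mean

# Hypothetical supersense inventory and supersense -> hypersense mapping.
SUPERSENSES = ["Person", "Animal", "Artifact", "Act"]
HYPERSENSE_OF = {
    "Person": "Animate",
    "Animal": "Animate",
    "Artifact": "Inanimate",
    "Act": "Dynamic",
}


def combine(def_probs, exemplar_probs, weight=0.5):
    """Mix the definition distribution with the mean exemplar distribution.

    def_probs: probabilities over SUPERSENSES from the definition model.
    exemplar_probs: one such list per usage example (may be empty).
    weight: hypothetical share given to the definition model.
    """
    if not exemplar_probs:
        # No exemplars for this sense: rely on the definition model alone.
        return list(def_probs)
    # Average the exemplar distributions label by label.
    ex_mean = [mean(col) for col in zip(*exemplar_probs)]
    return [weight * d + (1 - weight) * e for d, e in zip(def_probs, ex_mean)]


def predict(def_probs, exemplar_probs, weight=0.5):
    """Return the supersense with the highest combined probability."""
    scores = combine(def_probs, exemplar_probs, weight)
    best = max(range(len(SUPERSENSES)), key=lambda i: scores[i])
    return SUPERSENSES[best]


def accuracy(gold, pred, level="supersense"):
    """Accuracy on supersenses, or after mapping both sides to hypersenses."""
    if level == "hypersense":
        gold = [HYPERSENSE_OF[g] for g in gold]
        pred = [HYPERSENSE_OF[p] for p in pred]
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)
```

For instance, `predict([0.3, 0.4, 0.2, 0.1], [[0.7, 0.2, 0.05, 0.05], [0.6, 0.3, 0.05, 0.05]])` returns `"Person"`: the exemplars outweigh the definition model's slight preference for `"Animal"`. Likewise, `accuracy(["Person", "Artifact"], ["Animal", "Artifact"])` is 0.5 at the supersense level but 1.0 at the hypersense level, because confusions between supersenses that share a hypersense are not counted as errors at the coarser granularity. This is why hypersense accuracy can only be equal to or higher than supersense accuracy.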
Anthology ID: 2025.coling-main.356
Volume: Proceedings of the 31st International Conference on Computational Linguistics
Month: January
Year: 2025
Address: Abu Dhabi, UAE
Editors: Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue: COLING
Publisher: Association for Computational Linguistics
Pages: 5321–5332
URL: https://aclanthology.org/2025.coling-main.356/
Cite (ACL): Nicolas Angleraud, Lucie Barque, and Marie Candito. 2025. Annotating the French Wiktionary with supersenses for large scale lexical analysis: a use case to assess form-meaning relationships within the nominal lexicon. In Proceedings of the 31st International Conference on Computational Linguistics, pages 5321–5332, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal): Annotating the French Wiktionary with supersenses for large scale lexical analysis: a use case to assess form-meaning relationships within the nominal lexicon (Angleraud et al., COLING 2025)
PDF: https://aclanthology.org/2025.coling-main.356.pdf