Using Wiktionary as a resource for WSD : the case of French verbs

Vincent Segonne, Marie Candito, Benoît Crabbé


Abstract
As opposed to word sense induction, word sense disambiguation (WSD) has the advantage of using interpretable senses, but it requires annotated data, which are quite rare for most languages except English (Miller et al., 1993; Fellbaum, 1998). In this paper, we investigate which strategy to adopt to achieve WSD for languages lacking data annotated specifically for the task, focusing on the particular case of verb disambiguation in French. We first study the usability of Eurosense (Bovi et al., 2017), a multilingual corpus extracted from Europarl (Koehn, 2005) and automatically annotated with BabelNet (Navigli and Ponzetto, 2010) senses. Such a resource opens the way to supervised and semi-supervised WSD for languages lacking sense-annotated data, such as French. While this perspective looked promising, our evaluation on French verbs was inconclusive: it showed that the quality of the annotated senses was not sufficient for supervised WSD on French verbs. Instead, we propose to use Wiktionary, a collaboratively edited, multilingual online dictionary, as a resource for WSD. Wiktionary provides both a sense inventory and manually sense-tagged examples, which can be used to train supervised and semi-supervised WSD systems. Yet, because sense distributions differ between the lexicographic examples found in Wiktionary and natural text, we then study the impact on WSD of the training data size and of the sense distribution. Using state-of-the-art semi-supervised systems, we report experiments on Wiktionary-based WSD for French verbs, evaluated on FrenchSemEval (FSE), a new dataset of French verbs manually annotated with Wiktionary senses.
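The idea that Wiktionary's sense-tagged examples can serve as supervised training data can be illustrated with a deliberately simple sketch. This is not the authors' system, which relies on state-of-the-art semi-supervised models. It swaps in a nearest-centroid classifier over bag-of-words contexts, where each sense is represented by the summed word counts of its example sentences. The sense ids (`voler/fly`, `voler/steal`), the example sentences, and all function names are hypothetical and were invented for illustration. None of them are taken from Wiktionary or FrenchSemEval.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence):
    # Lowercased word tokens; \w is Unicode-aware, so "volé" stays one token
    # and "l'oiseau" splits into "l" and "oiseau".
    return Counter(re.findall(r"\w+", sentence.lower()))


def cosine(a, b):
    # Cosine similarity between two sparse count vectors (Counter returns 0
    # for missing keys without inserting them).
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(tagged_examples):
    # One summed context vector per sense; since cosine ignores scale,
    # this points in the same direction as the per-sense centroid.
    centroids = {}
    for sense, sentence in tagged_examples:
        centroids.setdefault(sense, Counter()).update(bag_of_words(sentence))
    return centroids


def disambiguate(centroids, sentence):
    # Pick the sense whose vector is most similar to the target's context.
    context = bag_of_words(sentence)
    return max(centroids, key=lambda sense: cosine(centroids[sense], context))


# Hypothetical toy data for the verb "voler": sense ids and sentences are invented.
examples = [
    ("voler/fly", "L'oiseau vole dans le ciel."),
    ("voler/fly", "L'avion vole au-dessus des nuages."),
    ("voler/steal", "Il a volé mon portefeuille."),
    ("voler/steal", "Le voleur vole de l'argent à la banque."),
]
centroids = train(examples)
print(disambiguate(centroids, "Un oiseau vole au-dessus du ciel."))  # voler/fly
print(disambiguate(centroids, "Quelqu'un a volé mon vélo."))         # voler/steal
```

This sketch has no notion of sense frequency, because every sense gets exactly one vector regardless of how often it occurs. It therefore neither exploits nor corrects for the mismatch the abstract points out between the sense distribution of Wiktionary examples and that of natural text.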
Anthology ID: W19-0422
Volume: Proceedings of the 13th International Conference on Computational Semantics - Long Papers
Month: May
Year: 2019
Address: Gothenburg, Sweden
Editors: Simon Dobnik, Stergios Chatzikyriakidis, Vera Demberg
Venue: IWCS
SIG: SIGSEM
Publisher: Association for Computational Linguistics
Pages: 259–270
URL: https://aclanthology.org/W19-0422/
DOI: 10.18653/v1/W19-0422
Cite (ACL): Vincent Segonne, Marie Candito, and Benoît Crabbé. 2019. Using Wiktionary as a resource for WSD : the case of French verbs. In Proceedings of the 13th International Conference on Computational Semantics - Long Papers, pages 259–270, Gothenburg, Sweden. Association for Computational Linguistics.
Cite (Informal): Using Wiktionary as a resource for WSD : the case of French verbs (Segonne et al., IWCS 2019)
PDF: https://aclanthology.org/W19-0422.pdf