Train-O-Matic: Large-Scale Supervised Word Sense Disambiguation in Multiple Languages without Manual Training Data

Tommaso Pasini, Roberto Navigli


Abstract
Annotating large numbers of sentences with senses is the heaviest requirement of current Word Sense Disambiguation. We present Train-O-Matic, a language-independent method for generating millions of sense-annotated training instances for virtually all meanings of words in a language’s vocabulary. The approach is fully automatic: no human intervention is required and the only type of human knowledge used is a WordNet-like resource. Train-O-Matic achieves consistently state-of-the-art performance across gold standard datasets and languages, while at the same time removing the burden of manual annotation. All the training data is available for research purposes at http://trainomatic.org.
Anthology ID:
D17-1008
Volume:
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
Month:
September
Year:
2017
Address:
Copenhagen, Denmark
Editors:
Martha Palmer, Rebecca Hwa, Sebastian Riedel
Venue:
EMNLP
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
78–88
URL:
https://aclanthology.org/D17-1008
DOI:
10.18653/v1/D17-1008
Cite (ACL):
Tommaso Pasini and Roberto Navigli. 2017. Train-O-Matic: Large-Scale Supervised Word Sense Disambiguation in Multiple Languages without Manual Training Data. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 78–88, Copenhagen, Denmark. Association for Computational Linguistics.
Cite (Informal):
Train-O-Matic: Large-Scale Supervised Word Sense Disambiguation in Multiple Languages without Manual Training Data (Pasini & Navigli, EMNLP 2017)
PDF:
https://aclanthology.org/D17-1008.pdf
Attachment:
D17-1008.Attachment.zip
Video:
https://aclanthology.org/D17-1008.mp4
Data:
Senseval-2, United Nations Parallel Corpus, Word Sense Disambiguation: a Unified Evaluation Framework and Empirical Comparison