Multi-word Entity Classification in a Highly Multilingual Environment

Sophie Chesney; Guillaume Jacquet; Ralf Steinberger; Jakub Piskorski

doi:10.18653/v1/W17-1702

Abstract

This paper describes an approach for classifying millions of existing multi-word entities (MWEntities), such as organisation or event names, into thirteen category types based only on the tokens they contain. To classify our very large in-house collection of multilingual MWEntities into an application-oriented set of entity categories, we trained and tested distantly supervised classifiers in 43 languages on MWEntities extracted from BabelNet. The best-performing classifier was a multi-class SVM using a TF.IDF-weighted data representation. Interestingly, a single classifier trained on a mix of all languages consistently outperformed classifiers trained for individual languages, reaching an averaged F1 value of 88.8%. In this paper, we present the training and test data, including a human evaluation of its accuracy, describe the methods used to train the classifiers, and discuss the results.
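The core setup in the abstract, a multi-class linear SVM over TF.IDF-weighted token vectors with one classifier trained on all languages at once, can be illustrated with a small dependency-free Python sketch. It is not the authors' implementation: whitespace tokenisation, smoothed IDF, L2 normalisation, one-vs-rest training without an intercept, plain subgradient descent on the hinge loss, the hyperparameters (`lam`, `eta`, `epochs`), the helper names, the labels, and the toy multilingual training names are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical toy data: a few multilingual MWEntity names with illustrative labels.
# These are NOT taken from the paper's BabelNet-derived training set.
TRAIN = [
    ("European Commission", "organisation"),
    ("Commission européenne", "organisation"),
    ("Banco Central Europeo", "organisation"),
    ("Deutsche Bank", "organisation"),
    ("World Cup final", "event"),
    ("Coupe du monde", "event"),
    ("Olympic Games", "event"),
    ("Juegos Olímpicos", "event"),
    ("Windows operating system", "product"),
    ("sistema operativo Android", "product"),
]


def tokenize(name):
    # Assumption: lower-cased whitespace tokens are the only features of an entity.
    return name.lower().split()


def fit_idf(docs):
    # Smoothed IDF, log((1 + N) / (1 + df)) + 1, so every known token has positive weight.
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}


def tfidf(tokens, idf):
    # Sparse TF.IDF vector (token -> weight), L2-normalised; unseen tokens are dropped.
    tf = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def dot(w, x):
    return sum(w.get(t, 0.0) * v for t, v in x.items())


def train_ovr_svm(X, y, labels, lam=0.01, eta=0.1, epochs=50):
    # One binary linear SVM (no intercept) per label, one-vs-rest, trained by
    # subgradient descent on lam/2 * ||w||^2 + hinge loss in a fixed pass order.
    models = {}
    for label in labels:
        w = {}
        for _ in range(epochs):
            for x, yi in zip(X, y):
                target = 1.0 if yi == label else -1.0
                margin = target * dot(w, x)
                for k in w:  # regularisation step: shrink all weights
                    w[k] *= 1.0 - eta * lam
                if margin < 1.0:  # hinge loss active: move w towards the example
                    for k, v in x.items():
                        w[k] = w.get(k, 0.0) + eta * target * v
        models[label] = w
    return models


def predict(models, idf, name):
    # Multi-class decision: the label whose one-vs-rest score is highest.
    x = tfidf(tokenize(name), idf)
    return max(models, key=lambda label: dot(models[label], x))


# A single classifier trained on the mix of all languages at once.
docs = [tokenize(name) for name, _ in TRAIN]
idf = fit_idf(docs)
X = [tfidf(doc, idf) for doc in docs]
y = [label for _, label in TRAIN]
models = train_ovr_svm(X, y, sorted(set(y)))

print(predict(models, idf, "central bank"))  # prints: organisation
```

Pooling all languages into one training set lets tokens shared across languages (here, for example, "commission" in English and French) reinforce each other, which is one plausible reading of why the mixed-language classifier did well; a production system would more likely use an established SVM library than this hand-rolled trainer.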

Anthology ID:: W17-1702
Volume:: Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017)
Month:: April
Year:: 2017
Address:: Valencia, Spain
Editors:: Stella Markantonatou, Carlos Ramisch, Agata Savary, Veronika Vincze
Venue:: MWE
SIG:: SIGLEX
Publisher:: Association for Computational Linguistics
Pages:: 11–20
URL:: https://aclanthology.org/W17-1702/
DOI:: 10.18653/v1/W17-1702
Cite (ACL):: Sophie Chesney, Guillaume Jacquet, Ralf Steinberger, and Jakub Piskorski. 2017. Multi-word Entity Classification in a Highly Multilingual Environment. In Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017), pages 11–20, Valencia, Spain. Association for Computational Linguistics.
Cite (Informal):: Multi-word Entity Classification in a Highly Multilingual Environment (Chesney et al., MWE 2017)
PDF:: https://aclanthology.org/W17-1702.pdf