A data-driven approach to verbal multiword expression detection. PARSEME Shared Task system description paper

Tiberiu Boros, Sonia Pipa, Verginica Barbu Mititelu, Dan Tufis


Abstract
“Multiword expressions” are groups of words that act as a single morphological, syntactic and semantic unit in linguistic analysis. Verbal multiword expressions are the subgroup in which a verb is the syntactic head of the group in its canonical (or dictionary) form. All multiword expressions pose a challenge for natural language processing, but the verbal ones are particularly interesting for tasks such as parsing, because the verb is the central element in the syntactic organization of a sentence. In this paper we introduce our data-driven approach to verbal multiword expressions, which was objectively validated during the PARSEME shared task on verbal multiword expression identification. We tested our approach on 12 languages, and we provide detailed information about corpus composition, the feature selection process, the validation procedure and performance on all languages.
Anthology ID:
W17-1716
Volume:
Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017)
Month:
April
Year:
2017
Address:
Valencia, Spain
Venue:
MWE
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Pages:
121–126
URL:
https://aclanthology.org/W17-1716
DOI:
10.18653/v1/W17-1716
Cite (ACL):
Tiberiu Boros, Sonia Pipa, Verginica Barbu Mititelu, and Dan Tufis. 2017. A data-driven approach to verbal multiword expression detection. PARSEME Shared Task system description paper. In Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017), pages 121–126, Valencia, Spain. Association for Computational Linguistics.
Cite (Informal):
A data-driven approach to verbal multiword expression detection. PARSEME Shared Task system description paper (Boros et al., MWE 2017)
PDF:
https://aclanthology.org/W17-1716.pdf