Hazem Al Saied
2019
Comparing linear and neural models for competitive MWE identification
Hazem Al Saied | Marie Candito | Mathieu Constant
Proceedings of the 22nd Nordic Conference on Computational Linguistics
In this paper, we compare the use of linear versus neural classifiers in a greedy transition system for MWE identification. Both our linear and neural models achieve a new state of the art on the PARSEME 1.1 shared task data sets, comprising 20 languages. Surprisingly, our best model is a simple feed-forward network with one hidden layer, although more sophisticated (recurrent) architectures were tested. The feedback from this study is that tuning an SVM is rather straightforward, whereas tuning our neural system proved more challenging. Given the number of languages and the variety of linguistic phenomena to handle for the MWE identification task, we have designed an accurate tuning procedure, and we show that hyperparameters are better selected by using a majority vote within random search configurations rather than a simple best configuration selection. Although the performance is rather good (better than both the best shared task system and the average of the best per-language results), further work is needed to improve the generalization power, especially on unseen MWEs.
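The majority-vote selection mentioned in this abstract can be illustrated with a minimal sketch. It assumes one plausible reading of the procedure: keep the top-scoring random-search trials and, for each hyperparameter, take the value that occurs most often among them. The function name `majority_vote_config`, the `top_k` cutoff, and the example hyperparameters are hypothetical and do not come from the paper.

```python
from collections import Counter


def majority_vote_config(trials, top_k=3):
    """Pick each hyperparameter by majority vote among the best trials.

    trials: list of (config_dict, score) pairs from a random search.
    top_k:  number of best-scoring trials that get a vote (assumed value).
    """
    # Rank trials from best to worst score and keep the top_k voters.
    voters = sorted(trials, key=lambda t: t[1], reverse=True)[:top_k]
    voted = {}
    for name in voters[0][0]:
        counts = Counter(config[name] for config, _ in voters)
        # On a tie, most_common keeps the value seen first,
        # i.e. the one from the higher-scoring trial.
        voted[name] = counts.most_common(1)[0][0]
    return voted


# Hypothetical random-search results: (configuration, dev score).
trials = [
    ({"lr": 0.1, "hidden": 64}, 0.70),
    ({"lr": 0.01, "hidden": 128}, 0.69),
    ({"lr": 0.01, "hidden": 128}, 0.68),
    ({"lr": 0.5, "hidden": 32}, 0.40),
]
best_single = max(trials, key=lambda t: t[1])[0]
voted = majority_vote_config(trials, top_k=3)
```

Here the single best trial and the voted configuration differ: the vote favors the values shared by most of the strong trials, which is the intuition behind preferring it to a plain best-configuration selection.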
2017
The ATILF-LLF System for Parseme Shared Task: a Transition-based Verbal Multiword Expression Tagger
Hazem Al Saied | Matthieu Constant | Marie Candito
Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017)
We describe the ATILF-LLF system built for the MWE 2017 Shared Task on automatic identification of verbal multiword expressions. We participated in the closed track only, for all 18 available languages. Our system is a robust greedy transition-based system, in which MWEs are identified through a MERGE transition. The system was meant to accommodate the variety of linguistic resources provided for each language, in terms of accompanying morphological and syntactic information. Using the per-MWE F-score, the system was ranked first for all but two languages (Hungarian and Romanian).
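A toy sketch may help picture how a greedy transition system can build MWEs through a MERGE transition. Only MERGE is named in the abstract; the other transitions used here (SHIFT, REDUCE, MARK), the stack-and-buffer layout, and the lexicon-lookup policy standing in for the trained classifier are assumptions made for illustration, not the ATILF-LLF implementation.

```python
# Hypothetical toy lexicon; a real system would use a trained classifier.
LEXICON = {("take", "place")}
STARTERS = {entry[0] for entry in LEXICON}


def toy_choose(stack, buffer, tokens):
    """Stand-in for the classifier: pick one transition per configuration."""
    def words(element):
        return tuple(tokens[i] for i in element)

    if stack and words(stack[-1]) in LEXICON:
        return "MARK"      # top of stack is a complete MWE
    if len(stack) >= 2 and words(stack[-2] + stack[-1]) in LEXICON:
        return "MERGE"     # top two elements combine into an MWE
    if stack and (tokens[stack[-1][0]] not in STARTERS or not buffer):
        return "REDUCE"    # top element cannot start or extend an MWE
    return "SHIFT"


def parse(tokens, choose=toy_choose):
    """Greedy loop: apply one chosen transition at a time, no backtracking."""
    stack = []
    buffer = [(i,) for i in range(len(tokens))]
    mwes = []
    while buffer or stack:
        action = choose(stack, buffer, tokens)
        if action == "SHIFT" and buffer:
            stack.append(buffer.pop(0))
        elif action == "MERGE" and len(stack) >= 2:
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif action == "MARK" and stack:
            mwes.append(stack.pop())
        elif stack:
            stack.pop()                  # REDUCE, also the fallback
        else:
            stack.append(buffer.pop(0))  # empty stack: forced SHIFT
    return mwes


contiguous = parse(["The", "meeting", "will", "take", "place", "tomorrow"])
gapped = parse(["She", "will", "take", "his", "place"])
```

Each MWE is returned as a tuple of token indices. Because REDUCE can discard an intervening token while the first component stays on the stack, the same loop also covers the gapped example, where "take" and "place" are separated by "his".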