VMWE discovery: a comparative analysis between Literature and Twitter Corpora
Vivian Stamou | Artemis Xylogianni | Marilena Malli | Penny Takorou | Stella Markantonatou
Proceedings of the Joint Workshop on Multiword Expressions and Electronic Lexicons
We evaluate manually five lexical association measurements as regards the discovery of Modern Greek verb multiword expressions with two or more lexicalised components usingmwetoolkit3 (Ramisch et al., 2010). We use Twitter corpora and compare our findings with previous work on fiction corpora. The results of LL, MLE and T-score were found to overlap significantly in both the fiction and the Twitter corpora, while the results of PMI and Dice do not.We find that MWEs with two lexicalised components are more frequent in Twitter than in fiction corpora and that lean syntactic patterns help retrieve them more efficiently than richer ones.Our work (i) supports the enrichment of the lexicographical database for Modern Greek MWEs’ IDION’ (Markantonatou et al., 2019) and (ii) highlights aspects of the usage of five association measurements on specific text genres for best MWE discovery results.