Evaluation of automatic collocation extraction methods for language learning

Vishal Bhalla, Klara Klimcikova


Abstract
A number of methods have been proposed to automatically extract collocations, i.e., conventionalized lexical combinations, from text corpora. However, attempts to evaluate and compare them with a specific application in mind lag behind. This paper compares three end-to-end resources for collocation learning, all of which use the same corpus but different methods. Under a gold-standard evaluation, the results show that dependency parsing outperforms regex-over-POS (regular expressions over part-of-speech tags) in collocation identification. The lexical association measures (AMs) used for collocation ranking perform about the same overall but differently for individual collocation types. Further analysis also reveals considerable differences between other commonly used AMs.
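Lexical association measures score how strongly two words attract each other, and the resulting scores are used to rank collocation candidates. The following is a minimal sketch that assumes three widely used AMs: pointwise mutual information (PMI), t-score, and logDice. These are not necessarily the measures the compared resources implement. All word-pair counts and the corpus size `N` are hypothetical values chosen only for illustration. The point is that different AMs can rank the same candidates in different orders.

```python
import math


def pmi(f_xy, f_x, f_y, n):
    """Pointwise mutual information (base 2): log of observed over expected co-occurrence."""
    return math.log2(f_xy * n / (f_x * f_y))


def t_score(f_xy, f_x, f_y, n):
    """t-score: observed minus expected co-occurrence, scaled by sqrt(observed)."""
    expected = f_x * f_y / n
    return (f_xy - expected) / math.sqrt(f_xy)


def log_dice(f_xy, f_x, f_y):
    """logDice: 14 + log2 of the Dice coefficient; does not depend on corpus size."""
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


# Hypothetical corpus size and counts (illustrative only, not from the paper).
N = 1_000_000
candidates = {
    # (word1, word2): (pair frequency, word1 frequency, word2 frequency)
    ("make", "decision"): (300, 5000, 800),
    ("strong", "tea"): (20, 1200, 150),
    ("big", "decision"): (5, 3000, 800),
}


def rank(scores):
    """Return candidate pairs sorted from strongest to weakest association."""
    return sorted(scores, key=scores.get, reverse=True)


pmi_scores = {pair: pmi(*counts, N) for pair, counts in candidates.items()}
t_scores = {pair: t_score(*counts, N) for pair, counts in candidates.items()}
dice_scores = {pair: log_dice(*counts) for pair, counts in candidates.items()}

if __name__ == "__main__":
    for name, scores in [("PMI", pmi_scores), ("t-score", t_scores), ("logDice", dice_scores)]:
        print(name, rank(scores))
```

With these counts, PMI ranks the rarer but highly exclusive pair ("strong", "tea") first. The t-score and logDice rank the more frequent ("make", "decision") first. This reflects PMI's known preference for low-frequency combinations.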
Anthology ID:
W19-4428
Volume:
Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications
Month:
August
Year:
2019
Address:
Florence, Italy
Editors:
Helen Yannakoudakis, Ekaterina Kochmar, Claudia Leacock, Nitin Madnani, Ildikó Pilán, Torsten Zesch
Venue:
BEA
SIG:
SIGEDU
Publisher:
Association for Computational Linguistics
Pages:
264–274
URL:
https://aclanthology.org/W19-4428
DOI:
10.18653/v1/W19-4428
Cite (ACL):
Vishal Bhalla and Klara Klimcikova. 2019. Evaluation of automatic collocation extraction methods for language learning. In Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications, pages 264–274, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
Evaluation of automatic collocation extraction methods for language learning (Bhalla & Klimcikova, BEA 2019)
PDF:
https://aclanthology.org/W19-4428.pdf
Code:
 vishalbhalla/autocoleval