A Novel Evaluation Method for Morphological Segmentation

Javad Nouri, Roman Yangarber


Abstract
Unsupervised learning of morphological segmentation of words in a language, based only on a large corpus of words, is a challenging task. Evaluation of the learned segmentations is a challenge in itself, due to the inherent ambiguity of the segmentation task. There is no objective way to posit a unique “correct” segmentation for a set of data. Two models may arrive at different ways of segmenting the data, both of which may nonetheless be valid. Several evaluation methods have been proposed to date, but they do not insist on consistency of the evaluated model. We introduce a new evaluation methodology, which enforces correctness of segmentation boundaries while also ensuring consistency of segmentation decisions across the corpus.
Anthology ID:
L16-1495
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
3102–3109
URL:
https://aclanthology.org/L16-1495
Cite (ACL):
Javad Nouri and Roman Yangarber. 2016. A Novel Evaluation Method for Morphological Segmentation. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 3102–3109, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
A Novel Evaluation Method for Morphological Segmentation (Nouri & Yangarber, LREC 2016)
PDF:
https://aclanthology.org/L16-1495.pdf