Rapid Development of Morphological Analyzers for Typologically Diverse Languages

Seth Kulick, Ann Bies


Abstract
The Low Resource Language research conducted under DARPA’s Broad Operational Language Translation (BOLT) program required the rapid creation of text corpora for typologically diverse languages (Turkish, Hausa, and Uzbek), annotated with morphological information along with other types of annotation. Since the output of a morphological analyzer is a significant aid to morphological annotation, we developed an analyzer for each language, both to support the annotation task and as a deliverable in its own right. Our framework for analyzer creation produces tables similar to those used in the successful SAMA analyzer for Arabic, but derives them from a more abstract linguistic level. A lexicon was developed from available resources for integration with the analyzer. Given the speed of development and the uncertain coverage of the lexicon, we assumed that the analyzer would necessarily lack some coverage for the project annotation. Our analyzer framework therefore focused on rapid implementation of the key structures of each language, together with accepting “wildcard” solutions as possible analyses for a word with an unknown stem, building on our similar experience with morphological annotation of Modern Standard Arabic and Egyptian Arabic.
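As a rough illustration of the table-driven lookup with a "wildcard" fallback described in the abstract, the sketch below splits a word into prefix, stem, and suffix, checks a compatibility table, and falls back to wildcard stems when no known stem fits. It is a minimal sketch under stated assumptions, not the authors' implementation: the three-way split, the table names (PREFIXES, STEMS, SUFFIXES, COMPATIBLE), the category labels, the analyze function, and the toy Turkish-like entries are all hypothetical.

```python
# Hypothetical toy tables; entries and category names are illustrative only.
PREFIXES = {"": "PrefNone"}
STEMS = {"ev": "NounStem", "gel": "VerbStem"}
SUFFIXES = {"": "SuffNone", "ler": "NounSuff", "di": "VerbSuff"}

# Hypothetical compatibility table: allowed (stem category, suffix category) pairs.
COMPATIBLE = {
    ("NounStem", "SuffNone"),
    ("NounStem", "NounSuff"),
    ("VerbStem", "SuffNone"),
    ("VerbStem", "VerbSuff"),
}

WILDCARD = "WILDCARD"  # label for an analysis whose stem is not in the lexicon


def analyze(word):
    """Return (prefix, stem, suffix, stem_category) analyses for a word.

    Analyses with a known, compatible stem are preferred. If none exist,
    every split with a known prefix and suffix is returned with the
    remaining material treated as a wildcard stem.
    """
    known, wild = [], []
    for i in range(len(word) + 1):
        for j in range(i, len(word) + 1):
            prefix, stem, suffix = word[:i], word[i:j], word[j:]
            if not stem or prefix not in PREFIXES or suffix not in SUFFIXES:
                continue
            if stem in STEMS:
                # Known stem: keep it only if the tables allow the combination.
                if (STEMS[stem], SUFFIXES[suffix]) in COMPATIBLE:
                    known.append((prefix, stem, suffix, STEMS[stem]))
            else:
                # Unknown stem: record a wildcard candidate for the annotator.
                wild.append((prefix, stem, suffix, WILDCARD))
    return known if known else wild


print(analyze("evler"))      # stem "ev" is in the toy lexicon
print(analyze("defterler"))  # stem "defter" is not, so wildcard analyses are returned
```

In this sketch the wildcard candidates are only proposals: an annotator would still have to pick among them, which is the role the abstract assigns to wildcard solutions for words with unknown stems.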
Anthology ID:
L16-1405
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
2551–2557
URL:
https://aclanthology.org/L16-1405
Cite (ACL):
Seth Kulick and Ann Bies. 2016. Rapid Development of Morphological Analyzers for Typologically Diverse Languages. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 2551–2557, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
Rapid Development of Morphological Analyzers for Typologically Diverse Languages (Kulick & Bies, LREC 2016)
PDF:
https://aclanthology.org/L16-1405.pdf