Andargachew Mekonnen Gezmu


2018

Portable Spelling Corrector for a Less-Resourced Language: Amharic
Andargachew Mekonnen Gezmu | Andreas Nürnberger | Binyam Ephrem Seyoum
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Contemporary Amharic Corpus: Automatically Morpho-Syntactically Tagged Amharic Corpus
Andargachew Mekonnen Gezmu | Binyam Ephrem Seyoum | Michael Gasser | Andreas Nürnberger
Proceedings of the First Workshop on Linguistic Resources for Natural Language Processing

We introduce the Contemporary Amharic Corpus, which is automatically tagged for morpho-syntactic information. The texts were collected from 25,199 documents across different domains and tokenized into about 24 million orthographic words. Because it is partly a web corpus, we applied automatic spelling error correction. We also modified the existing morphological analyzer, HornMorpho, to use it for the automatic tagging.