Pavel Mihaylov


2020

Developing a Twi (Asante) Dictionary from Akan Interlinear Glossed Texts
Dorothee Beermann | Lars Hellan | Pavel Mihaylov | Anna Struck
Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL)

Traditionally, a lexicographer identifies the lexical items to be added to a dictionary. Here we present a corpus-based approach to dictionary compilation and describe a procedure that derives a Twi dictionary from a TypeCraft corpus of Interlinear Glossed Texts. We first extracted a list of unique words. We excluded words belonging to other dialects of Akan (mostly Fante and Abron). We corrected misspellings and distinguished English loan words, which are to be integrated into our dictionary, from instances of code switching. Besides the dictionary itself, our work has produced a lexicographical model for Akan that represents the lexical resource itself as well as the extended morphological and word class inventories that provide the information to be aggregated. We also represent external resources, such as the source corpus and word-level audio files. The Twi dictionary currently consists of 1,367 words; it will be available online and through an open mobile app.
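
The first steps of the procedure described above (collecting unique words, discarding words from other Akan dialects, and correcting misspellings) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' TypeCraft pipeline: the `Token` record, its `dialect` tag and the `corrections` table are hypothetical, and the separate decision between integrating an English loan word and treating it as code switching is not modelled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """One word token from a glossed text (hypothetical record layout)."""
    form: str      # surface form as written in the text
    dialect: str   # hypothetical dialect tag, e.g. "Asante", "Fante", "Abron"


def unique_twi_words(tokens, corrections=None, keep_dialect="Asante"):
    """Return the sorted list of unique word forms for one dialect.

    `corrections` is a hypothetical mapping from known misspellings
    (lowercased) to their corrected forms.
    """
    corrections = corrections or {}
    words = set()
    for tok in tokens:
        if tok.dialect != keep_dialect:
            continue  # skip tokens tagged as another dialect (e.g. Fante, Abron)
        form = tok.form.strip().lower()  # normalise whitespace and case
        form = corrections.get(form, form)  # apply a known spelling correction
        if form:
            words.add(form)  # the set keeps each word form only once
    return sorted(words)
```

Using a set for deduplication and applying corrections before insertion means that a misspelled variant and its corrected form end up as a single dictionary candidate.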

2011

e-Research for Linguists
Dorothee Beermann | Pavel Mihaylov
Proceedings of the 5th ACL-HLT Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities

2010

Cloud Computing for Linguists
Dorothee Beermann | Pavel Mihaylov
Coling 2010: Demonstrations

2009

Interlinear Glossing and its Role in Theoretical and Descriptive Studies of African and other Lesser-Documented Languages
Dorothee Beermann | Pavel Mihaylov
Proceedings of the First Workshop on Language Technologies for African Languages