Bulgarian–English Parallel Corpus for the Purposes of Creating Statistical Translation Model of the Verb Forms. General Conception, Structure, Resources and Annotation

Todor Lazarov


Abstract
This paper describes the creation of a Bulgarian-English parallel corpus for the purpose of constructing a statistical translation model for verb forms in both languages. We briefly introduce the scientific problem behind the corpus, its main purpose, general conception, linguistic resources, and annotation conception. In more detail, we describe the collection of language data for the corpus, the preparatory processing of the gathered data, the annotation rules based on the characteristics of the gathered data, and the chosen software. We discuss the current work on the training model, future work on this linguistic resource, and the aims of the scientific project.
Anthology ID:
2018.clib-1.24
Volume:
Proceedings of the Third International Conference on Computational Linguistics in Bulgaria (CLIB 2018)
Month:
May
Year:
2018
Address:
Sofia, Bulgaria
Venue:
CLIB
Publisher:
Department of Computational Linguistics, Institute for Bulgarian Language, Bulgarian Academy of Sciences
Pages:
193–202
URL:
https://aclanthology.org/2018.clib-1.24
Cite (ACL):
Todor Lazarov. 2018. Bulgarian–English Parallel Corpus for the Purposes of Creating Statistical Translation Model of the Verb Forms. General Conception, Structure, Resources and Annotation. In Proceedings of the Third International Conference on Computational Linguistics in Bulgaria (CLIB 2018), pages 193–202, Sofia, Bulgaria. Department of Computational Linguistics, Institute for Bulgarian Language, Bulgarian Academy of Sciences.
Cite (Informal):
Bulgarian–English Parallel Corpus for the Purposes of Creating Statistical Translation Model of the Verb Forms. General Conception, Structure, Resources and Annotation (Lazarov, CLIB 2018)
PDF:
https://aclanthology.org/2018.clib-1.24.pdf