On a Hurtlex Resource for Bulgarian

Petya Osenova


Abstract
The paper reports on the cleaning of the Bulgarian Hurtlex lexicon, which is part of the multilingual Hurtlex resource. It presents the challenges encountered during the cleaning process, such as deleting strings or lexical items that are clear errors of automatic translation, establishing criteria for keeping or discarding a lexeme based on its meaning and potential usages, and contextualizing each lexeme and its meaning through an example. In addition, the paper discusses the mapping of the offensive lexical items to the BTB-Wordnet, as well as the system that was used.
Anthology ID:
2024.clib-1.23
Volume:
Proceedings of the Sixth International Conference on Computational Linguistics in Bulgaria (CLIB 2024)
Month:
September
Year:
2024
Address:
Sofia, Bulgaria
Venue:
CLIB
Publisher:
Department of Computational Linguistics, Institute for Bulgarian Language, Bulgarian Academy of Sciences
Pages:
214–219
URL:
https://aclanthology.org/2024.clib-1.23
Cite (ACL):
Petya Osenova. 2024. On a Hurtlex Resource for Bulgarian. In Proceedings of the Sixth International Conference on Computational Linguistics in Bulgaria (CLIB 2024), pages 214–219, Sofia, Bulgaria. Department of Computational Linguistics, Institute for Bulgarian Language, Bulgarian Academy of Sciences.
Cite (Informal):
On a Hurtlex Resource for Bulgarian (Osenova, CLIB 2024)
PDF:
https://aclanthology.org/2024.clib-1.23.pdf