Automatic Animacy Classification for Romanian Nouns

Maria Tepei, Jelke Bloem


Abstract
We introduce the first Romanian animacy classifier, specifically a type-based binary classifier of Romanian nouns into the classes human/non-human, using pre-trained word embeddings and animacy information derived from Romanian WordNet. By obtaining a seed set of labeled nouns and their embeddings, we are able to train classifiers that generalize to unseen nouns. We compare three different architectures and observe good performance on classifying word types. In addition, we manually annotate a small corpus for animacy to perform a token-based evaluation of Romanian animacy classification in a naturalistic setting, which reveals limitations of the type-based classification approach.
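The type-based setup described in the abstract (seed nouns with human/non-human labels, one embedding per noun type, a classifier applied to unseen nouns) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the three-dimensional vectors are hand-made toy values rather than real pre-trained embeddings, the labels are not taken from Romanian WordNet, and the logistic regression is a generic stand-in, not necessarily one of the three architectures the paper compares.

```python
import math

# Hypothetical seed set: noun type -> (toy embedding, label), 1 = human, 0 = non-human.
# Real embeddings would come from a pre-trained model; these values are invented.
SEED = {
    "profesor": ([0.9, 0.8, 0.1], 1),  # teacher
    "medic":    ([0.8, 0.9, 0.2], 1),  # doctor
    "copil":    ([0.7, 0.9, 0.0], 1),  # child
    "masă":     ([0.1, 0.2, 0.9], 0),  # table
    "carte":    ([0.2, 0.1, 0.8], 0),  # book
    "piatră":   ([0.0, 0.2, 0.9], 0),  # stone
}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(model, x):
    """Probability that the noun type with embedding x is human."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train(samples, epochs=500, lr=0.5):
    """Fit logistic regression with plain stochastic gradient descent."""
    w = [0.0] * len(samples[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in samples:
            err = score((w, b), x) - y  # gradient of the log loss w.r.t. the logit
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def predict_human(model, x):
    return score(model, x) >= 0.5


model = train(list(SEED.values()))

# Unseen noun types (again with invented vectors): engineer, chair.
unseen = {"inginer": [0.85, 0.75, 0.15], "scaun": [0.15, 0.1, 0.85]}
for noun, vec in unseen.items():
    print(noun, "human" if predict_human(model, vec) else "non-human")
```

Because the classifier sees one vector per noun type, every occurrence of a noun receives the same label regardless of context, which is the kind of limitation the token-based evaluation in the abstract is designed to expose.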
Anthology ID:
2024.lrec-main.163
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
1825–1831
URL:
https://aclanthology.org/2024.lrec-main.163
Cite (ACL):
Maria Tepei and Jelke Bloem. 2024. Automatic Animacy Classification for Romanian Nouns. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 1825–1831, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Automatic Animacy Classification for Romanian Nouns (Tepei & Bloem, LREC-COLING 2024)
PDF:
https://aclanthology.org/2024.lrec-main.163.pdf