Disambiguating Grammatical Number and Gender With BERT

Annegret Janzso


Abstract
Accurately dealing with any type of ambiguity is a major task in Natural Language Processing, with great advances recently achieved thanks to the development of context-dependent language models and the use of word or sentence embeddings. In this context, our work aimed to determine how the popular language representation model BERT handles the ambiguity of nouns in grammatical number and gender in different languages. We show that models trained on one specific language achieve better disambiguation results than multilingual models. Ambiguity is also generally handled better in grammatical number than in grammatical gender, with greater distances between the embeddings of individual senses in direct comparisons. The overall results further show that the amount of data needed both for training monolingual models and for applying them should not be underestimated.
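To make the approach concrete, the following is a minimal sketch of sense comparison with contextual BERT embeddings, not the paper's exact setup: it extracts the last-layer vectors of an ambiguous noun in two contexts and compares them with cosine distance. The model checkpoint, the mean-pooling choice, and the German example sentences are all illustrative assumptions.

```python
# Sketch: compare contextual embeddings of an ambiguous noun (assumptions:
# checkpoint, pooling strategy, and example sentences are illustrative).
import torch
from transformers import AutoTokenizer, AutoModel

MODEL = "bert-base-multilingual-cased"  # assumption: any BERT checkpoint works here
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModel.from_pretrained(MODEL)
model.eval()

def embed(sentence: str, target: str) -> torch.Tensor:
    """Mean-pool the last-layer vectors of the subword tokens of `target`."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]  # (seq_len, dim)
    target_ids = tokenizer(target, add_special_tokens=False)["input_ids"]
    ids = enc["input_ids"][0].tolist()
    # Locate the target word's subword span inside the tokenized sentence.
    for i in range(len(ids) - len(target_ids) + 1):
        if ids[i : i + len(target_ids)] == target_ids:
            return hidden[i : i + len(target_ids)].mean(dim=0)
    raise ValueError(f"{target!r} not found in tokenized sentence")

# German "See" is gender-ambiguous: "der See" (lake) vs. "die See" (sea).
v1 = embed("Der See liegt ruhig zwischen den Bergen.", "See")
v2 = embed("Die raue See machte den Fischern zu schaffen.", "See")
dist = 1 - torch.cosine_similarity(v1, v2, dim=0).item()
print(f"cosine distance between senses: {dist:.3f}")
```

A larger distance between the two occurrence vectors suggests the model separates the two senses in context; monolingual checkpoints can be substituted for the multilingual one to reproduce the kind of comparison the abstract describes.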
Anthology ID:
2021.ranlp-srw.11
Volume:
Proceedings of the Student Research Workshop Associated with RANLP 2021
Month:
September
Year:
2021
Address:
Online
Editors:
Souhila Djabri, Dinara Gimadi, Tsvetomila Mihaylova, Ivelina Nikolova-Koleva
Venue:
RANLP
Publisher:
INCOMA Ltd.
Pages:
69–77
URL:
https://aclanthology.org/2021.ranlp-srw.11
Cite (ACL):
Annegret Janzso. 2021. Disambiguating Grammatical Number and Gender With BERT. In Proceedings of the Student Research Workshop Associated with RANLP 2021, pages 69–77, Online. INCOMA Ltd.
Cite (Informal):
Disambiguating Grammatical Number and Gender With BERT (Janzso, RANLP 2021)
PDF:
https://aclanthology.org/2021.ranlp-srw.11.pdf