HerBERT Based Language Model Detects Quantifiers and Their Semantic Properties in Polish

Marcin Woliński, Bartłomiej Nitoń, Witold Kieraś, Jakub Szymanik


Abstract
The paper presents a tool for automatically marking up quantifying expressions, their semantic features, and their scopes. We explore the idea of using a BERT-based neural model for the task (in this case HerBERT, a model trained specifically for Polish). The tool is trained on the recent manually annotated Corpus of Polish Quantificational Expressions (Szymanik and Kieraś, 2022). We discuss how it performs against human annotation and present the results of automatically annotating a 300-million-word sub-corpus of the National Corpus of Polish. Our results show that language models can effectively recognise the semantic category of quantification as well as identify key semantic properties of quantifiers, such as monotonicity. Furthermore, the algorithm we have developed can be used to build semantically annotated quantifier corpora for other languages.
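The task the abstract describes, marking up quantifier expressions and their scopes in running text, is naturally framed as token-level sequence labeling. A minimal sketch of how such annotations might be encoded as BIO tags for training a HerBERT-style token classifier (the label names, span encoding, and the example sentence are illustrative assumptions, not taken from the paper or its corpus format):

```python
def spans_to_bio(tokens, spans):
    """Convert (start, end, tag) token spans into per-token BIO labels.

    `spans` uses half-open token indices [start, end); later spans
    overwrite earlier ones where they overlap.
    """
    labels = ["O"] * len(tokens)
    for start, end, tag in spans:
        labels[start] = f"B-{tag}"            # first token of the span
        for i in range(start + 1, end):
            labels[i] = f"I-{tag}"            # continuation tokens
    return labels


# Illustrative example: "Każdy student zdał egzamin"
# ('Every student passed the exam'). "Każdy" is a universal
# quantifier; the rest of the clause is treated as its scope.
tokens = ["Każdy", "student", "zdał", "egzamin"]
spans = [(0, 1, "QUANT"), (1, 4, "SCOPE")]   # hypothetical tag set
print(spans_to_bio(tokens, spans))
# → ['B-QUANT', 'B-SCOPE', 'I-SCOPE', 'I-SCOPE']
```

Sequences labeled this way can be fed to a standard token-classification head on top of a pretrained Polish encoder such as `allegro/herbert-base-cased`; the semantic properties discussed in the paper (e.g. monotonicity) would then be additional label dimensions predicted per quantifier.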
Anthology ID: 2022.lrec-1.773
Volume: Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month: June
Year: 2022
Address: Marseille, France
Venue: LREC
Publisher: European Language Resources Association
Pages: 7140–7146
URL: https://aclanthology.org/2022.lrec-1.773
Cite (ACL): Marcin Woliński, Bartłomiej Nitoń, Witold Kieraś, and Jakub Szymanik. 2022. HerBERT Based Language Model Detects Quantifiers and Their Semantic Properties in Polish. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 7140–7146, Marseille, France. European Language Resources Association.
PDF: https://aclanthology.org/2022.lrec-1.773.pdf
Data: KLEJ