FII_Better at SemEval-2023 Task 2: MultiCoNER II Multilingual Complex Named Entity Recognition

Viorica-Camelia Lupancu, Alexandru-Gabriel Platica, Cristian-Mihai Rosu, Daniela Gifu, Diana Trandabat


Abstract
This task focuses on identifying complex named entities (NEs) in several languages. For the SemEval-2023 competition, our team explores the capabilities of a base transformer model on this task, focusing on five languages (English, Spanish, Swedish, German, Italian). We take DistilBERT and BERT as two examples of basic transformer models, using DistilBERT as a baseline and BERT as the platform for an improved model. The dataset we use, MultiCoNER II, is a large multilingual NER dataset covering domains such as Wiki sentences, questions, and search queries across 12 languages. It contains 26M tokens and is assembled from public resources. MultiCoNER II defines an NER tag set with 6 classes and 67 tags. We obtained moderate results in the English track (ranked 17th out of 34), while our results in the other tracks leave room for improvement (overall third to last).
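The figure of 67 tags can be read as a BIO encoding over fine-grained entity types. The abstract does not state the tagging scheme, so the sketch below assumes BIO tagging over 33 fine-grained types, which gives 2 × 33 + 1 = 67. The helper `bio_tags` and the placeholder type names are hypothetical and used only for illustration.

```python
def bio_tags(entity_types):
    """Build a BIO tag list: one 'O' tag plus a B- and an I- tag per type."""
    tags = ["O"]
    for entity_type in entity_types:
        tags.append(f"B-{entity_type}")
        tags.append(f"I-{entity_type}")
    return tags


# Placeholder names (assumption): the real type names are not listed in the abstract.
fine_grained_types = [f"TYPE_{i}" for i in range(33)]

tags = bio_tags(fine_grained_types)
print(len(tags))  # 2 * 33 + 1 = 67
```

Under this assumption, a token classification head on top of BERT or DistilBERT would need one output per tag, that is, 67 logits per token.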
Anthology ID:
2023.semeval-1.153
Volume:
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Atul Kr. Ojha, A. Seza Doğruöz, Giovanni Da San Martino, Harish Tayyar Madabushi, Ritesh Kumar, Elisa Sartori
Venue:
SemEval
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Pages:
1107–1113
URL:
https://aclanthology.org/2023.semeval-1.153
DOI:
10.18653/v1/2023.semeval-1.153
Cite (ACL):
Viorica-Camelia Lupancu, Alexandru-Gabriel Platica, Cristian-Mihai Rosu, Daniela Gifu, and Diana Trandabat. 2023. FII_Better at SemEval-2023 Task 2: MultiCoNER II Multilingual Complex Named Entity Recognition. In Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023), pages 1107–1113, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
FII_Better at SemEval-2023 Task 2: MultiCoNER II Multilingual Complex Named Entity Recognition (Lupancu et al., SemEval 2023)
PDF:
https://aclanthology.org/2023.semeval-1.153.pdf