Viorica-Camelia Lupancu


2023

FII_Better at SemEval-2023 Task 2: MultiCoNER II Multilingual Complex Named Entity Recognition
Viorica-Camelia Lupancu | Alexandru-Gabriel Platica | Cristian-Mihai Rosu | Daniela Gifu | Diana Trandabat
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

This task focuses on identifying complex named entities (NEs) in several languages. In the context of the SemEval-2023 competition, our team explores the capabilities of a base transformer model on this task, focusing on five languages (English, Spanish, Swedish, German, Italian). We take DistilBERT and BERT as two examples of basic transformer models, using DistilBERT as a baseline and BERT as the platform for an improved model. The dataset we use, MultiCoNER II, is a large multilingual NER dataset that covers domains such as Wiki sentences, questions, and search queries across 12 languages. It contains 26M tokens and is assembled from public resources. MultiCoNER II defines a NER tag-set with 6 classes and 67 tags. We obtained moderate results in the English track (ranking 17th out of 34), while our results in the other tracks leave room for improvement (overall third to last).