Sartipi-Sedighin at SemEval-2023 Task 2: Fine-grained Named Entity Recognition with Pre-trained Contextual Language Models and Data Augmentation from Wikipedia

Amir Sartipi, Amirreza Sedighin, Afsaneh Fatemi, Hamidreza Baradaran Kashani


Abstract
This paper presents the system developed by the Sartipi-Sedighin team for SemEval 2023 Task 2, MultiCoNER II, a shared task on multilingual complex named entity recognition (NER). The goal of the task is to identify and classify complex named entities (NEs) in text across multiple languages. To tackle it, we fine-tuned pre-trained language models (PLMs) for each language included in the dataset. We also applied a data augmentation technique to increase the amount of training data available to our models. Specifically, we searched Wikipedia for NEs that already appeared in the training data and added new instances of these entities to our training corpus. We submitted to all 13 tracks of the shared task, achieving an overall F1 score of 61.25% on the English track and 71.79% on the multilingual track.
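The augmentation idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how Wikipedia was queried or how new instances were labeled, so everything below is an assumption: the sketch takes candidate sentences as pre-tokenized lists (retrieval from Wikipedia is not shown), labels them by longest exact match against entities seen in BIO-tagged training data, and keeps only sentences that contain at least one known entity. The function names `extract_entities`, `annotate`, and `augment` are hypothetical, not the authors' code.

```python
from typing import Dict, List, Tuple

Sentence = List[Tuple[str, str]]  # (token, BIO tag) pairs


def extract_entities(sentences: List[Sentence]) -> Dict[Tuple[str, ...], str]:
    """Collect entity token spans and their types from BIO-tagged sentences."""
    entities: Dict[Tuple[str, ...], str] = {}
    for sentence in sentences:
        span: List[str] = []
        label = ""
        # A trailing sentinel "O" tag flushes an entity that ends the sentence.
        for token, tag in sentence + [("", "O")]:
            if tag.startswith("I-") and span and tag[2:] == label:
                span.append(token)
                continue
            if span:
                entities[tuple(span)] = label
                span, label = [], ""
            if tag.startswith("B-"):
                span, label = [token], tag[2:]
    return entities


def annotate(tokens: List[str], entities: Dict[Tuple[str, ...], str]) -> Sentence:
    """Tag a raw sentence by longest exact match against known entities."""
    tags = ["O"] * len(tokens)
    max_len = max((len(e) for e in entities), default=0)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first so nested names are not split.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in entities:
                label = entities[candidate]
                tags[i] = "B-" + label
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + label
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return list(zip(tokens, tags))


def augment(train: List[Sentence], wiki_sentences: List[List[str]]) -> List[Sentence]:
    """Return newly tagged sentences that mention at least one known entity."""
    entities = extract_entities(train)
    new_sentences = []
    for tokens in wiki_sentences:
        tagged = annotate(tokens, entities)
        if any(tag != "O" for _, tag in tagged):
            new_sentences.append(tagged)
    return new_sentences
```

Exact string matching of this kind is noisy for ambiguous surface forms, since the same name can belong to different entity types in different contexts, so a real pipeline would likely need additional filtering.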
Anthology ID:
2023.semeval-1.78
Volume:
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Atul Kr. Ojha, A. Seza Doğruöz, Giovanni Da San Martino, Harish Tayyar Madabushi, Ritesh Kumar, Elisa Sartori
Venue:
SemEval
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Pages:
565–579
URL:
https://aclanthology.org/2023.semeval-1.78
DOI:
10.18653/v1/2023.semeval-1.78
Cite (ACL):
Amir Sartipi, Amirreza Sedighin, Afsaneh Fatemi, and Hamidreza Baradaran Kashani. 2023. Sartipi-Sedighin at SemEval-2023 Task 2: Fine-grained Named Entity Recognition with Pre-trained Contextual Language Models and Data Augmentation from Wikipedia. In Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023), pages 565–579, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Sartipi-Sedighin at SemEval-2023 Task 2: Fine-grained Named Entity Recognition with Pre-trained Contextual Language Models and Data Augmentation from Wikipedia (Sartipi et al., SemEval 2023)
PDF:
https://aclanthology.org/2023.semeval-1.78.pdf
Video:
https://aclanthology.org/2023.semeval-1.78.mp4