Multilinguals at SemEval-2022 Task 11: Complex NER in Semantically Ambiguous Settings for Low Resource Languages

Amit Pandey, Swayatta Daw, Narendra Unnam, Vikram Pudi


Abstract
We leverage pre-trained language models to solve the task of complex NER for two low-resource languages: Chinese and Spanish. We use Whole Word Masking (WWM) to improve the performance of the masked language modeling objective on large unlabeled corpora. We experiment with multiple neural network architectures, placing CRF, BiLSTM, and linear classifier layers on top of a fine-tuned BERT layer. All our models outperform the baseline by a significant margin, and our best-performing model obtains a competitive position on the evaluation leaderboard for the blind test set.
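To illustrate the Whole Word Masking idea mentioned in the abstract, here is a minimal sketch in Python. It is not the authors' implementation: it assumes WordPiece-style tokens in which a `##` prefix marks a continuation piece, and the function names (`group_word_pieces`, `whole_word_mask`) and parameters (`mask_prob`, `mask_token`, `seed`) are hypothetical. It also omits the usual 80/10/10 replacement scheme and the handling of special tokens. For Chinese, word boundaries would typically come from a word segmenter rather than from `##` prefixes.

```python
import random


def group_word_pieces(tokens):
    """Group token indices into whole words ('##' marks a continuation piece)."""
    groups = []
    for i, tok in enumerate(tokens):
        if tok.startswith("##") and groups:
            groups[-1].append(i)  # continuation: attach to the previous word
        else:
            groups.append([i])    # start of a new word
    return groups


def whole_word_mask(tokens, mask_prob=0.15, mask_token="[MASK]", seed=None):
    """Mask entire words: if a word is chosen, all of its pieces are masked.

    Returns (masked_tokens, labels), where labels[i] holds the original
    token at masked positions and None elsewhere.
    """
    rng = random.Random(seed)
    groups = group_word_pieces(tokens)
    # Assumption for this sketch: always mask at least one word.
    n_to_mask = max(1, round(len(groups) * mask_prob))
    chosen = rng.sample(groups, n_to_mask)

    masked = list(tokens)
    labels = [None] * len(tokens)
    for group in chosen:
        for i in group:
            labels[i] = tokens[i]
            masked[i] = mask_token
    return masked, labels


if __name__ == "__main__":
    example = ["the", "phil", "##har", "##monic", "played", "well"]
    print(whole_word_mask(example, mask_prob=0.5, seed=0))
```

The point of the design is that a word split into several pieces is either fully masked or left fully intact, so the model cannot recover a masked piece simply by reading its neighbouring pieces from the same word.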
Anthology ID:
2022.semeval-1.201
Volume:
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)
Month:
July
Year:
2022
Address:
Seattle, United States
Editors:
Guy Emerson, Natalie Schluter, Gabriel Stanovsky, Ritesh Kumar, Alexis Palmer, Nathan Schneider, Siddharth Singh, Shyam Ratan
Venue:
SemEval
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Pages:
1469–1476
URL:
https://aclanthology.org/2022.semeval-1.201
DOI:
10.18653/v1/2022.semeval-1.201
Cite (ACL):
Amit Pandey, Swayatta Daw, Narendra Unnam, and Vikram Pudi. 2022. Multilinguals at SemEval-2022 Task 11: Complex NER in Semantically Ambiguous Settings for Low Resource Languages. In Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022), pages 1469–1476, Seattle, United States. Association for Computational Linguistics.
Cite (Informal):
Multilinguals at SemEval-2022 Task 11: Complex NER in Semantically Ambiguous Settings for Low Resource Languages (Pandey et al., SemEval 2022)
PDF:
https://aclanthology.org/2022.semeval-1.201.pdf
Code:
amitpandey-research/complex_ner
Data:
MultiCoNER