Sakura at SemEval-2023 Task 2: Data Augmentation via Translation

Alberto Poncelas, Maksim Tkachenko, Ohnmar Htun


Abstract
We demonstrate a simple yet effective approach to augmenting training data for multilingual named entity recognition using translations. The named entity spans from the original sentences are transferred to the translations via word alignment and then filtered with the baseline recognizer. The proposed approach outperforms the XLM-RoBERTa baseline on the multilingual dataset.
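The abstract describes two steps: projecting entity spans onto a translation through word alignment, then keeping only the projections that the baseline recognizer supports. The Python sketch below illustrates that pipeline under stated assumptions and is not the authors' code. The function names `project_spans` and `augment` are hypothetical. The alignment is assumed to be a list of (source index, target index) pairs produced by some word aligner. Each projected entity is assumed to cover the contiguous range between its first and last aligned target token. "Filtered with the baseline recognizer" is assumed to mean exact agreement between the projected spans and the recognizer's predictions, a criterion the abstract does not specify.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

# A span is (start, end, label) over token indices, with end exclusive.
Span = Tuple[int, int, str]
# Word alignment as (source_token_index, target_token_index) pairs (assumed format).
Alignment = List[Tuple[int, int]]


def project_spans(src_spans: List[Span], alignment: Alignment) -> Optional[List[Span]]:
    """Map each source entity to the smallest target range covering its aligned tokens.

    Returns None when an entity has no aligned target token or when projected
    spans overlap, so the caller can discard the whole translated sentence.
    """
    src_to_tgt: Dict[int, Set[int]] = {}
    for s, t in alignment:
        src_to_tgt.setdefault(s, set()).add(t)

    projected: List[Span] = []
    for start, end, label in src_spans:
        targets: Set[int] = set()
        for i in range(start, end):
            targets |= src_to_tgt.get(i, set())
        if not targets:
            return None  # entity was lost in translation or alignment
        # Assumption: the projected entity is contiguous from min to max aligned index.
        projected.append((min(targets), max(targets) + 1, label))

    projected.sort()
    for (_, end1, _), (start2, _, _) in zip(projected, projected[1:]):
        if start2 < end1:
            return None  # overlapping projections are ambiguous
    return projected


def augment(
    src_spans: List[Span],
    tgt_tokens: List[str],
    alignment: Alignment,
    recognizer: Callable[[List[str]], List[Span]],
) -> Optional[Tuple[List[str], List[Span]]]:
    """Return (tokens, spans) for a new training example, or None if filtered out.

    Assumption: a translation is kept only if the baseline recognizer predicts
    exactly the projected spans.
    """
    projected = project_spans(src_spans, alignment)
    if projected is None:
        return None
    predicted = sorted(recognizer(tgt_tokens))
    return (tgt_tokens, projected) if predicted == projected else None
```

For example, projecting `[(0, 1, "PER"), (3, 4, "LOC")]` from "John lives in Paris" onto "John vit à Paris" with a one-to-one alignment returns the same spans, and `augment` keeps the pair only if the recognizer also predicts exactly those two entities. Requiring exact agreement is a strict, precision-oriented choice; a looser filter, such as accepting partial overlap, would keep more sentences at the cost of noisier labels.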
Anthology ID:
2023.semeval-1.239
Volume:
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Atul Kr. Ojha, A. Seza Doğruöz, Giovanni Da San Martino, Harish Tayyar Madabushi, Ritesh Kumar, Elisa Sartori
Venue:
SemEval
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Pages:
1718–1722
URL:
https://aclanthology.org/2023.semeval-1.239
DOI:
10.18653/v1/2023.semeval-1.239
Cite (ACL):
Alberto Poncelas, Maksim Tkachenko, and Ohnmar Htun. 2023. Sakura at SemEval-2023 Task 2: Data Augmentation via Translation. In Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023), pages 1718–1722, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Sakura at SemEval-2023 Task 2: Data Augmentation via Translation (Poncelas et al., SemEval 2023)
PDF:
https://aclanthology.org/2023.semeval-1.239.pdf
Video:
https://aclanthology.org/2023.semeval-1.239.mp4