Theoretical Linguistics Rivals Embeddings in Language Clustering for Multilingual Named Entity Recognition

Sakura Imai, Daisuke Kawahara, Naho Orita, Hiromune Oda


Abstract
While embedding-based methods have been dominant in language clustering for multilingual tasks, clustering based on linguistic features has received little attention and has served mainly as a baseline (Tan et al., 2019; Shaffer, 2021). This study investigates whether and how theoretical linguistics improves language clustering for multilingual named entity recognition (NER). We propose two types of language groupings: one based on morpho-syntactic features in a nominal domain and one based on a head parameter. Our NER experiments show that the proposed methods largely outperform a state-of-the-art embedding-based model, suggesting that theoretical linguistics plays a significant role in multilingual learning tasks.
Anthology ID:
2023.acl-srw.24
Volume:
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Vishakh Padmakumar, Gisela Vallejo, Yao Fu
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
139–151
URL:
https://aclanthology.org/2023.acl-srw.24
DOI:
10.18653/v1/2023.acl-srw.24
Cite (ACL):
Sakura Imai, Daisuke Kawahara, Naho Orita, and Hiromune Oda. 2023. Theoretical Linguistics Rivals Embeddings in Language Clustering for Multilingual Named Entity Recognition. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop), pages 139–151, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Theoretical Linguistics Rivals Embeddings in Language Clustering for Multilingual Named Entity Recognition (Imai et al., ACL 2023)
PDF:
https://aclanthology.org/2023.acl-srw.24.pdf