@inproceedings{africa-etal-2025-meta,
title = "Meta-Pretraining for Zero-Shot Cross-Lingual Named Entity Recognition in Low-Resource {P}hilippine Languages",
author = "Africa, David Demitri and
Salhan, Suchir and
Weiss, Yuval and
Buttery, Paula and
Diehl Martinez, Richard",
editor = "Adelani, David Ifeoluwa and
Arnett, Catherine and
Ataman, Duygu and
Chang, Tyler A. and
Gonen, Hila and
Raja, Rahul and
Schmidt, Fabian and
Stap, David and
Wang, Jiayi",
booktitle = "Proceedings of the 5th Workshop on Multilingual Representation Learning (MRL 2025)",
month = nov,
year = "2025",
address = "Suzhuo, China",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.mrl-main.8/",
doi = "10.18653/v1/2025.mrl-main.8",
pages = "106--127",
ISBN = "979-8-89176-345-6",
abstract = "Named-entity recognition (NER) in low-resource languages is usually tackled by finetuning very large multilingual LMs, an option that is often infeasible in memory- or latency-constrained settings. We ask whether small decoder LMs can be pretrained so that they adapt quickly and transfer zero-shot to languages unseen during pretraining. To this end we replace part of the autoregressive objective with first-order model-agnostic meta-learning (MAML). Tagalog and Cebuano are typologically similar yet structurally different in their actor/non-actor voice systems, and hence serve as a challenging test-bed. Across four model sizes (11 M {--} 570 M) MAML lifts zero-shot micro-F1 by 2{--}6 pp under head-only tuning and 1{--}3 pp after full tuning, while cutting convergence time by up to 8{\%}. Gains are largest for single-token person entities that co-occur with Tagalog case particles si/ni, highlighting the importance of surface anchors."
}<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="africa-etal-2025-meta">
<titleInfo>
<title>Meta-Pretraining for Zero-Shot Cross-Lingual Named Entity Recognition in Low-Resource Philippine Languages</title>
</titleInfo>
<name type="personal">
<namePart type="given">David</namePart>
<namePart type="given">Demitri</namePart>
<namePart type="family">Africa</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Suchir</namePart>
<namePart type="family">Salhan</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Yuval</namePart>
<namePart type="family">Weiss</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Paula</namePart>
<namePart type="family">Buttery</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Richard</namePart>
<namePart type="family">Diehl Martinez</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-11</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 5th Workshop on Multilingual Representation Learning (MRL 2025)</title>
</titleInfo>
<name type="personal">
<namePart type="given">David</namePart>
<namePart type="given">Ifeoluwa</namePart>
<namePart type="family">Adelani</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Catherine</namePart>
<namePart type="family">Arnett</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Duygu</namePart>
<namePart type="family">Ataman</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Tyler</namePart>
<namePart type="given">A</namePart>
<namePart type="family">Chang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Hila</namePart>
<namePart type="family">Gonen</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Rahul</namePart>
<namePart type="family">Raja</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Fabian</namePart>
<namePart type="family">Schmidt</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">David</namePart>
<namePart type="family">Stap</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Jiayi</namePart>
<namePart type="family">Wang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
      <placeTerm type="text">Suzhou, China</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
<identifier type="isbn">979-8-89176-345-6</identifier>
</relatedItem>
<abstract>Named-entity recognition (NER) in low-resource languages is usually tackled by finetuning very large multilingual LMs, an option that is often infeasible in memory- or latency-constrained settings. We ask whether small decoder LMs can be pretrained so that they adapt quickly and transfer zero-shot to languages unseen during pretraining. To this end we replace part of the autoregressive objective with first-order model-agnostic meta-learning (MAML). Tagalog and Cebuano are typologically similar yet structurally different in their actor/non-actor voice systems, and hence serve as a challenging test-bed. Across four model sizes (11 M – 570 M) MAML lifts zero-shot micro-F1 by 2–6 pp under head-only tuning and 1–3 pp after full tuning, while cutting convergence time by up to 8%. Gains are largest for single-token person entities that co-occur with Tagalog case particles si/ni, highlighting the importance of surface anchors.</abstract>
<identifier type="citekey">africa-etal-2025-meta</identifier>
<identifier type="doi">10.18653/v1/2025.mrl-main.8</identifier>
<location>
<url>https://aclanthology.org/2025.mrl-main.8/</url>
</location>
<part>
<date>2025-11</date>
<extent unit="page">
<start>106</start>
<end>127</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Meta-Pretraining for Zero-Shot Cross-Lingual Named Entity Recognition in Low-Resource Philippine Languages
%A Africa, David Demitri
%A Salhan, Suchir
%A Weiss, Yuval
%A Buttery, Paula
%A Diehl Martinez, Richard
%Y Adelani, David Ifeoluwa
%Y Arnett, Catherine
%Y Ataman, Duygu
%Y Chang, Tyler A.
%Y Gonen, Hila
%Y Raja, Rahul
%Y Schmidt, Fabian
%Y Stap, David
%Y Wang, Jiayi
%S Proceedings of the 5th Workshop on Multilingual Representation Learning (MRL 2025)
%D 2025
%8 November
%I Association for Computational Linguistics
%C Suzhou, China
%@ 979-8-89176-345-6
%F africa-etal-2025-meta
%X Named-entity recognition (NER) in low-resource languages is usually tackled by finetuning very large multilingual LMs, an option that is often infeasible in memory- or latency-constrained settings. We ask whether small decoder LMs can be pretrained so that they adapt quickly and transfer zero-shot to languages unseen during pretraining. To this end we replace part of the autoregressive objective with first-order model-agnostic meta-learning (MAML). Tagalog and Cebuano are typologically similar yet structurally different in their actor/non-actor voice systems, and hence serve as a challenging test-bed. Across four model sizes (11 M – 570 M) MAML lifts zero-shot micro-F1 by 2–6 pp under head-only tuning and 1–3 pp after full tuning, while cutting convergence time by up to 8%. Gains are largest for single-token person entities that co-occur with Tagalog case particles si/ni, highlighting the importance of surface anchors.
%R 10.18653/v1/2025.mrl-main.8
%U https://aclanthology.org/2025.mrl-main.8/
%U https://doi.org/10.18653/v1/2025.mrl-main.8
%P 106-127
Markdown (Informal)
[Meta-Pretraining for Zero-Shot Cross-Lingual Named Entity Recognition in Low-Resource Philippine Languages](https://aclanthology.org/2025.mrl-main.8/) (Africa et al., MRL 2025)
ACL
David Demitri Africa, Suchir Salhan, Yuval Weiss, Paula Buttery, and Richard Diehl Martinez. 2025. [Meta-Pretraining for Zero-Shot Cross-Lingual Named Entity Recognition in Low-Resource Philippine Languages](https://aclanthology.org/2025.mrl-main.8/). In *Proceedings of the 5th Workshop on Multilingual Representation Learning (MRL 2025)*, pages 106–127, Suzhou, China. Association for Computational Linguistics.
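
The abstract describes replacing part of the autoregressive pretraining objective with first-order model-agnostic meta-learning (MAML). As an informal illustration only — not the authors' implementation — the following is a minimal PyTorch sketch of one first-order MAML meta-update; `fomaml_step`, `loss_fn`, and the batch objects are hypothetical placeholders, and the single inner step and learning rates are assumptions.

```python
import copy
import torch

def fomaml_step(model, loss_fn, support_batch, query_batch,
                outer_opt, inner_lr=1e-3):
    """One first-order MAML meta-update (illustrative sketch).

    Inner loop: adapt a throwaway clone of the model on the support
    batch with plain SGD. Outer loop: the query-set gradient taken at
    the *adapted* weights is applied directly to the original weights,
    dropping the second-order terms (the first-order approximation).
    """
    # --- inner loop: adapt a clone on the support set ---
    fast = copy.deepcopy(model)
    inner_opt = torch.optim.SGD(fast.parameters(), lr=inner_lr)
    inner_opt.zero_grad()
    loss_fn(fast, support_batch).backward()
    inner_opt.step()

    # --- outer loss, evaluated at the adapted weights ---
    fast.zero_grad()
    query_loss = loss_fn(fast, query_batch)
    query_loss.backward()

    # First-order trick: copy the query gradients from the adapted
    # clone onto the original parameters, then take the meta-step.
    outer_opt.zero_grad()
    for p, fp in zip(model.parameters(), fast.parameters()):
        p.grad = fp.grad.detach().clone()
    outer_opt.step()
    return query_loss.item()
```

In a pretraining loop matching the abstract's description, steps like this (over episodes drawn from the meta-training languages) would be interleaved with ordinary autoregressive language-modeling updates via an outer optimizer such as `torch.optim.Adam(model.parameters())`; the exact mixing schedule is specified in the paper, not here.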