Dual-Axis Compositional Contrastive Few-Shot Learning using Prototypes Across Linguistic and Semantic Dimensions for Indic Low-Resource Multilingual NLU

Kathakali Mitra; Sakshi Singh; Sree Nithish Reddy Gunapati; Aruna Malapati; Mark Lee

Abstract

Multilingual Natural Language Understanding (NLU) systems often struggle to adapt when new languages or new semantic labels are introduced with only a few annotated examples. The problem is particularly pronounced for low-resource languages, where limited supervision and evolving label spaces make conventional joint-label classification unstable. Most existing multilingual NLU models treat each language-semantic pair as an independent class, which entangles linguistic and semantic representations and hinders few-shot adaptation. We propose Dual-Axis Compositional Few-Shot Learning, a framework that explicitly factorizes the representation space into a linguistic embedding axis and a semantic embedding axis, so that language variation and domain-intent semantics are modeled independently. Joint representations are constructed compositionally through multiplicative interaction of the axis-specific embeddings, allowing controlled adaptation when either the language set or the semantic label space evolves. The framework integrates factorized prototype learning, axis-structured contrastive alignment, and disentanglement regularization based on HSIC statistical independence and Jacobian cross-axis decorrelation. Experiments on six low-resource Indic languages spanning the Indo-Aryan and Dravidian families (Hindi, Bengali, Sanskrit, Assamese, Tamil, and Telugu) demonstrate strong performance under two structured generalization regimes. The model achieves 81.12% accuracy when adapting to few-shot languages with known semantics and 63.5% accuracy when learning new semantic classes from few-shot examples, along with 89.56% accuracy on known languages with seen semantics. These results show that axis-factorized representations enable stable compositional generalization, offering a promising direction for scalable multilingual NLU in linguistically diverse low-resource settings.
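The compositional idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not give the exact form of the multiplicative interaction, the prototype definition, or the similarity measure. The sketch assumes an elementwise (Hadamard) product, class-mean prototypes, and cosine similarity. All function names and the toy three-dimensional vectors are hypothetical, and the contrastive, HSIC, and Jacobian terms are omitted.

```python
import math

def mean_vector(vectors):
    """Average equal-length vectors; used here as a class prototype (assumption)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def compose(lang_vec, sem_vec):
    """Joint representation as an elementwise product (assumed interaction form)."""
    return [l * s for l, s in zip(lang_vec, sem_vec)]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def build_joint_prototypes(lang_support, sem_support):
    """Build one prototype per axis label, then compose every (language, semantic) pair."""
    lang_protos = {k: mean_vector(v) for k, v in lang_support.items()}
    sem_protos = {k: mean_vector(v) for k, v in sem_support.items()}
    return {
        (lang, sem): compose(lp, sp)
        for lang, lp in lang_protos.items()
        for sem, sp in sem_protos.items()
    }

def classify(query_lang_vec, query_sem_vec, joint_protos):
    """Return the (language, semantic) pair whose composed prototype is most similar."""
    q = compose(query_lang_vec, query_sem_vec)
    return max(joint_protos, key=lambda key: cosine(q, joint_protos[key]))

# Hypothetical few-shot support embeddings, two per label on each axis.
lang_support = {
    "hi": [[1.0, 0.2, 0.1], [0.9, 0.3, 0.1]],
    "ta": [[0.1, 0.2, 1.0], [0.2, 0.1, 0.9]],
}
sem_support = {
    "book_flight": [[1.0, 0.1, 1.0], [0.9, 0.2, 1.0]],
    "play_music": [[0.1, 1.0, 0.2], [0.2, 0.9, 0.1]],
}

joint_protos = build_joint_prototypes(lang_support, sem_support)
prediction = classify([0.15, 0.2, 0.95], [0.1, 1.0, 0.15], joint_protos)
print(prediction)
```

In this sketch each axis keeps its own prototypes, so adding a language or a semantic label only needs a few support vectors on that one axis. The joint prototypes for every pairing follow from composition rather than from a separate class per language-semantic pair.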

Anthology ID: 2026.ltedi-1.3
Volume: Proceedings of the Sixth Workshop on Language Technology for Equality, Diversity, Inclusion
Month: July
Year: 2026
Address: Virtual (Online)
Editors: Bharathi Raja Chakravarthi, Bharathi B, Paul Buitelaar, Durairaj Thenmozhi, Miguel Ángel García Cumbreras, Salud María Jiménez Zafra
Venues: LTEDI | WS
Publisher: Association for Computational Linguistics
Pages: 27–36
URL: https://aclanthology.org/2026.ltedi-1.3/
Cite (ACL): Kathakali Mitra, Sakshi Singh, Sree Nithish Reddy Gunapati, Aruna Malapati, and Mark G. Lee. 2026. Dual-Axis Compositional Contrastive Few-Shot Learning using Prototypes Across Linguistic and Semantic Dimensions for Indic Low-Resource Multilingual NLU. In Proceedings of the Sixth Workshop on Language Technology for Equality, Diversity, Inclusion, pages 27–36, Virtual (Online). Association for Computational Linguistics.
Cite (Informal): Dual-Axis Compositional Contrastive Few-Shot Learning using Prototypes Across Linguistic and Semantic Dimensions for Indic Low-Resource Multilingual NLU (Mitra et al., LTEDI 2026)
PDF: https://aclanthology.org/2026.ltedi-1.3.pdf
