Less is More: The Effectiveness of Compact Typological Language Representations

York Hay Ng, Phuong Hanh Hoang, En-Shiun Annie Lee


Abstract
Linguistic feature datasets such as URIEL+ are valuable for modelling cross-lingual relationships, but their high dimensionality and sparsity, especially for low-resource languages, limit the effectiveness of distance metrics. We propose a pipeline to optimize the URIEL+ typological feature space by combining feature selection and imputation, producing compact yet interpretable typological representations. We evaluate these feature subsets on linguistic distance alignment and downstream tasks, demonstrating that reduced-size representations of language typology can yield more informative distance metrics and improve performance in multilingual NLP applications.
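The paper itself details the pipeline; as a rough illustration of the general idea (imputing sparse typological features, then pruning the feature space before computing language distances), here is a minimal sketch using generic scikit-learn components. The toy matrix, the choice of KNN imputation, and the variance-based selection are assumptions for illustration only, not the authors' actual method.

```python
import numpy as np
from scipy.spatial.distance import pdist
from sklearn.feature_selection import VarianceThreshold
from sklearn.impute import KNNImputer

# Toy language-by-feature matrix (rows: languages, cols: binary typological
# features); NaN marks features undocumented for a language.
X = np.array([
    [1.0, 0.0, 1.0, np.nan, 0.0],
    [1.0, 0.0, np.nan, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0, 0.0],
    [np.nan, 1.0, 0.0, 1.0, 0.0],
])

# Step 1: fill missing values from typologically similar languages.
imputer = KNNImputer(n_neighbors=2)
X_full = imputer.fit_transform(X)

# Step 2: drop near-constant features to obtain a compact subset.
selector = VarianceThreshold(threshold=0.1)
X_compact = selector.fit_transform(X_full)

# Pairwise language distances over the compact representation.
d = pdist(X_compact, metric="euclidean")
print(X_compact.shape, d.shape)
```

Any feature-selection criterion (e.g. relevance to a downstream task) can replace the variance filter; the point of the sketch is only the impute-then-select ordering, which keeps the distance computation dense and low-dimensional.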
Anthology ID:
2025.emnlp-main.1310
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
25816–25827
URL:
https://aclanthology.org/2025.emnlp-main.1310/
Cite (ACL):
York Hay Ng, Phuong Hanh Hoang, and En-Shiun Annie Lee. 2025. Less is More: The Effectiveness of Compact Typological Language Representations. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 25816–25827, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Less is More: The Effectiveness of Compact Typological Language Representations (Ng et al., EMNLP 2025)
PDF:
https://aclanthology.org/2025.emnlp-main.1310.pdf
Checklist:
https://aclanthology.org/2025.emnlp-main.1310.checklist.pdf