Phuong Hanh Hoang
2025
Less is More: The Effectiveness of Compact Typological Language Representations
York Hay Ng, Phuong Hanh Hoang, En-Shiun Annie Lee
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Linguistic feature datasets such as URIEL+ are valuable for modelling cross-lingual relationships, but their high dimensionality and sparsity, especially for low-resource languages, limit the effectiveness of distance metrics. We propose a pipeline to optimize the URIEL+ typological feature space by combining feature selection and imputation, producing compact yet interpretable typological representations. We evaluate these feature subsets on linguistic distance alignment and downstream tasks, demonstrating that reduced-size representations of language typology can yield more informative distance metrics and improve performance in multilingual NLP applications.
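
To make the idea of combining imputation with feature selection before computing distances concrete, here is a minimal sketch on invented toy data. It is not the authors' pipeline: column-mean imputation and variance ranking are simple stand-ins chosen for illustration, the feature matrix is made up rather than taken from URIEL+, and every function name below is hypothetical.

```python
import numpy as np

def impute_column_means(X):
    """Fill missing entries (NaN) with the mean of the observed values in each column.

    Assumes every column has at least one observed value.
    """
    X = np.asarray(X, dtype=float)
    col_means = np.nanmean(X, axis=0)
    return np.where(np.isnan(X), col_means, X)

def select_top_variance(X, k):
    """Return the indices of the k highest-variance columns, in ascending index order."""
    variances = X.var(axis=0)
    top = np.argsort(variances)[::-1][:k]
    return np.sort(top)

def pairwise_cosine_distance(X):
    """Return the matrix of cosine distances between all pairs of rows."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms = np.where(norms == 0, 1.0, norms)  # avoid division by zero for all-zero rows
    unit = X / norms
    return 1.0 - unit @ unit.T

# Toy matrix (invented): 4 languages x 5 binary typological features, NaN = unknown.
X_raw = np.array([
    [1.0,    0.0, np.nan, 1.0,    1.0],
    [1.0,    1.0, 0.0,    np.nan, 1.0],
    [0.0,    1.0, 1.0,    0.0,    1.0],
    [np.nan, 0.0, 1.0,    0.0,    1.0],
])

X_imputed = impute_column_means(X_raw)
selected = select_top_variance(X_imputed, k=4)   # the constant last feature is dropped
X_compact = X_imputed[:, selected]
distances = pairwise_cosine_distance(X_compact)

print("selected feature indices:", selected.tolist())
print(np.round(distances, 3))
```

In this toy case the last feature takes the same value for every language, so it has zero variance and contributes nothing to telling languages apart; removing it leaves a smaller representation on which the distances are computed. The paper's actual selection and imputation methods may differ substantially from these stand-ins.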