Mason Shipton


2025

URIEL+: Enhancing Linguistic Inclusion and Usability in a Typological and Multilingual Knowledge Base
Aditya Khan | Mason Shipton | David Anugraha | Kaiyao Duan | Phuong H. Hoang | Eric Khiu | A. Seza Doğruöz | En-Shiun Annie Lee
Proceedings of the 31st International Conference on Computational Linguistics

URIEL is a knowledge base offering geographical, phylogenetic, and typological vector representations for 7970 languages. It includes distance measures between these vectors for 4005 languages, which are accessible via the lang2vec tool. Despite being frequently cited, URIEL is limited in terms of linguistic inclusion and overall usability. To address these limitations, we introduce URIEL+, an enhanced version of URIEL and lang2vec. In addition to expanding typological feature coverage for 2898 languages, URIEL+ improves the user experience with robust, customizable distance calculations that better suit users' needs. These upgrades also offer competitive performance on downstream tasks and provide distances that better align with linguistic distance studies.

2024

Empowering the Future with Multilinguality and Language Diversity
En-Shiun Lee | Kosei Uemura | Syed Wasti | Mason Shipton
Proceedings of the Sixth Workshop on Teaching NLP

The rapid advancement and widespread transformation of Large Language Models have made it necessary to incorporate these cutting-edge techniques into Natural Language Processing (NLP) curricula, even with limited computing resources. This paper presents an applied NLP course, designed for upper-year computer science undergraduates, that covers state-of-the-art techniques with an emphasis on multilinguality and language diversity. We hope to empower learners to advance their language communities while preparing for industry.