Juan-José Guzmán-Landa


2025

π-YALLI : un nouveau corpus pour des modèles de langue nahuatl / Yankuik nawatlahtolkorpus pampa tlahtolmachiotl
Juan-José Guzmán-Landa | Juan-Manuel Torres-Moreno | Martha Lorena Avendaño Garrido | Miguel Figueroa-Saavedra | Ligia Quintana-Torres | Graham Ranger | Carlos-Emiliano González-Gallardo | Elvys Linhares-Pontes | Patricia Velázquez-Morales | Luis-Gil Moreno-Jiménez
Actes des 32ème Conférence sur le Traitement Automatique des Langues Naturelles (TALN), volume 1 : articles scientifiques originaux

π-YALLI: a new corpus for Nahuatl Language Models
Nahuatl is a language with few computational resources, even though it is a living language spoken by around two million people. We built π-YALLI, a corpus that enables research and development of dynamic and static Language Models (LM). We measured the perplexity of π-YALLI and evaluated state-of-the-art LM performance on a manually annotated semantic similarity corpus, relative to annotator agreement. The results show the difficulty of working with this π-language, but they also open up interesting perspectives for the study of other NLP tasks on Nahuatl.