2025
π-YALLI : un nouveau corpus pour des modèles de langue nahuatl / Yankuik nawatlahtolkorpus pampa tlahtolmachiotl
Juan-José Guzmán-Landa, Juan-Manuel Torres-Moreno, Martha Lorena Avendaño Garrido, Miguel Figueroa-Saavedra, Ligia Quintana-Torres, Graham Ranger, Carlos-Emiliano González-Gallardo, Elvys Linhares-Pontes, Patricia Velázquez-Morales, Luis-Gil Moreno-Jiménez
Proceedings of the 32nd Conference on Natural Language Processing (TALN), Volume 1: Original Scientific Papers
π-YALLI: a new corpus for Nahuatl Language Models
Nahuatl is a language with few computational resources, even though it is a living language spoken by around two million people. We built π-YALLI, a corpus that enables research on and development of dynamic and static language models (LMs). We measured the perplexity of π-YALLI and evaluated state-of-the-art LMs on a manually annotated semantic similarity corpus, comparing their performance with inter-annotator agreement. The results show how difficult it is to work with this π-language (a language with few computational resources), but they also open up interesting perspectives for studying other NLP tasks on Nahuatl.
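The abstract reports perplexity but does not say which model, tokenization, or log base was used. The following is only a minimal sketch of the standard definition, perplexity = exp(-(1/N) Σ log p(w_i | w_<i)), assuming natural-log per-token probabilities are already available from some language model. The function name `perplexity` is hypothetical and is not taken from the paper.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Hypothetical helper: perplexity from natural-log token probabilities.

    Each entry is assumed to be log p(w_i | w_<i) for one token of the
    evaluated text, as produced by some language model (not specified here).
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    # Average negative log-likelihood per token
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    # Perplexity is the exponential of the average negative log-likelihood
    return math.exp(avg_nll)


# A model that gives every token probability 1/4 has a perplexity of about 4
print(perplexity([math.log(0.25)] * 10))
```

Lower values mean the model is, on average, less "surprised" by the text. Values are only comparable when they are computed with the same tokenization and vocabulary.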