A RAG Approach for Typological Database Completion

Jonathan Hus, Antonios Anastasopoulos


Abstract
Linguistic reference material is a trove of information that can be used to analyze languages. This material, in the form of grammar books and grammar sketches, has been used for machine translation, but it can also support linguistic analysis. Retrieval-Augmented Generation (RAG) has been shown to improve large language model (LLM) capabilities by incorporating external reference material into the generation process. In this paper, we investigate using grammar books together with RAG techniques to identify typological features of languages. We use Grambank for feature definitions and ground-truth values, and we evaluate on five typologically diverse low-resource languages. We demonstrate that this approach can make effective use of reference material.
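To make the general RAG recipe concrete, here is a minimal sketch of the retrieval-then-prompt step for one binary, Grambank-style feature question. It is not the authors' pipeline. The paragraph chunking, the TF-IDF overlap scorer (a simple stand-in for whatever retriever the paper uses), the prompt wording, and the toy "ExampleLang" grammar text are all hypothetical. The actual call to an LLM is omitted.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep alphabetic word tokens only (no stemming)."""
    return re.findall(r"[a-z]+", text.lower())


def chunk_paragraphs(text: str) -> list[str]:
    """Split a grammar into paragraph chunks on blank lines (an assumed chunking scheme)."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank chunks by a simple TF-IDF overlap score with the query; return the top k."""
    docs = [Counter(tokenize(c)) for c in chunks]
    n = len(docs)
    df = Counter(term for d in docs for term in d)  # document frequency of each term
    q_terms = set(tokenize(query))

    def score(d: Counter) -> float:
        # Sum over shared terms of term frequency times smoothed inverse document frequency.
        return sum(d[t] * math.log((n + 1) / (df[t] + 1)) for t in q_terms if t in d)

    ranked = sorted(range(n), key=lambda i: score(docs[i]), reverse=True)
    return [chunks[i] for i in ranked[:k]]


def build_prompt(language: str, question: str, passages: list[str]) -> str:
    """Assemble a prompt asking an LLM to answer a yes/no feature question
    from the retrieved grammar excerpts only (hypothetical prompt wording)."""
    context = "\n\n".join(f"[Excerpt {i + 1}] {p}" for i, p in enumerate(passages))
    return (
        f"Grammar excerpts for {language}:\n\n{context}\n\n"
        f"Question: {question}\n"
        "Answer 1 for yes, 0 for no, or ? if the excerpts do not say."
    )


# Toy grammar for a hypothetical language, split into three paragraphs.
grammar = (
    "Verbs agree with the subject in person and number. "
    "Tense is marked by suffixes on the verb.\n\n"
    "The definite article ka precedes the noun. "
    "There is no indefinite article.\n\n"
    "Questions are formed with a sentence-final particle."
)
chunks = chunk_paragraphs(grammar)
question = "Are there definite or specific articles?"
top = retrieve(question, chunks, k=1)  # top[0] starts with "The definite article"
print(build_prompt("ExampleLang", question, top))
```

The predicted value parsed from the model's answer could then be compared against the Grambank ground-truth value for that language and feature. Note that this toy scorer matches only exact word forms ("articles" does not match "article"); a real system would likely use stemming or dense embeddings.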
Anthology ID:
2026.sigtyp-main.7
Volume:
Proceedings of the 8th Workshop on Research in Computational Linguistic Typology and Multilingual NLP
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Ekaterina Vylomova, Andrei Shcherbakov, Priya Rani
Venues:
SIGTYP | WS
Publisher:
Association for Computational Linguistics
Pages:
39–49
URL:
https://aclanthology.org/2026.sigtyp-main.7/
Cite (ACL):
Jonathan Hus and Antonios Anastasopoulos. 2026. A RAG Approach for Typological Database Completion. In Proceedings of the 8th Workshop on Research in Computational Linguistic Typology and Multilingual NLP, pages 39–49, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
A RAG Approach for Typological Database Completion (Hus & Anastasopoulos, SIGTYP 2026)
PDF:
https://aclanthology.org/2026.sigtyp-main.7.pdf