Basic Language Resources for 31 Languages (Plus English): The LORELEI Representative and Incident Language Packs

Jennifer Tracey, Stephanie Strassel


Abstract
This paper documents and describes the thirty-one basic language resource packs created for the DARPA LORELEI program for use in development and testing of systems capable of providing language-independent situational awareness in emerging scenarios in a low-resource language context. Twenty-four Representative Language Packs cover a broad range of language families and typologies, providing large volumes of monolingual and parallel text, smaller volumes of entity and semantic annotations, and a variety of grammatical resources and tools designed to support research into language universals and cross-language transfer. Seven Incident Language Packs provide test data to evaluate system capabilities on a previously unseen low-resource language. We discuss the makeup of Representative and Incident Language Packs, the methods used to produce them, and the evolution of their design and implementation over the course of the multi-year LORELEI program. We conclude with a summary of the final language packs, including their low-cost publication in the LDC catalog.
Anthology ID:
2020.sltu-1.39
Volume:
Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL)
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Dorothee Beermann, Laurent Besacier, Sakriani Sakti, Claudia Soria
Venue:
SLTU
Publisher:
European Language Resources Association
Pages:
277–284
Language:
English
URL:
https://aclanthology.org/2020.sltu-1.39
Cite (ACL):
Jennifer Tracey and Stephanie Strassel. 2020. Basic Language Resources for 31 Languages (Plus English): The LORELEI Representative and Incident Language Packs. In Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL), pages 277–284, Marseille, France. European Language Resources Association.
Cite (Informal):
Basic Language Resources for 31 Languages (Plus English): The LORELEI Representative and Incident Language Packs (Tracey & Strassel, SLTU 2020)
PDF:
https://aclanthology.org/2020.sltu-1.39.pdf