Libras-UFPel Corpus: A Parallel Dataset of Brazilian Sign Language and Portuguese for Multimodal Research and Processing

Antonielle Martins; Brenda S. Santana; Francielle Martins; Tatiana Lebedeff; Darley Nunes; Luisa Bohm

Abstract

The Libras-UFPel Corpus is a multimodal, multilayer parallel resource designed for the documentation and computational analysis of Brazilian Sign Language (Libras) in systematic alignment with written Portuguese. By integrating controlled recordings with naturalistic data from the Inventário Nacional de Libras-Pelotas, the corpus ensures interoperability through shared methodological standards. The dataset currently comprises 4,800 controlled audiovisual records (2,400 sentences and 2,400 isolated signs) fully paired with Portuguese translations, supplemented by approximately 10 hours of spontaneous interaction from three new naturalistic interviews, currently in the editing phase. To date, 1,200 controlled sentences have been lemmatized, gloss-annotated, and translated, providing a structured parallel subset for Libras-to-Portuguese Sign Language Processing tasks such as recognition and machine translation. The annotation model follows a hierarchical structure covering lexical, partially lexical, and non-lexical signs, including independent tiers for non-manual markers. By bridging descriptive linguistics and Natural Language Processing, the Libras-UFPel Corpus serves as a reference source for bilingual data-driven modeling, advancing digital inclusion and linguistic accessibility.

Anthology ID: 2026.propor-1.112
Volume: Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 1
Month: April
Year: 2026
Address: Salvador, Brazil
Editors: Marlo Souza, Iria de-Dios-Flores, Diana Santos, Larissa Freitas, Jackson Wilke da Cruz Souza, Eugénio Ribeiro
Venue: PROPOR
Publisher: Association for Computational Linguistics
Pages: 1068–1073
URL: https://aclanthology.org/2026.propor-1.112/
Cite (ACL): Antonielle Martins, Brenda S. Santana, Francielle Martins, Tatiana Lebedeff, Darley Nunes, and Luisa Bohm. 2026. Libras-UFPel Corpus: A Parallel Dataset of Brazilian Sign Language and Portuguese for Multimodal Research and Processing. In Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 1, pages 1068–1073, Salvador, Brazil. Association for Computational Linguistics.
Cite (Informal): Libras-UFPel Corpus: A Parallel Dataset of Brazilian Sign Language and Portuguese for Multimodal Research and Processing (Martins et al., PROPOR 2026)
PDF: https://aclanthology.org/2026.propor-1.112.pdf
