A Web Tool for Building Parallel Corpora of Spoken and Sign Languages

Alex Becker, Fabio Kepler, Sara Candeias


Abstract
This paper describes our work on building an online tool for manually annotating texts in any spoken language with SignWriting in any sign language. Such a tool will allow the creation of parallel corpora between spoken and sign languages, which can be used to bootstrap the creation of efficient tools for the Deaf community. For example, a parallel corpus between English and American Sign Language could be used to train Machine Learning models for automatic translation between the two languages. A tool of this kind must be designed to ease the task of human annotators, not only by being easy to use but also by offering smart suggestions as the annotation progresses, in order to save time and effort. By building a collaborative, online, easy-to-use annotation tool for parallel corpora between spoken and sign languages, we aim to help develop proper resources for sign languages that can then be used in the state-of-the-art models currently used in tools for spoken languages. Creating this kind of resource involves several issues and difficulties, and the tool presented here already deals with some of them, such as an adequate textual representation of a sign and many-to-many alignments between words and signs.
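To make the idea of many-to-many alignments between words and signs concrete, the following is a minimal sketch of one possible data structure. It is not taken from the tool itself: the class name `AlignedSentence`, its methods, and the sign identifiers `SIGN_1` and `SIGN_2` are all hypothetical, and the example alignment is invented for illustration rather than being a linguistic claim about any sign language.

```python
from dataclasses import dataclass, field


@dataclass
class AlignedSentence:
    """One annotated sentence: words, signs, and the links between them.

    Hypothetical structure for illustration; not the tool's actual schema.
    """
    words: list[str]
    signs: list[str]  # placeholder sign identifiers, not real SignWriting encodings
    links: set[tuple[int, int]] = field(default_factory=set)  # (word index, sign index)

    def align(self, word_idxs, sign_idxs):
        """Link every listed word to every listed sign (one many-to-many group)."""
        pairs = [(w, s) for w in word_idxs for s in sign_idxs]
        # Validate all indices before storing anything, so a bad call changes nothing.
        for w, s in pairs:
            if not (0 <= w < len(self.words) and 0 <= s < len(self.signs)):
                raise IndexError(f"alignment ({w}, {s}) is out of range")
        self.links.update(pairs)

    def signs_for(self, word_idx):
        """Return the signs linked to one word, in sign order."""
        return [self.signs[s] for w, s in sorted(self.links) if w == word_idx]

    def words_for(self, sign_idx):
        """Return the words linked to one sign, in word order."""
        return [self.words[w] for w, s in sorted(self.links) if s == sign_idx]


# Invented example: two words share one sign, and two words share two signs.
sentence = AlignedSentence(
    words=["thank", "you", "very", "much"],
    signs=["SIGN_1", "SIGN_2"],
)
sentence.align([0, 1], [0])
sentence.align([2, 3], [0, 1])
print(sentence.signs_for(2))
print(sentence.words_for(1))
```

Storing alignments as a set of index pairs keeps one-to-one, one-to-many, and many-to-many cases in a single representation, and lookups in either direction need no separate bookkeeping.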
Anthology ID:
L16-1229
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
1438–1445
URL:
https://aclanthology.org/L16-1229
Cite (ACL):
Alex Becker, Fabio Kepler, and Sara Candeias. 2016. A Web Tool for Building Parallel Corpora of Spoken and Sign Languages. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 1438–1445, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
A Web Tool for Building Parallel Corpora of Spoken and Sign Languages (Becker et al., LREC 2016)
PDF:
https://aclanthology.org/L16-1229.pdf
Code:
unipampa/signcorpus