Caro Brosens


GoSt-ParC-Sign: Gold Standard Parallel Corpus of Sign and spoken language
Mirella De Sisto | Vincent Vandeghinste | Lien Soetemans | Caro Brosens | Dimitar Shterionov
Proceedings of the 24th Annual Conference of the European Association for Machine Translation

Good quality training data for Sign Language Machine Translation (SLMT) is extremely scarce, which is one of the challenges currently facing any Machine Translation (MT) project that also targets sign languages. The goal of this ongoing project is to create a parallel corpus of authentic Flemish Sign Language (VGT) and written Dutch which can be employed as a gold standard in automated sign language translation. The availability of a gold standard corpus like GoSt-ParC-Sign can facilitate advances in SLMT; consequently, it supports and promotes inclusiveness in MT and, on a more general level, in language technology.


Moving towards a Functional Approach in the Flemish Sign Language Dictionary Making Process
Caro Brosens | Margot Janssens | Sam Verstraete | Thijs Vandamme | Hannes De Durpel
Proceedings of the LREC2022 10th Workshop on the Representation and Processing of Sign Languages: Multilingual Sign Language Resources

This presentation will outline the dictionary making process of the new online Flemish Sign Language dictionary launched in 2019. First, some necessary background information is provided, consisting of a brief history of Flemish Sign Language (VGT) lexicography. Then three phases in the development of the renewed dictionary of VGT will be explored: (i) user research, (ii) data-cleaning and modeling, and (iii) innovations. Rather than merely presenting a report of lexicographic research on a website, the goal was to make the new dictionary a practical, user-friendly reference tool that meets the needs, expectations, and skills of the dictionary users. To gain a better understanding of who the users were, several sources were consulted: the user research by Joni Oyserman (2013), the quantitative data from Google Analytics, and VGTC’s own user profiles. Since 2017, VGTC has been using Signbank, an electronic database specifically developed to compile and manage lexicographic data for sign languages. Bringing together all this raw data inadvertently led to inconsistencies and small mistakes, so the data had to be manually revised and complemented. The VGT dictionary was mainly formally modernized, but there are also several substantive differences compared with the previous dictionary: for instance, search options were expanded, and semantic categories were added, as well as a new feedback feature. In addition, the new website is structurally different: it is now responsive to all screen sizes. Lastly, possible future innovations will briefly be discussed. VGTC aims to continuously improve both the user-based interface and the content of the current dictionary. Future goals include, but are not limited to, adding definitions and sample sentences (preferably extracted from the corpus), as well as information on the etymology and common use of signs.