Enhancing Accessible Communication: from European Portuguese to Portuguese Sign Language



Introduction
According to the Portuguese Association of Deaf People, there are around thirty thousand deaf people who use Portuguese Sign Language (LGP) in their daily lives. However, European Portuguese (EP) and LGP are two different linguistic systems, and communication between hearing and hearing-impaired people is difficult, leading to a communication gap between the two groups. There have been some attempts at developing a system that translates EP into LGP glosses (Almeida, 2014; Gaspar, 2015; Escudeiro et al., 2015; Ferreira, 2018), but the majority only present toy examples, relying on a small set of hand-crafted rules and disregarding non-manual movements. PE2LGP (Gonçalves et al., 2021; Lacerda et al., 2023) is a rule-based translation system from EP to LGP. Some of its rules are hand-crafted and some are automatically extracted from a linguistic corpus with annotations of LGP videos, from now on COLIN. To the best of our knowledge, COLIN is the only existing annotated LGP corpus. It currently consists of 113 hours of video recordings, 20 of which are annotated at various linguistic levels using ELAN (Sloetjes and Wittenburg, 2008). The videos in the corpus were recorded between 1992 and 2019 and feature hearing-impaired signers ranging from 4 to 89 years old. PE2LGP still has some limitations, as the subset of COLIN used to develop its rules is very small (three minutes), and some manual work is required to create the PE2LGP translator's grammar.

In this paper, we take advantage of an extended version of the corpus used in PE2LGP and present a rule-based system that is now fully automatic. We use this rule-based system to create a parallel corpus between EP and LGP, from now on LGP-5-Domain (LGP5), with text from different domains: simple sentences, social media, poetry, dialogue, and news. LGP5 is then used to fine-tune two large multilingual neural machine translation models, which are evaluated on a gold collection built by LGP experts. With this work, we
hope to contribute to benchmarking this task.
To illustrate the task at hand, Table 1 presents examples of EP sentences and their corresponding translations into LGP glosses. The first row contains an interrogative sentence, "A Ana gosta de massa?" (Does Ana like pasta?), marked by the {}(q) facial-expression marker, which denotes a question and involves raising the chin, tilting the head back, and frowning. Since "Ana" is a proper name, it should be fingerspelled, represented by DT(A-N-A). In the declarative sentence of the second row, "rainha" (queen) is translated to "MULHER REI" (WOMAN KING): the sentence "The queen went to the beach." becomes "MULHER REI PRAIA IR" (WOMAN KING BEACH GO).

Related Work
We will only consider translation into a sign language; translation from a sign language is out of the scope of this paper. Early automatic translation systems, such as the ones discussed in (San-Segundo et al., 2008; Zhao et al., 2000; Brour and Benabbou, 2019), adopted a rule-based approach to perform the translation between the source and the target languages. An example of an (expert-defined) rule employed in the ATLASLang MTS system (Brour and Benabbou, 2019) is shown in Equation 1.
If Gender(w_i) = feminine, insert "female" after w_i (1)

Equation 1 checks whether a word has feminine gender and adds the term "female" after it. Notice that the types of grammar involved can vary. For instance, the work described in (Zhao et al., 2000) uses a Lexicalized Tree Adjoining Grammar to translate English words into American Sign Language glosses.
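The expert-defined rule in Equation 1 can be sketched in a few lines of Python. This is only an illustration of the rule's logic, not ATLASLang's actual implementation; the small gender lookup below is a hypothetical stand-in for a real morphological analyzer.

```python
# Illustrative sketch of a gender-marking rule in the spirit of Equation 1.
# GENDER is a hypothetical stand-in for a morphological analyzer.
GENDER = {"gata": "feminine", "gato": "masculine"}

def apply_gender_rule(words):
    """Insert the term FEMALE after every word tagged as feminine."""
    out = []
    for w in words:
        out.append(w)
        if GENDER.get(w) == "feminine":
            out.append("FEMALE")
    return out
```

For instance, `apply_gender_rule(["gata"])` yields `["gata", "FEMALE"]`, while masculine words pass through unchanged.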
Regarding LGP, there are a few attempts at creating a translation system, such as the ones described in (Escudeiro et al., 2015; Oliveira et al., 2019; Gaspar, 2015). VIRTUALSIGN (Escudeiro et al., 2015; Oliveira et al., 2019) is a system that performs bidirectional translation between Portuguese and LGP. Regarding the text-to-sign direction, the input sentence is passed through a set of grammar rules and, afterward, the system directly associates each word in the sentence with a corresponding sign stored in a database. This database contains the information needed to animate a 3D avatar. Unfortunately, no information is available regarding the creation of the translation rules or the evaluation of this system.
PE2LGP has a rule-based translation system (Gonçalves et al., 2021), an avatar to perform LGP, and a database with signs (Cabral et al., 2020; Lacerda et al., 2023). Its rules were semi-automatically extracted from COLIN. In this work, we take advantage of the current corpus and also automate the whole process.
Another work that should be mentioned is IF2LGP (Gaspar, 2015), which consists of two modules. The first is responsible for conducting syntactic and morphological analysis, while the second contains translation rules to convert Portuguese words into LGP glosses. However, the creation of these translation rules was based on a small dataset of ten sentences.
Over the past years, significant progress has been made in the field of sign language translation, thanks to advancements in statistical and neural machine translation. For instance, the approach proposed by San-Segundo et al. (2008) uses a phrase-based method trained on parallel corpora; also, the ATLASLang NMT system (Brour and Benabbou, 2021) employs a neural machine translation approach.
Regarding LGP, a neural approach is also explored by Alves et al. (2022). It adopts a hybrid structure combining rule-based and neural machine translation approaches. According to the authors, the dataset consists of 150,000 sentences. However, neither the grammar nor the data are available.
Towards the Portuguese Sign Language

The LGP Corpus
We were given access to COLIN, and used the forty-five minutes of the corpus that had all the needed syntactic annotations to reconstruct the expert translator's grammatical rules.

Improving the Rule-based Model
The corpora-driven rule-based approach used in this translation system is depicted in Figure 1.
The first module automatically extracts linguistic information from the corpus, generating translation rules and a bilingual dictionary between EP and LGP. To do so, we extract from ELAN the Portuguese sentences, the corresponding LGP translations in glosses, and their grammatical information: part-of-speech tags, subjects, and objects. Having this information for the LGP sentences, we analyze the EP ones to obtain their grammatical information as well. With information from both languages, the Portuguese words are aligned with the LGP glosses. This alignment is based on an algorithm that combines string matching and semantic similarity measures. From the aligned word-sign pairs, the rules and the bilingual dictionary are created.

The second module uses these translation rules and the bilingual dictionary to translate EP into LGP, where the LGP sentence is represented by a sequence of glosses with markers indicating facial expressions and fingerspelled words. When an EP sentence enters the system, the sentence is analyzed and its structure is kept. Then, the distance between the Portuguese structure and the structure of each of the system's rules is calculated. The rule with the lowest distance is the most similar to the original sentence and is applied to convert the Portuguese structure into the LGP one.
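The rule-selection step can be illustrated with a minimal sketch, under the assumption that sentence structures are represented as part-of-speech sequences and that the distance is an edit distance over those sequences; the rule inventory and the exact distance measure of the actual system may differ.

```python
# Sketch of selecting the translation rule whose source-side structure is
# closest to the input sentence's structure. RULES maps a (hypothetical)
# Portuguese POS pattern to its LGP counterpart.

def edit_distance(a, b):
    """Levenshtein distance between two POS-tag sequences."""
    dp = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, y in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (x != y))
    return dp[-1]

RULES = {
    ("VERB", "ADJ", "NOUN", "ADJ"): ("VERB", "NOUN", "NOUN"),
    ("NOUN", "VERB", "NOUN"): ("NOUN", "NOUN", "VERB"),
}

def best_rule(pos_sequence):
    """Return the rule left-hand side closest to the input structure."""
    return min(RULES, key=lambda lhs: edit_distance(lhs, tuple(pos_sequence)))
```

An exact structural match yields distance 0, so the corresponding rule is always preferred; otherwise the nearest pattern is applied.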
PE2LGP has 61 general syntactic rules, some of them hand-crafted. Our current proposal handles 238 such rules (218 for declarative sentences, 7 for negative ones, and 13 for interrogatives). Eq. 2 shows an example stating that if the Portuguese sentence has the canonical order Verbal Phrase (VP) - Noun Phrase (NP), then the LGP sentence will have the canonical order NP - VP.
VP NP → NP VP (2)

PE2LGP has 90 morphosyntactic rules, and our proposal adds 228 such rules. As an example, Eq. 3 states that a Portuguese sentence with the syntactic structure Verb - Adjective - Noun - Adjective will be translated into an LGP sentence with the syntactic structure Verb - Noun - Noun:

Verb Adj Noun Adj → Verb Noun Noun (3)
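The application of this rule can be sketched as follows; the adjective-to-noun mapping is a hypothetical stand-in for the lemma-based conversion performed with the system's bilingual dictionary.

```python
# Sketch of applying the morphosyntactic rule of Eq. 3:
# Verb-Adjective-Noun-Adjective -> Verb-Noun-Noun.
# TO_NOUN is a hypothetical stand-in for the lemma-based conversion of an
# adjective into a noun; real entries come from the bilingual dictionary.
TO_NOUN = {"artísticos": "ARTE"}

def apply_vana_rule(tokens):
    """tokens must follow the Verb-Adj-Noun-Adj pattern."""
    verb, _adj1, noun, adj2 = tokens
    # The first adjective is dropped (merged with the verb via the
    # dictionary), the second adjective becomes a noun in the middle,
    # and the original noun moves to the final position.
    return [verb, TO_NOUN.get(adj2, adj2), noun]
```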
This rule can be applied, for instance, to the sentence:

Há grandes desenvolvimentos artísticos. (There are great artistic developments.) (4)

The sentence presented in Eq. 4 is translated this way because there is a corresponding entry in the bilingual dictionary. Additionally, as the rule presented in Eq. 3 suggests, the first adjective can be removed, as it is merged with the first verb; the noun in the original sentence is moved to the last position in the translated sentence; and the second adjective is converted into a noun (its lemma) and positioned in the middle of the sentence. The alignment between the rule and the source and target sentences is depicted in Fig. 2.
Figure 2: Alignment between the source sentence (Eq. 4) and the target sentence (Eq. 5) using the rule depicted in Eq. 3.

The Neural Models
We fine-tuned two multilingual models: mBART (Liu et al., 2020) and M2M (Fan et al., 2021). Given that the LGP glosses are written with Portuguese words, both the input and output languages were set to Portuguese. As a result, a Portuguese-to-Portuguese translator was created, and fine-tuning was performed to allow the models to learn how to translate into LGP glosses. The base models and the associated weights come from HuggingFace's Transformers package (Wolf et al., 2020). For the mBART model, we used the mbart-large-50-many-to-many-mmt checkpoint and, for the M2M model, the m2m100_1.2B checkpoint. The fine-tuning strategy was similar for both models: we used the default hyperparameters except for the batch size and the number of epochs. We used a batch size of 2 for both models, since it was the maximum feasible given our computational resources, and trained for 3 epochs, since the models started to overfit beyond that. The fine-tuning data consisted of the parallel corpus created with the rule-based approach, that is, the LGP5 corpus, described next.
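This setup can be summarized by the following configuration sketch, assuming the HuggingFace Transformers Seq2SeqTrainer API; `lgp5_train` is a placeholder for the tokenized LGP5 training pairs and is not defined here.

```python
# Configuration sketch of the fine-tuning setup (M2M shown; mBART is
# analogous with the mbart-large-50-many-to-many-mmt checkpoint).
# Assumes the HuggingFace Transformers library; `lgp5_train` is a
# placeholder for the tokenized LGP5 training data.
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          Seq2SeqTrainer, Seq2SeqTrainingArguments)

# Source and target are both Portuguese, since glosses use Portuguese words.
tokenizer = AutoTokenizer.from_pretrained(
    "facebook/m2m100_1.2B", src_lang="pt", tgt_lang="pt")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/m2m100_1.2B")

args = Seq2SeqTrainingArguments(
    output_dir="m2m100-lgp5",
    per_device_train_batch_size=2,  # largest batch our hardware allowed
    num_train_epochs=3,             # more epochs started to overfit
)

trainer = Seq2SeqTrainer(
    model=model, args=args, train_dataset=lgp5_train, tokenizer=tokenizer)
trainer.train()
```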

The LGP5 Dataset
In the following, we describe the LGP5 dataset, which comprises 37,500 automatically annotated sentences from 5 different domains (7,500 from each) used to train the neural models. Additionally, 200 sentences were manually annotated (the gold collection), 40 from each domain.

Gathering Data
In order to have a rich collection of simple sentences, we extracted Portuguese sentences from Tatoeba. With the aim of exposing the models to the unique linguistic characteristics of online social interactions (slang, abbreviations, and other aspects of contemporary communication commonly found on social media platforms), we used a dataset with Portuguese tweets from Kaggle. Poetry texts, namely the complete literary work of Fernando Pessoa, a famous Portuguese poet, were also used to enable training with a broader range of sentence structures; Kaggle was, again, our source of data. We also considered dialogues to obtain sentences from the everyday speech of Portuguese people, using the dataset described in (Csaky and Recski, 2021). Finally, the last dataset was composed of news articles. Training the model with sentences from this domain exposes the model not only to organized and coherent language but also to a broad range of topics.

The LGP5 Parallel Corpus
Our rule-based model was used to translate the gathered corpus.As a result, a new dataset comprising 37,500 EP/LGP pairs was generated (examples can be seen in Table 4, in Appendix A).

The Gold Collection
As previously said, the gold collection consists of 200 EP/LGP sentence pairs, 40 from each of the five domains. These sentences were annotated by the rule-based system and given to two LGP experts, each of whom validated or corrected 100 sentences.
In a preliminary experiment, we evaluated our models on the test set described in (Gonçalves et al., 2021), which contains 58 EP/LGP sentence pairs. We also evaluated our models against PE2LGP. Table 2 shows the results. Our rule-based approach (RB) is better than PE2LGP, and M2M is the best model overall. Next, we used our gold collection to evaluate the same models (Table 3).

M2M is not always the model that obtains the highest scores: our rule-based system is the best system in the social media domain for all measures, and in some of the other domains for some of the metrics. PE2LGP initially seemed to have the best TER score in the poetry domain. However, a closer look revealed that it failed to provide a translation for four out of the forty sentences, which affected the TER calculation and made the final scores appear better than they actually were. Considering this, PE2LGP's scores were consistently worse than initially thought, and, in reality, our rule-based model had the best TER score in the poetry domain.
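The effect of the missing translations is easy to see if one recalls that TER divides the number of edits needed to turn the hypothesis into the reference by the reference length: a sentence left untranslated should contribute a score of 1.0, so dropping such sentences makes the average look better. A simplified word-level version (ignoring the shift operation of real TER) illustrates this:

```python
# Simplified word-level TER: edit distance between hypothesis and reference
# divided by reference length. Real TER also models phrase shifts; this
# sketch is for illustration only.

def simple_ter(hypothesis, reference):
    hyp, ref = hypothesis.split(), reference.split()
    dp = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, 1):
        prev, dp[0] = dp[0], i
        for j, r in enumerate(ref, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (h != r))
    return dp[-1] / len(ref)
```

For example, an empty hypothesis against the reference "MULHER REI PRAIA IR" scores 1.0 (every reference word must be inserted), whereas a perfect match scores 0.0.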
It is not possible to fully determine the best-performing model. However, our rule-based model and the fine-tuned M2M stand out. Within these systems, it is clear that performance depends on the type of sentences to be translated. Specifically, M2M achieved higher results on simpler sentences and in the dialogue domain, whereas RB excels in the social media and poetry domains. From this, we can infer that M2M was able to generalize better to unseen data with simpler sentence structures. On the other hand, the rules extracted from the corpus proved more effective for contemporary and poetic sentence styles, aligning with the informal and formal discourse present in the videos of the corpus.
Analyzing the translations from both the rule-based and the M2M models and comparing them with the reference translations, we perceived that the majority of the errors are due to:
• Misplaced words: the predicted translation includes the words present in the reference translation, but in the wrong order;
• Addition and removal of personal pronouns: some predicted translations wrongly add or remove personal pronouns;
• Proper nouns not correctly identified: as proper nouns are fingerspelled in LGP, it is crucial for the models to identify them, which does not always occur;
• Mistakes when dealing with the feminine gender: some feminine nouns have their own sign and some are translated into MULHER (WOMAN) + MASCULINE SIGN; some translations fail to identify the appropriate method to follow.
In Appendix A, Table 5 demonstrates instances where our rule-based system outperforms the fine-tuned M2M model, while Table 6 showcases the opposite. Additionally, Table 7 in Appendix A displays sentences where both models yield identical outputs.

Conclusion and Future Work
We contribute a fully automatic rule-based approach to translate EP into LGP, along with two neural models. We also contribute an automatically labeled dataset (37,500 EP/LGP pairs) with texts from 5 different domains and a gold collection of 200 sentences (40 from each domain). Our rule-based system and the neural M2M model share the top positions in all scenarios. In this way, we benchmark the task of EP-to-LGP translation.
For future work, different methods of evaluation should be used to measure fluency and adequacy.To better validate the results, the gold collection should be extended.
Ethics statement: In order to uphold the quality of our work for the end users, that is, deaf people who use LGP, the gold set was created by LGP experts, who are certified for this work. Furthermore, both the data and the code used in our research are accessible, in order to promote transparency and reproducibility.

Limitations
We identify the following limitations of this work:
• The gold set should be extended with more sentences for each domain;
• A more detailed error analysis should be conducted, analyzing the idiosyncrasies of each domain;
• We should test the understandability of the sequences of glosses, even if they are not in the correct order; that is, we should test which errors are critical and which are not.

Figure 1: Pipeline of our rule-based approach.

Table 6: Example sentences where the fine-tuned M2M model performs better than our rule-based model when compared with the reference translation.

Table 7: Example sentences where our rule-based model and the fine-tuned M2M model produce the same output, whether it is correct or not.