Blanca Calvo Figueras
Quality versus Quantity: Building Catalan-English MT Resources
Ona de Gibert Bonet | Ksenia Kharitonova | Blanca Calvo Figueras | Jordi Armengol-Estapé | Maite Melero
Proceedings of the 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages
In this work, we make the case of quality over quantity when training a MT system for a medium-to-low-resource language pair, namely Catalan-English. We compile our training corpus out of existing resources of varying quality and a new high-quality corpus. We also provide new evaluation translation datasets in three different domains. In the process of building Catalan-English parallel resources, we evaluate the impact of drastically filtering alignments in the resulting MT engines. Our results show that even when resources are limited, as in this case, it is worth filtering for quality. We further explore the cross-lingual transfer learning capabilities of the proposed model for parallel corpus filtering by applying it to other languages. All resources generated in this work are released under open license to encourage the development of language technology in Catalan.
A Semantics-Aware Approach to Automated Claim Verification
Blanca Calvo Figueras | Montse Oller | Rodrigo Agerri
Proceedings of the Fifth Fact Extraction and VERification Workshop (FEVER)
The influence of fake news in the perception of reality has become a mainstream topic in the last years due to the fast propagation of misleading information. In order to help in the fight against misinformation, automated solutions to fact-checking are being actively developed within the research community. In this context, the task of Automated Claim Verification is defined as assessing the truthfulness of a claim by finding evidence about its veracity. In this work we empirically demonstrate that enriching a BERT model with explicit semantic information such as Semantic Role Labelling helps to improve results in claim verification as proposed by the FEVER benchmark. Furthermore, we perform a number of explainability tests that suggest that the semantically-enriched model is better at handling complex cases, such as those including passive forms or multiple propositions.
- Ona de Gibert Bonet 1
- Ksenia Kharitonova 1
- Jordi Armengol-Estapé 1
- Maite Melero 1
- Montse Oller 1
- show all...