2022
pdf
bib
abs
Domain Adaptation in Neural Machine Translation using a Qualia-Enriched FrameNet
Alexandre Diniz da Costa
|
Mateus Coutinho Marim
|
Ely Matos
|
Tiago Timponi Torrent
Proceedings of the Thirteenth Language Resources and Evaluation Conference
In this paper we present Scylla, a methodology for domain adaptation of Neural Machine Translation (NMT) systems that make use of a multilingual FrameNet enriched with qualia relations as an external knowledge base. Domain adaptation techniques used in NMT usually require fine-tuning and in-domain training data, which may pose difficulties for those working with lesser-resourced languages and may also lead to performance decay of the NMT system for out-of-domain sentences. Scylla does not require fine-tuning of the NMT model, avoiding the risk of model over-fitting and consequent decrease in performance for out-of-domain translations. Two versions of Scylla are presented: one using the source sentence as input, and another one using the target sentence. We evaluate Scylla in comparison to a state-of-the-art commercial NMT system in an experiment in which 50 sentences from the Sports domain are translated from Brazilian Portuguese to English. The two versions of Scylla significantly outperform the baseline commercial system in HTER.
2020
pdf
bib
abs
Frame-Based Annotation of Multimodal Corpora: Tracking (A)Synchronies in Meaning Construction
Frederico Belcavello
|
Marcelo Viridiano
|
Alexandre Diniz da Costa
|
Ely Edison da Silva Matos
|
Tiago Timponi Torrent
Proceedings of the International FrameNet Workshop 2020: Towards a Global, Multilingual FrameNet
Multimodal aspects of human communication are key in several applications of Natural Language Processing, such as Machine Translation and Natural Language Generation. Despite recent advances in integrating multimodality into Computational Linguistics, the merge between NLP and Computer Vision techniques is still timid, especially when it comes to providing fine-grained accounts for meaning construction. This paper reports on research aiming to determine appropriate methodology and develop a computational tool to annotate multimodal corpora according to a principled structured semantic representation of events, relations and entities: FrameNet. Taking a Brazilian television travel show as corpus, a pilot study was conducted to annotate the frames that are evoked by the audio and the ones that are evoked by visual elements. We also implemented a Multimodal Annotation tool which allows annotators to choose frames and locate frame elements both in the text and in the images, while keeping track of the time span in which those elements are active in each modality. Results suggest that adding a multimodal domain to the linguistic layer of annotation and analysis contributes both to enrich the kind of information that can be tagged in a corpus, and to enhance FrameNet as a model of linguistic cognition.
2019
pdf
bib
abs
Designing a Frame-Semantic Machine Translation Evaluation Metric
Oliver Czulo
|
Tiago Timponi Torrent
|
Ely Edison da Silva Matos
|
Alexandre Diniz da Costa
|
Debanjana Kar
Proceedings of the Human-Informed Translation and Interpreting Technology Workshop (HiT-IT 2019)
We propose a metric for machine translation evaluation based on frame semantics which does not require the use of reference translations or human corrections, but is aimed at comparing original and translated output directly. The metrics is described on the basis of an existing manual frame-semantic annotation of a parallel corpus with an English original and a Brazilian Portuguese and a German translation. We discuss implications of our metrics design, including the potential of scaling it for multiple languages.