Derek Molloy
2024
CycleGN: A Cycle Consistent Approach for Neural Machine Translation
Sören Dreano | Derek Molloy | Noel Murphy
Proceedings of the Ninth Conference on Machine Translation
CycleGN is a fully self-supervised Neural Machine Translation framework that relies on the Transformer architecture and does not require parallel data. Its approach is similar to a Discriminator-less CycleGAN, hence the name, which signals that the approach is non-adversarial. It is specifically tailored for non-parallel text datasets. The foundational concept of our research posits that, in an ideal scenario, retro-translations of generated translations should revert to the original source sentences. Consequently, a pair of models can be trained using only a Cycle Consistency Loss (CCL), with one model translating in one direction and the second model in the opposite direction.

In the context of this research, two sub-categories of non-parallel datasets are introduced. A “permuted” dataset is defined as a parallel dataset in which the sentences of one language have been systematically rearranged. This results in a non-parallel corpus where each sentence is guaranteed to have a corresponding translation located at an unspecified index within the dataset. A “non-intersecting” dataset is a non-parallel dataset in which it is guaranteed that no sentence has an exact translation.

Masked Language Modeling (MLM) is a pre-training strategy implemented in BERT, in which a specified proportion of the input tokens are substituted with a unique mask token. Under this paradigm, the objective of the neural network is to accurately reconstruct the original sentence from this degraded input.

In inference mode, Transformers are able to generate sentences without labels. Thus, the first step is to generate pseudo-labels in inference mode, which are then used as labels during training. However, the models consistently converge towards a trivial solution in which the input, the generated pseudo-labels and the output are identical. This solution is optimal for the CCL function, which reaches a value of zero.
CycleGN demonstrates how MLM pre-training can be leveraged to move away from this trivial solution and perform actual text translation.

As a contribution to the WMT24 challenge, this study explores how effectively the CycleGN framework learns translation tasks across eleven language pairs under the permuted condition and four under the non-intersecting condition. Two additional language pairs from the previous WMT edition were also trained, and the evaluations demonstrate that CycleGN adapts robustly to learning translation tasks.
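The two non-parallel dataset conditions defined in the CycleGN abstract above can be illustrated with a minimal Python sketch, assuming a parallel corpus given as two aligned lists of sentences. The helper names `make_permuted` and `make_non_intersecting` are hypothetical. A seeded random shuffle is assumed as the "systematic rearrangement", and the non-intersecting construction (sources from one half, targets from the other, minus any known translation) is only one assumed way to obtain the guarantee, not necessarily how the datasets in the paper were built.

```python
import random
from typing import Dict, List, Set, Tuple


def make_permuted(src: List[str], tgt: List[str], seed: int = 0) -> Tuple[List[str], List[str]]:
    """Rearrange the target side of a parallel corpus (assumed: seeded shuffle).

    Every source sentence still has its translation somewhere in the
    target list, but at an unknown index.
    """
    if len(src) != len(tgt):
        raise ValueError("a parallel corpus needs two sides of equal length")
    shuffled = list(tgt)
    random.Random(seed).shuffle(shuffled)
    return list(src), shuffled


def make_non_intersecting(src: List[str], tgt: List[str], seed: int = 0) -> Tuple[List[str], List[str]]:
    """Assumed construction: keep sources from one half of the corpus and
    targets from the other half, minus any target that is a known
    translation of a kept source sentence."""
    if len(src) != len(tgt):
        raise ValueError("a parallel corpus needs two sides of equal length")
    pairs = list(zip(src, tgt))
    random.Random(seed).shuffle(pairs)
    half = len(pairs) // 2
    kept_src = [s for s, _ in pairs[:half]]

    # Every known translation of each source sentence across the full corpus
    # (the same sentence may occur more than once with a translation in either half).
    translations: Dict[str, Set[str]] = {}
    for s, t in pairs:
        translations.setdefault(s, set()).add(t)
    forbidden = set().union(*(translations[s] for s in kept_src))

    kept_tgt = [t for _, t in pairs[half:] if t not in forbidden]
    return kept_src, kept_tgt
```

Filtering out known translations, rather than only splitting the corpus, matters when the corpus contains duplicate sentences: a repeated pair could otherwise land in both halves and break the "no exact translation" guarantee.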
Exploration of the CycleGN Framework for Low-Resource Languages
Sören Dreano | Derek Molloy | Noel Murphy
Proceedings of the Ninth Conference on Machine Translation
CycleGN is a Neural Machine Translation framework relying on the Transformer architecture. The foundational concept of our research posits that, in an ideal scenario, retro-translations of generated translations should revert to the original source sentences. Consequently, a pair of models can be trained using only a Cycle Consistency Loss, with one model translating in one direction and the second model in the opposite direction.
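The round trip behind a Cycle Consistency Loss can be shown with a toy Python sketch. It assumes sentences are token lists and "models" are plain functions; the token-mismatch rate is a stand-in for the differentiable cross-entropy a real Transformer would be trained with, and `lexicon_model` is a hypothetical word-by-word translator used only for illustration. The sketch also shows why the identity mapping, the trivial solution described in the CycleGN abstract above, is so attractive: it reconstructs every source exactly, so its loss is zero, just like a correct translation pair.

```python
from typing import Callable, Dict, List

Sentence = List[str]
Model = Callable[[Sentence], Sentence]


def token_mismatch(reference: Sentence, output: Sentence) -> float:
    """Fraction of positions where two token lists disagree (toy loss)."""
    length = max(len(reference), len(output))
    if length == 0:
        return 0.0
    errors = sum(
        1
        for i in range(length)
        if i >= len(reference) or i >= len(output) or reference[i] != output[i]
    )
    return errors / length


def cycle_consistency_loss(batch: List[Sentence], forward: Model, backward: Model) -> float:
    """Translate each source, retro-translate it, and score the round trip."""
    total = 0.0
    for source in batch:
        pseudo_label = forward(source)           # generated without any reference
        reconstruction = backward(pseudo_label)  # should revert to the source
        total += token_mismatch(source, reconstruction)
    return total / len(batch)


def lexicon_model(lexicon: Dict[str, str]) -> Model:
    """Hypothetical word-by-word 'model'; unknown tokens are copied through."""
    return lambda sentence: [lexicon.get(tok, tok) for tok in sentence]


def identity(sentence: Sentence) -> Sentence:
    """The trivial solution: copy the input unchanged."""
    return list(sentence)


batch = [["the", "cat"], ["a", "dog"]]
en_fr = {"the": "le", "cat": "chat", "a": "un", "dog": "chien"}
fr_en = {v: k for k, v in en_fr.items()}

print(cycle_consistency_loss(batch, lexicon_model(en_fr), lexicon_model(fr_en)))  # 0.0
print(cycle_consistency_loss(batch, identity, identity))  # 0.0, yet nothing was translated
```

Because both pairs score zero, the loss alone cannot tell a real translator from a copier, which is the degenerate behaviour the CycleGN abstract reports.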
2023
Tokengram_F, a Fast and Accurate Token-based chrF++ Derivative
Sören Dreano | Derek Molloy | Noel Murphy
Proceedings of the Eighth Conference on Machine Translation
Tokengram_F is an F-score-based evaluation metric for Machine Translation that is heavily inspired by chrF++ and can act as a more accurate replacement. By replacing word n-grams with n-grams obtained from tokenization algorithms, tokengram_F better captures similarities between words.
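The core idea, chrF++ with its word n-grams swapped for tokenizer n-grams, can be approximated by a simplified Python sketch. It assumes the subword tokens are supplied by whatever tokenizer is in use (the splits in the example are hand-written and hypothetical), and it averages per-order F-beta scores (beta = 2, the chrF default) over character 1- to 6-grams and token 1- to 2-grams. The function name `tokengram_f_sketch` is hypothetical, and the exact orders, weighting and averaging of the published metric may differ.

```python
from collections import Counter
from typing import List, Sequence


def ngram_counts(items: Sequence[str], n: int) -> Counter:
    """Multiset of the n-grams (as tuples) found in a sequence."""
    return Counter(tuple(items[i:i + n]) for i in range(len(items) - n + 1))


def f_beta(hyp: Counter, ref: Counter, beta: float) -> float:
    """F-beta from clipped n-gram matches; beta > 1 weights recall higher."""
    matches = sum((hyp & ref).values())
    if matches == 0:
        return 0.0
    precision = matches / sum(hyp.values())
    recall = matches / sum(ref.values())
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def tokengram_f_sketch(hyp_text: str, ref_text: str,
                       hyp_tokens: List[str], ref_tokens: List[str],
                       char_order: int = 6, token_order: int = 2,
                       beta: float = 2.0) -> float:
    """Average per-order F-beta over character n-grams and token n-grams (0-100)."""
    scores = []
    # Character n-grams with whitespace removed, as chrF does by default.
    hyp_chars = list(hyp_text.replace(" ", ""))
    ref_chars = list(ref_text.replace(" ", ""))
    for n in range(1, char_order + 1):
        h, r = ngram_counts(hyp_chars, n), ngram_counts(ref_chars, n)
        if h and r:  # skip orders longer than either sentence
            scores.append(f_beta(h, r, beta))
    # Token n-grams take the place of chrF++'s word n-grams.
    for n in range(1, token_order + 1):
        h, r = ngram_counts(hyp_tokens, n), ngram_counts(ref_tokens, n)
        if h and r:
            scores.append(f_beta(h, r, beta))
    return 100.0 * sum(scores) / len(scores) if scores else 0.0


# Hypothetical subword splits; a real tokenizer would produce its own.
hyp = "the cat is playing"
ref = "the cat was played"
hyp_tok = ["▁the", "▁cat", "▁is", "▁play", "ing"]
ref_tok = ["▁the", "▁cat", "▁was", "▁play", "ed"]
print(round(tokengram_f_sketch(hyp, ref, hyp_tok, ref_tok), 2))
```

Here "playing" and "played" share the token "▁play", so they contribute a token-unigram match that whole-word unigrams would miss, which is the kind of word similarity the abstract refers to.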
Embed_Llama: Using LLM Embeddings for the Metrics Shared Task
Sören Dreano | Derek Molloy | Noel Murphy
Proceedings of the Eighth Conference on Machine Translation
Embed_llama is an assessment metric for language translation that relies on the recently introduced Llama 2 Large Language Model (LLM), specifically its embedding layer, to transform sentences into a vector space in which geometric proximity reflects semantic proximity.
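Comparing a hypothesis and a reference in an embedding space can be sketched in a few lines of Python. The sketch assumes that token embeddings are mean-pooled into one sentence vector and that the two vectors are compared by cosine similarity. The tiny three-dimensional `toy_table` stands in for the Llama 2 embedding layer, and both the pooling choice and the table values are illustrative assumptions, not the metric's documented procedure.

```python
import math
from typing import Dict, List

Vector = List[float]


def mean_pool(tokens: List[str], table: Dict[str, Vector]) -> Vector:
    """Average the embedding vectors of a non-empty token sequence (assumed pooling)."""
    dim = len(table[tokens[0]])
    pooled = [0.0] * dim
    for tok in tokens:
        for i, value in enumerate(table[tok]):
            pooled[i] += value
    return [value / len(tokens) for value in pooled]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def embedding_similarity(hyp: List[str], ref: List[str], table: Dict[str, Vector]) -> float:
    """Geometric closeness of two sentences in the (toy) embedding space."""
    return cosine(mean_pool(hyp, table), mean_pool(ref, table))


# Toy 3-dimensional table standing in for an LLM embedding layer.
toy_table = {
    "the": [0.1, 0.1, 0.1],
    "cat": [1.0, 0.0, 0.0],
    "kitten": [0.9, 0.1, 0.0],
    "car": [0.0, 0.0, 1.0],
}
print(embedding_similarity(["the", "cat"], ["the", "kitten"], toy_table))  # ≈ 0.99
print(embedding_similarity(["the", "cat"], ["the", "car"], toy_table))     # ≈ 0.19
```

With real LLM embeddings the vectors have thousands of dimensions, but the comparison step stays the same.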