Stig-Arne Grönroos


2022

Latest Development in the FoTran Project – Scaling Up Language Coverage in Neural Machine Translation Using Distributed Training with Language-Specific Components
Raúl Vázquez | Michele Boggia | Alessandro Raganato | Niki A. Loppi | Stig-Arne Grönroos | Jörg Tiedemann
Proceedings of the 23rd Annual Conference of the European Association for Machine Translation

We describe the enhancement of a multilingual NMT toolkit developed as part of the FoTran project. We devise our modular attention-bridge model, which connects language-specific components through a shared network layer. The system now supports distributed training over many nodes and GPUs in order to substantially scale up the number of languages that can be included in a modern neural translation architecture. The model enables the study of emerging language-agnostic representations and also provides a modular toolkit for efficient machine translation.

2020

The University of Helsinki and Aalto University submissions to the WMT 2020 news and low-resource translation tasks
Yves Scherrer | Stig-Arne Grönroos | Sami Virpioja
Proceedings of the Fifth Conference on Machine Translation

This paper describes the joint participation of the University of Helsinki and Aalto University in two shared tasks of WMT 2020: news translation between Inuktitut and English, and low-resource translation between German and Upper Sorbian. For both tasks, our efforts concentrate on efficient use of monolingual and related bilingual corpora with scheduled multi-task learning, as well as an optimized subword segmentation with sampling. Our submission obtained the highest score for Upper Sorbian -> German and was ranked second for German -> Upper Sorbian according to BLEU scores. For English–Inuktitut, we reached ranks 8 and 10 out of 11 according to BLEU scores.

Morfessor EM+Prune: Improved Subword Segmentation with Expectation Maximization and Pruning
Stig-Arne Grönroos | Sami Virpioja | Mikko Kurimo
Proceedings of the 12th Language Resources and Evaluation Conference

Data-driven segmentation of words into subword units has been used in various natural language processing applications, such as automatic speech recognition and statistical machine translation, for almost 20 years. Recently, it has become more widely adopted, as models based on deep neural networks often benefit from subword units even for morphologically simpler languages. In this paper, we discuss and compare training algorithms for a unigram subword model, based on the Expectation Maximization algorithm and lexicon pruning. Using English, Finnish, North Sami, and Turkish data sets, we show that this approach is able to find better solutions to the optimization problem defined by the Morfessor Baseline model than its original recursive training algorithm. The improved optimization also leads to higher morphological segmentation accuracy when compared to a linguistic gold standard. We publish implementations of the new algorithms in the widely used Morfessor software package.

2019

North Sámi morphological segmentation with low-resource semi-supervised sequence labeling
Stig-Arne Grönroos | Sami Virpioja | Mikko Kurimo
Proceedings of the Fifth International Workshop on Computational Linguistics for Uralic Languages

2018

Cognate-aware morphological segmentation for multilingual neural translation
Stig-Arne Grönroos | Sami Virpioja | Mikko Kurimo
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

This article describes the Aalto University entry to the WMT18 News Translation Shared Task. We participate in the multilingual subtrack with a system trained under the constrained condition to translate from English to both Finnish and Estonian. The system is based on the Transformer model. We focus on improving the consistency of morphological segmentation for words that are similar orthographically, semantically, and distributionally; such words include etymological cognates, loan words, and proper names. For this, we introduce Cognate Morfessor, a multilingual variant of the Morfessor method. We show that our approach improves the translation quality, particularly for Estonian, which has fewer resources for training the translation model.

The WMT’18 Morpheval test suites for English-Czech, English-German, English-Finnish and Turkish-English
Franck Burlot | Yves Scherrer | Vinit Ravishankar | Ondřej Bojar | Stig-Arne Grönroos | Maarit Koponen | Tommi Nieminen | François Yvon
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

Progress in the quality of machine translation output calls for new automatic evaluation procedures and metrics. In this paper, we extend the Morpheval protocol introduced by Burlot and Yvon (2017) for the English-to-Czech and English-to-Latvian translation directions to three additional language pairs, and report its use to analyze the results of WMT 2018’s participants for these language pairs. Considering additional, typologically varied source and target languages also enables us to draw some generalizations regarding this morphology-oriented evaluation procedure.

The MeMAD Submission to the WMT18 Multimodal Translation Task
Stig-Arne Grönroos | Benoit Huet | Mikko Kurimo | Jorma Laaksonen | Bernard Merialdo | Phu Pham | Mats Sjöberg | Umut Sulubacak | Jörg Tiedemann | Raphael Troncy | Raúl Vázquez
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

This paper describes the MeMAD project entry to the WMT Multimodal Machine Translation Shared Task. We propose adapting the Transformer neural machine translation (NMT) architecture to a multi-modal setting. In this paper, we also describe the preliminary experiments with text-only translation systems that led us to this choice. We have the top scoring system for both English-to-German and English-to-French, according to the automatic metrics for flickr18. Our experiments show that the effect of the visual features in our system is small. Our largest gains come from the quality of the underlying text-only NMT system. We find that appropriate use of additional data is effective.

2017

Extending hybrid word-character neural machine translation with multi-task learning of morphological analysis
Stig-Arne Grönroos | Sami Virpioja | Mikko Kurimo
Proceedings of the Second Conference on Machine Translation

2016

A Comparative Study of Minimally Supervised Morphological Segmentation
Teemu Ruokolainen | Oskar Kohonen | Kairit Sirts | Stig-Arne Grönroos | Mikko Kurimo | Sami Virpioja
Computational Linguistics, Volume 42, Issue 1 - March 2016

Hybrid Morphological Segmentation for Phrase-Based Machine Translation
Stig-Arne Grönroos | Sami Virpioja | Mikko Kurimo
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

2015

Tuning Phrase-Based Segmented Translation for a Morphologically Complex Target Language
Stig-Arne Grönroos | Sami Virpioja | Mikko Kurimo
Proceedings of the Tenth Workshop on Statistical Machine Translation

LeBLEU: N-gram-based Translation Evaluation Score for Morphologically Complex Languages
Sami Virpioja | Stig-Arne Grönroos
Proceedings of the Tenth Workshop on Statistical Machine Translation

2014

Morfessor 2.0: Toolkit for statistical morphological segmentation
Peter Smit | Sami Virpioja | Stig-Arne Grönroos | Mikko Kurimo
Proceedings of the Demonstrations at the 14th Conference of the European Chapter of the Association for Computational Linguistics

Morfessor FlatCat: An HMM-Based Method for Unsupervised and Semi-Supervised Learning of Morphology
Stig-Arne Grönroos | Sami Virpioja | Peter Smit | Mikko Kurimo
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers