Iker Manterola


2024

MULTILINGTOOL, Development of an Automatic Multilingual Subtitling and Dubbing System
Xabier Saralegi | Ander Corral | Igor Leturia | Xabier Sarasola | Josu Murua | Iker Manterola | Itziar Cortes
Proceedings of the 25th Annual Conference of the European Association for Machine Translation (Volume 2)

In this paper, we present the MULTILINGTOOL project, led by the Elhuyar Foundation and funded by the European Commission under the CREA-MEDIA2022-INNOVBUSMOD call. The aim of the project is to develop an advanced platform for automatic multilingual subtitling and dubbing. It will provide support for Spanish, English, and French, as well as the co-official languages of Spain, namely Basque, Catalan, and Galician.

2012

Building a Basque-Chinese Dictionary by Using English as Pivot
Xabier Saralegi | Iker Manterola | Iñaki San Vicente
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Bilingual dictionaries are key resources in several fields such as translation, language learning and various NLP tasks. However, only major languages have such resources. Dictionaries built automatically by using pivot languages could be a useful resource in these circumstances. Pivot-based bilingual dictionary building is based on merging two bilingual dictionaries which share a common language (e.g. LA-LB, LB-LC) in order to create a dictionary for a new language pair (e.g. LA-LC). This process may introduce wrong translations due to the polysemy of words. We built Basque-Chinese (Mandarin) dictionaries automatically from Basque-English and Chinese-English dictionaries. In order to prune wrong translations, we used different methods suited to less-resourced languages. Inverse Consultation and Distributional Similarity methods are used because they depend only on easily available resources. Finally, we manually evaluated the quality of the built dictionaries and the adequacy of the methods. Both Inverse Consultation and Distributional Similarity provide good precision of translations, but recall is seriously damaged. Distributional Similarity prunes rare translations more accurately than the other methods.
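The merging step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy entries, and the `min_shared` threshold are all hypothetical, and the pruning shown is a simplified overlap count in the spirit of Inverse Consultation (keep a candidate only if the two words share enough pivot translations). Distributional Similarity is not covered here.

```python
from collections import defaultdict


def invert(dictionary):
    """Turn a source -> {targets} dictionary into target -> {sources}."""
    inverted = defaultdict(set)
    for source, targets in dictionary.items():
        for target in targets:
            inverted[target].add(source)
    return dict(inverted)


def pivot_merge(a_to_b, b_to_c):
    """Compose LA-LB and LB-LC into a noisy LA-LC candidate dictionary."""
    merged = {}
    for a_word, b_words in a_to_b.items():
        candidates = set()
        for b_word in b_words:
            candidates |= b_to_c.get(b_word, set())
        if candidates:
            merged[a_word] = candidates
    return merged


def prune_by_shared_pivots(a_to_b, c_to_b, candidates, min_shared=2):
    """Keep an (a, c) pair only if a and c share at least min_shared pivot words.

    Simplified stand-in for Inverse Consultation; the threshold is an assumption.
    """
    pruned = {}
    for a_word, c_words in candidates.items():
        kept = {
            c_word
            for c_word in c_words
            if len(a_to_b[a_word] & c_to_b.get(c_word, set())) >= min_shared
        }
        if kept:
            pruned[a_word] = kept
    return pruned


# Toy data with placeholder LA and LC words; "bank" is the polysemous pivot.
a_to_b = {"a_riverside": {"bank", "shore"}}
c_to_b = {
    "c_riverside": {"bank", "shore"},
    "c_money_house": {"bank", "financial institution"},
}

candidates = pivot_merge(a_to_b, invert(c_to_b))
pruned = prune_by_shared_pivots(a_to_b, c_to_b, candidates)

print(sorted(candidates["a_riverside"]))
print(sorted(pruned["a_riverside"]))
```

In this toy case the polysemous pivot word produces one wrong candidate, which the overlap filter removes. A correct translation that shares only one pivot word would be removed as well, which matches the recall loss the abstract reports.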

PaCo2: A Fully Automated tool for gathering Parallel Corpora from the Web
Iñaki San Vicente | Iker Manterola
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The importance of parallel corpora in the NLP field is fully acknowledged. This paper presents a tool that can build parallel corpora given just a seed word list and a pair of languages. Our approach is similar to others proposed in the literature, but introduces a new phase to the process. While most systems leave the task of finding websites containing parallel content up to the user, PaCo2 (Parallel Corpora Collector) takes care of that as well. The tool is language independent as far as possible, and adapting the system to work with new languages is fairly straightforward. The different modules have been evaluated for the Basque-Spanish, Spanish-English and Portuguese-English language pairs. Even though there is still room for improvement, results are positive. They show that the corpora created contain translation units of very good quality, and that this quality is maintained across the language pairs. Details of the corpora created so far are also provided.

2011

Analyzing Methods for Improving Precision of Pivot Based Bilingual Dictionaries
Xabier Saralegi | Iker Manterola | Iñaki San Vicente
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing