Proceedings of the Second International Workshop on Free/Open-Source Rule-Based Machine Translation
Extensible Dependency Grammar (XDG; Debusmann, 2007) is a flexible, modular dependency grammar framework in which sentence analyses consist of multigraphs and processing takes the form of constraint satisfaction. This paper shows how XDG lends itself to grammar-driven machine translation and introduces the machinery necessary for synchronous XDG. Since the approach relies on a shared semantics, it resembles interlingua MT. It differs in that there are no separate analysis and generation phases. Rather, translation consists of the simultaneous analysis and generation of a single source-target “sentence”.
This paper discusses the qualitative comparative evaluation performed on the results of two machine translation systems with different approaches to the processing of multi-word units. It proposes a solution for overcoming the difficulties that multi-word units present to machine translation by adopting a methodology that combines the lexicon grammar approach with the OpenLogos ontology and semantico-syntactic rules. The paper also discusses the importance of qualitative evaluation metrics for correctly evaluating the performance of machine translation engines with regard to multi-word units.
There are a number of morphological analysers for Polish. Most of them, however, are non-free resources. Moreover, different analysers employ different tagsets and tokenisation strategies. This situation calls for a simple and universal framework that joins different sources of morphological information, including existing resources as well as user-provided dictionaries. We present such a configurable framework, which allows users to write simple configuration files that define tokenisation strategies and the behaviour of morphological analysers, including simple tagset conversion.
This paper proposes to enrich RBMT dictionaries with Named Entities (NEs) automatically acquired from Wikipedia. The method is applied to the Apertium English–Spanish system, and its performance is compared to that of Apertium with and without hand-tagged NEs. The system with automatically acquired NEs outperforms the one without NEs, while results vary when compared to the system with hand-tagged NEs (comparable for Spanish→English but slightly worse for English→Spanish). In addition, adding the automatically acquired NEs reduces the number of unknown terms by more than 10%.
This document describes a project aimed at building a new web interface to the Apertium machine translation platform, including pre-editing and post-editing environments. It reports on the work accomplished in this project and gives an overview of possible future work.
This paper describes the development of a two-way shallow-transfer rule-based machine translation system between Bulgarian and Macedonian. It gives an account of the resources and methods used to construct the system, including the development of monolingual and bilingual dictionaries, syntactic transfer rules and constraint grammars. The system's performance was evaluated and compared with that of a commercially available MT system for the same two languages. Directions for future work are also suggested.
Softcatalà is a non-profit association created more than 10 years ago to fight the marginalisation of the Catalan language in information and communication technologies. It has led the localisation of many applications and the creation of a website that allows its users to translate texts between Spanish and Catalan using an external closed-source translation engine. Recently, the closed-source translation back-end has been replaced by a free/open-source solution completely managed by Softcatalà: the Apertium machine translation platform and the ScaleMT web service framework. Thanks to the openness of the new solution, it is possible to take advantage of the large number of users of the Softcatalà translation service to improve it, using a series of methods presented in this paper. In addition, a study of the translations requested by users has been carried out, showing that the change of translation back-end has not affected usage patterns.
This article describes the development of an open-source shallow-transfer machine translation system from Czech to Polish on the Apertium platform. It details the methods and resources used to construct the system. Although the resulting system has quite a high error rate, it is still competitive with other systems.
This paper presents an Italian→Catalan RBMT system built automatically by combining the linguistic data of the existing Spanish–Catalan and Spanish–Italian pairs. Light manual post-processing is carried out to fix inconsistencies in the automatically derived dictionaries and to add very frequent words that, according to a corpus analysis, are missing. The system is evaluated on the KDE4 corpus and outperforms Google Translate by approximately ten absolute points in terms of both TER and GTM.
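The abstract does not spell out how the two pairs are combined. A minimal sketch, assuming the simplest approach of composing the Italian–Spanish and Spanish–Catalan bilingual dictionaries through Spanish as a pivot, could look like the following. The dictionary representation, the function name `compose`, and the toy entries are hypothetical illustrations. They are not Apertium's `.dix` format or the authors' actual procedure, and a real system would also have to handle ambiguity and full morphological tags.

```python
# Hypothetical toy dictionaries: each maps (lemma, tag) to a set of
# (lemma, tag) translations. Tags are illustrative, Apertium-style.
IT_ES = {
    ("casa", "n"): {("casa", "n")},
    ("cane", "n"): {("perro", "n")},
    ("mangiare", "vblex"): {("comer", "vblex")},
    ("finestra", "n"): {("ventana", "n")},
}
ES_CA = {
    ("casa", "n"): {("casa", "n")},
    ("perro", "n"): {("gos", "n")},
    ("comer", "vblex"): {("menjar", "vblex")},
    # "ventana" is deliberately missing to show an unmatched entry.
}


def compose(src_pivot, pivot_tgt):
    """Compose a source->pivot and a pivot->target dictionary.

    Returns the composed source->target dictionary and the list of
    source entries for which no target could be reached via the pivot.
    """
    composed = {}
    unmatched = []
    for src, pivots in src_pivot.items():
        targets = set()
        for piv in pivots:
            # Collect every target reachable through this pivot entry.
            targets |= pivot_tgt.get(piv, set())
        if targets:
            composed[src] = targets
        else:
            unmatched.append(src)
    return composed, unmatched


IT_CA, UNMATCHED = compose(IT_ES, ES_CA)
for (lemma, tag), targets in sorted(IT_CA.items()):
    print(lemma, tag, "->", sorted(t[0] for t in targets))
print("unmatched:", UNMATCHED)
```

Entries that fail to compose, such as `finestra` in this toy example, are one kind of gap that a manual post-processing step could then fill.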