Proceedings of the First International Workshop on Free/Open-Source Rule-Based Machine Translation
This paper describes Apertium: a free/open-source machine translation platform (engine, toolbox and data), its history, its philosophy of design, its technology, the community of developers, the research and business based on it, and its prospects and challenges, now that it is five years old.
This paper describes some of the issues found when adapting and extending the Matxin free-software machine translation system to other language pairs. It sketches out some of the characteristics of Matxin and offers some possible solutions to these issues.
This paper describes OpenLogos, a rule-driven machine translation system, and the syntactic-semantic taxonomy SAL that underlies this system. We illustrate how SAL addresses typical problems relating to source language analysis and target language synthesis. The adaptation of OpenLogos resources to a specific application concerning paraphrasing in Portuguese is also described here. References are provided for access to OpenLogos and to SAL.
This article describes the development of a shallow-transfer machine translation system from Swedish to Danish in the Apertium platform. It gives details of the resources used, the methods for constructing the system and an evaluation of the translation quality. The quality is found to be comparable with that of current commercial systems, despite the particularly low coverage of the lexicons.
We describe the development of a two-way shallow-transfer machine translation system between Norwegian Nynorsk and Norwegian Bokma ̊l built on the Apertium platform, using the Free and Open Source resources Norsk Ordbank and the Oslo–Bergen Constraint Grammar tagger. We detail the integration of these and other resources in the system along with the construction of the lexical and structural transfer, and evaluate the translation quality in comparison with another system. Finally, some future work is suggested.
This article describes the development of an open-source morphological analyser for Bengali Language using nitestate technology. First we discuss the challenges of creating a morphological analyser for a highly inectional language like Bengali and then propose a solution to that using lttoolbox, an open-source nite-state toolkit. We then evaluate the performance of our developed system and propose ways of improving it further.
Some machine translation services like Google Ajax Language API have become very popular as they make the collaboratively created contents of the web 2.0 available to speakers of many languages. One of the keys of its success is its clear and easy-to-use application programming interface (API) and a scalable and reliable service. This paper describes a highly scalable implementation of an Apertium-based translation web service, that aims to make contents available to speakers of lesser resourced languages. The API of this service is compatible with Google’s one, and the scalability of the system is achieved by a new architecture that allows adding or removing new servers at any time; for that, an application placement algorithm which decides which language pairs should be translated on which servers is designed. Our experiments show how the resulting architecture improves the translation rate in comparison to existing Apertium-based servers.
Service Oriented Architecture (SOA) is a paradigm for organising and using distributed services that may be under the control of different ownership domains and implemented using various technology stacks. In some contexts, an organisation using an IT infrastructure implementing the SOA paradigm can take a great benefit from the integration, in its business processes, of efficient machine translation (MT) services to overcome language barriers. This paper describes the architecture and the design patterns used to develop an MT service that is efficient, scalable and easy to integrate in new and existing business processes. The service is based on Apertium, a free/opensource rule-based machine translation platform.
This paper describes the implementation of a second-order hidden Markov model (HMM) based part-of-speech tagger for the Apertium free/opensource rule-based machine translation platform. We describe the part-ofspeech (PoS) tagging approach in Apertium and how it is parametrised through a tagger definition file that defines: (1) the set of tags to be used and (2) constrain rules that can be used to forbid certain PoS tag sequences, thus refining the HMM parameters and increasing its tagging accuracy. The paper also reviews the Baum-Welch algorithm used to estimate the HMM parameters and compares the tagging accuracy achieved with that achieved by the original, first-order HMM-based PoS tagger in Apertium.
This article describes the needs of UOC regarding translation and how these needs are satisfied by Prompsit further developing a free rule-based machine translation system: Apertium. We initially describe the general framework regarding linguistic needs inside UOC. Then, section 2 introduces Apertium and outlines the development scenario that Prompsit executed. After that, section 3 outlines the specific needs of UOC and why Apertium was chosen as the machine translation engine. Then, section 4 describes some of the features specially developed in this project. Section 5 explains how the linguistic data was improved to increase the quality of the output in Catalan and Spanish. And, finally, we draw conclusions and outline further work originating from the project.