Workshop on open-source machine translation

September 13-15
Phuket, Thailand

The Open A.I. Kit: General Machine Learning Modules from Statistical Machine Translation
Daniel J. Walker

The Open A.I. Kit implements the major components of Statistical Machine Translation as an accessible, extendable Software Development Kit with broad applicability beyond the field of Machine Translation. The high-level system design policies of the kit embrace the Open Source development model to provide a modular architecture and interface, which may serve as a basis for collaborative research and development for endeavors in Artificial Intelligence.

An Open Architecture for Transfer-based Machine Translation between Spanish and Basque
Iñaki Alegria | Arantza Diaz de Ilarraza | Gorka Labaka | Mikel Lersundi | Aingeru Mayor | Kepa Sarasola | Mikel L. Forcada | Sergio Ortiz-Rojas | Lluís Padró

We present the current status of development of an open architecture for translation from Spanish into Basque. The machine translation architecture uses an open-source analyser for Spanish and new modules based mainly on finite-state transducers. The project is part of the OpenTrad initiative, a larger government-funded project shared among several universities and small companies, which will also include MT engines for translation among the main languages of Spain. The main objective is the construction of an open, reusable and interoperable framework. This paper describes the design of the engine, the formats it uses for communication among the modules, the modules reused from another project named Matxin, and the new modules we are building.

Open Source Machine Translation with DELPH-IN
Francis Bond | Stephan Oepen | Melanie Siegel | Ann Copestake | Dan Flickinger

An Open-Source Shallow-Transfer Machine Translation Toolbox: Consequences of Its Release and Availability
Carme Armentano-Oller | Antonio M. Corbí-Bellot | Mikel L. Forcada | Mireia Ginestí-Rosell | Boyan Bonev | Sergio Ortiz-Rojas | Juan Antonio Pérez-Ortiz | Gema Ramírez-Sánchez | Felipe Sánchez-Martínez

By the time Machine Translation Summit X is held in September 2005, our group will have released an open-source machine translation toolbox as part of a large government-funded project involving four universities and three linguistic technology companies from Spain. The machine translation toolbox, which will most likely be released under a GPL-like license, includes (a) the open-source engine itself, a modular shallow-transfer machine translation engine suitable for related languages and largely based upon that of systems we have already developed, such as interNOSTRUM for Spanish–Catalan and Traductor Universia for Spanish–Portuguese; (b) extensive documentation (including document type declarations) specifying the XML format of all linguistic (dictionaries, rules) and document format management files; (c) compilers converting these data into the high-speed (tens of thousands of words a second) format used by the engine; and (d) pilot linguistic data for Spanish–Catalan and Spanish–Galician and format management specifications for the HTML, RTF and plain text formats. After briefly describing this toolbox, this paper explores possible consequences of the availability of this architecture, including the community-driven development of machine translation systems for languages lacking this kind of linguistic technology.