ISMTCL
International Symposium on Data and Sense Mining, Machine Translation
and Controlled Languages, and their application to emergencies and
safety critical domains
July 1-3, 2009
Centre Tesnière
University of
Franche-Comté
Besançon, France
Presses universitaires
de Franche-Comté, 2009
[ISBN 978-2-84867-261-8]
Selected abstracts
pp.13-20:
Multiple Uses of Machine
Translation and Computerised Translation Tools
John Hutchins
web:
http://www.hutchinsweb.me.uk
Abstract
For many years MT systems
and tools were used principally for the production of good-quality translations:
either MT in combination with controlled (restricted) input and/or with human
post-editing; or computer-based translation tools by translators. Since 1990
the situation has changed. Corporate use of MT with human assistance has
continued to expand (particularly in the area of localisation) and the use of translation
aids has increased (particularly with the coming of translation memories). But
the main change has been the ever expanding use of unrevised MT output, such as
online translation services (Babel Fish, Google, etc.), applications in
information extraction, document retrieval, intelligence analysis, electronic mail,
and much more.
---
pp.38-42:
French to Arabic Machine
Translation
Isomorphic Syntax, Use
of Terminal Sequences
Mohand Beddar
Centre Lucien
Tesnière, Université de Franche-Comté, France
mohand.beddar@edu.univ-fcomte.fr
Abstract
Languages are different
and each of them requires special processing. A machine translation system should
take into account syntactic and semantic particularities of each language. In
fact, syntactic and semantic models should be defined in both the source and
target languages and a link should be established between both the syntactic
and semantic models. In security protocols, French and Arabic syntax are quite similar.
Their structures can be formalized as isomorphic structures, and terminal sequences
link these structures to the semantics of the language, thereby limiting semantic
ambiguities.
---
pp.43-48:
Remarks about Linguistic
Analysis, Normalization and Translation of
Spanish "What to Do in Case of Fire" Texts
Xavier Blanco
Universitat Autònoma de Barcelona
Xavier.Blanco@uab.cat
Abstract
We intend to discuss, in a way understandable to those not trained as linguists, the key
points that allow us to see the main invariant meaning of a cluster of
texts through the enormous multiplicity of possible paraphrases. By way of
example, we show how to formalize the core messages of a collection of Spanish
“What to Do in Case of Fire” texts. We keep our analysis to the simple phrase
level (i.e. no textual constraints such as cohesion or thematic progression
will be discussed). At this level, we discuss mainly the following topics:
semantic labelling of facts; semantic labelling of entities; paradigmatic lexical
functions; syntagmatic lexical functions; grammatical
meanings. We give examples in different European languages concerning the
translation of our “What to Do in Case of Fire” texts. We show that linguistic
formalization can be regarded as a sort of interlingua
particularly well suited for translation of alert texts.
---
pp.49-55:
Controlled Languages and
Machine Translation
Krzysztof Bogacki
kbogacki@gmail.com
Abstract
We will examine
controlled languages in the context of machine translation. First we will
present principles governing the conversion of standard natural language texts
into controlled Polish. We will compare 6 converted texts with their standard
sources and comment on them. Then we will present the results of an experiment
which has turned out to be disappointing in this respect: the number of mistranslations
and various sorts of mistakes was large enough to make us question the reasons
for such a result.
---
pp.69-73:
Achieving a Better
Machine Translation from French to English
via a Controlled Language
Tessa Cornally
Centre Tesnière, Université de Franche-Comté,
France
tessa.carnally@gmail.com
Abstract
This article discusses
how the use of a controlled language can improve Machine Translation results. More
specifically it is concerned with the sublanguage of oenology and examines the
structures specific to this domain.
---
pp.82-89:
English/Veneto Resource
Poor Machine Translation with STILVEN
Rodolfo Delmonte, Antonella Bristot, Sara Tonelli, Emanuele Pianta
Università Ca' Foscari -
Department of Language Sciences
E-mail: delmont@unive.it
Abstract
The paper reports ongoing
work on the implementation of a system for automatic translation from
English to Veneto and vice versa. The system does not
have parallel texts to work on because such manual translations are
almost nonexistent. The project is called STILVEN and is financed by the Regional
Authorities of Veneto Region in
---
pp.100-103:
Syntactic Problems in
French-Russian Machine Translation of Periodicals
Ekaterina Ershova
Centre Tesnière,
University of Franche-Comté,
ekaterinarshv@gmail.com
Abstract
French-Russian machine
translation is not a new-found field of study. Nevertheless this is a domain
that is very rich and fruitful in problems which still need to be resolved,
particularly syntactic problems. Moreover, limited to a certain subject field,
i.e. periodicals, MT “of high quality” is certain to be possible.
---
pp.104-113:
Translating Composite
Sentences in Azerbaijani-English MT System
Rauf Fatullayev, Sevinc Mammadova, Abulfat Fatullayev
National E-Governance Network Initiative Project (R. Fatullayev, S. Mammadova)
Institute of Cybernetics (A. Fatullayev)
Baku, Azerbaijan
fatullavev@gmail.com, sevinc@dilmanc.az
Abstract
This article is dedicated
to automating the translation of composite sentences in the Azerbaijani
language in an Azerbaijani-English MT system. First the Azerbaijani composite
sentence is divided into simple sentences; then the English translation of the
sentence is synthesized from the translations of the simple sentences.
---
pp.138-142:
Some Problems in a
French-Chinese Machine Translation System
Gan Jin
Centre Tesnière, Faculté des Lettres
Université de Franche-Comté
Besançon,
France
jingan1982@hotmail.com
Abstract
The Chinese language is
very different from European languages with respect to morphology, lexicon,
syntax and semantics. These differences cause many problems in machine
translation systems. ‘Rang’ is a morphologically simple Chinese character, but
its usage is very complicated. For a long time, linguists have been interested
in its complexity. Despite all these efforts, linguists have to this day not agreed
on its correct grammatical category. Some linguists consider it
a preposition, others a verb. In this article, we try to explain the actual
usage of ‘Rang’ sentences in a French-Chinese machine translation
system for a specific domain where safety is extremely important, and we show
our methods of disambiguating verbs with respect to not only Chinese grammar
but also French grammar, because the latter is our source language in the translation
system. Our objective is to create a reliable machine translation system.
---
pp.158-163:
French-Vietnamese Noun
phrase Translation
Le Thi Sinh
Centre Tesnière, Université de Franche-Comté, France
Stephanie_cuhb@yahoo.com
Abstract
Vietnamese is a noun
classifier language, while this is not the case for French. This is why we
encounter some problems when translating French noun phrases (NPs) into
Vietnamese. This article suggests a simple algorithm for French-Vietnamese
NP translation, preceded by the construction of a Vietnamese classifier-noun combination
system, which is a considerable difficulty to be resolved in the algorithm. All of this
is based on the results of condensed comparative analyses of NPs
in the two languages under consideration.
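
The idea of pairing each noun with its classifier before generating the Vietnamese NP can be illustrated with a minimal sketch. It is not the author's algorithm: the lexicon, the numeral table, the naive plural stripping and the function name `translate_np` are all assumptions made for illustration, and only two-word French NPs of the form "numeral noun" are handled.

```python
# Hypothetical toy lexicon: French noun -> (Vietnamese noun, classifier).
# A real classifier-noun combination system would be far larger.
LEXICON = {
    "chien": ("chó", "con"),     # animals commonly take "con"
    "livre": ("sách", "quyển"),  # books commonly take "quyển"
    "table": ("bàn", "cái"),     # many inanimate objects take "cái"
}

# Hypothetical numeral table; "un"/"une" are treated as the numeral "one".
NUMERALS = {"un": "một", "une": "một", "deux": "hai", "trois": "ba"}


def translate_np(french_np):
    """Translate a two-word French NP into numeral + classifier + noun."""
    numeral, noun = french_np.lower().split()
    # Naive plural handling (assumption): drop a single final "s".
    if noun not in LEXICON and noun.endswith("s"):
        noun = noun[:-1]
    vietnamese_noun, classifier = LEXICON[noun]
    return f"{NUMERALS[numeral]} {classifier} {vietnamese_noun}"


print(translate_np("deux chiens"))
```

The sketch inserts the classifier between the numeral and the noun, which French has no counterpart for; the choice of classifier is looked up per noun rather than computed.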
---
pp.172-178:
Cognitive Models of
Yesterday and Today in Machine Translation and their
Implication for Controlled Languages
Henri Madec
Tesnière research Center
in linguistics
University of
henri.madec@univ-fcomte.fr
Abstract
In this article we
present a contradiction between the epistemology that has underpinned NLP for 50 years,
based on behaviourism, and today's epistemology, which must take into account advances in
brain imaging. The science of translation is certainly an idea developed after
the Soviet experiments of Pavlov on a stimulus-response model. Today we know
better how the human brain works. A new science of translation, based on other
principles and a new epistemology, would therefore be necessary.
Perhaps the probabilistic and statistical models now in fashion
with Google lead us in this direction. But it is doubtful that these two
approaches go in the same direction. It might therefore
be better to defend the Soviet science of translation, which has not
proved entirely outdated in the current state of the art and which accommodates controlled languages,
even if it does not help to build a stable and consistent model of NLP.
---
pp.185-189:
Abduction Alerts in
Greek and Spanish
Eleni Papadopoulou, Marcel Puig Portella
Universitat Autònoma de Barcelona
Eleni.Papadopoulou@uab.cat, Marcel.Puig@uab.cat
Abstract
In this paper, we intend
to present a drafting model for abduction alert messages in both Greek and
Spanish, within the framework of the MESSAGE project. The main objective of the
present work is to describe the lexicographic groundwork needed in order to
construct models applicable to NLP (Natural Language Processing) software
that would be able to generate and translate texts automatically in Greek and
Spanish, and that could be used further in the field of controlled languages.
---
pp.190-197:
Grammatical and Lexical
Errors Analysis of English-Vietnamese Translation
Texts with the Google & EVTRAN Engines and Post-editing Tasks
Phan Thi Thanh Thao
T.Phan.Thi.Thanh@wlv.ac.uk; thaohuy7269@yahoo.com
Abstract
Evaluation of machine
translation (MT) is a challenging task for computational linguists due to the variety
of translation engines used to “decode” different pairs of languages. Although
a large number of automatic evaluation measures have been proposed and studied
in recent years, human judgements of MT quality still remain the best
method for comparing and evaluating different MT systems. Moreover, to achieve
high-quality MT, manual post-editing tasks must be given significant
consideration. It is important to study methods of post-editing MT output
quickly and effectively, which requires an analysis of the typical errors of different
translation engines. This paper presents a grammatical and lexical error analysis
of English-Vietnamese translations of texts extracted from BBC News from January to
May 2009, produced with the Google and EVTRAN engines. Based on the error analysis, some
manual post-editing tasks are suggested to improve the quality of the translation
output to some extent.
---
pp.198-202:
Treatment of the
Imperative Forms in the Machine Translation between
Catalan, Spanish and
Greek
Marcel Puig Portella, Eleni Papadopoulou
Universitat Autònoma de Barcelona
Marcel.Puig@uab.cat, Eleni.Papadopoulou@uab.cat
Abstract
This paper is the fruit of
our research on controlled languages and their machine translation within
the framework of the Alert Messages and Protocols project, and its purpose is twofold. Firstly,
we outline the imperative and alter-imperative forms in Catalan, Greek and
Spanish. Following this description, we propose a systematic treatment of
these forms in the framework of machine translation and controlled
languages.
---
pp.231-235:
Polish controlled
language and its machine translation into French
Zuzanna Rudas
Centre Tesnière, Université de Franche-Comté, Besançon
z.rudas@gmail.com
Abstract
This paper presents some difficulties
encountered during the analysis of Polish controlled protocols on fire
regulations with a view to their machine translation into French. The method
applied to solve these problems draws on systemic linguistics.
---
pp.249-255:
Post-editing Experiments
with MT for a Controlled Language
Irina Temnikova and Constantin Orasan
Research Institute in
Information and Language Processing
E-mail: I.Temnikova2@wlv.ac.uk,
C.0rasan@wlv.ac.uk
Abstract
This paper aims to
establish whether a new controlled language (CL) for emergency-related texts could
facilitate both human and machine translation. To achieve this, an experiment
involving an MT engine and human translators and post-editors was conducted. In
order to estimate whether the CL pre-editing has an impact on human and machine
translation, the time to manually translate, the time to post-edit and the edit
distance between the original and the simplified texts were measured. The
results of the experiment confirm the hypothesis.
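
The abstract does not say which edit distance was used. As a minimal sketch, assuming a Levenshtein distance computed over whitespace-separated words (the function name `edit_distance` and the example sentences are illustrative, not taken from the paper), the measure could be computed as follows:

```python
def edit_distance(source_tokens, target_tokens):
    """Levenshtein distance between two token sequences.

    Counts the minimum number of insertions, deletions and
    substitutions needed to turn source_tokens into target_tokens.
    """
    # previous[j] holds the distance between the source prefix
    # processed so far and the first j target tokens.
    previous = list(range(len(target_tokens) + 1))
    for i, source_token in enumerate(source_tokens, start=1):
        current = [i]
        for j, target_token in enumerate(target_tokens, start=1):
            substitution_cost = 0 if source_token == target_token else 1
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + substitution_cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


# Illustrative sentences only (assumption, not data from the experiment).
original = "leave the building now".split()
simplified = "leave building immediately".split()
print(edit_distance(original, simplified))  # 2
```

Working on word tokens rather than characters keeps the number comparable to the count of words an editor has to touch; a character-level variant is obtained by passing the strings themselves.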
---
pp.256-259:
Research into the
Practicality of Machine Translation of Administrative
Documents
Tsai Yi-Jung
Centre Tesnière
30, Rue de Mégevand,
25000 Besançon,
France
charlotte0421@gmail.com
Abstract
This paper discusses
Machine Translation (MT) in the domain of administrative documents. MT is not only
an interesting field for research but also has an immense practical value to
society. Because natural languages contain many ambiguities and
there is too much information for the corpus to be complete, there are still
some problems to resolve and some obstacles to overcome. However,
administrative documents issued by governments and other official institutions,
which are written in a standard form with an identical structure and a limited
vocabulary, and which rarely contain ambiguities, constitute an ideal sublanguage domain
in which fully automatic high-quality translation is achievable.
---
pp.260-268:
Building a Linguistic
Database for Chinese Interrogative Sentences
Xiaohong Wu
Centre Tesnière, Faculté des Lettres
Université de Franche-Comté, France
Faculté de Langues Etrangères
Minzu Université de Qinghai, Chine
wuxiaohongfr@yahoo.com.cn
Abstract
Analysis of
interrogative sentences plays an important role in systems that focus on tasks
such as question-answering and/or human-machine communication. In this paper we
present the work we have done for a multilingual MT system. The texts collected for
building the parallel corpora come from domains where the
accurate interpretation of the texts is crucial. Therefore, exact and
accurate translation is not only necessary but obligatory. To reach high-quality
translations, we adopt the controlled language technique. Here we
focus on building a linguistic database for the analysis and transfer of
French interrogative sentences into Chinese, which play an important role in
some texts in our corpora. We explain how we classify and control the interrogative
sentences in our work. We describe the classifications and the linguistic
information needed when processing the interrogative sentences according to
their different usages.