Some authors (Simard et al.; Melamed; Danielsson & Mühlenbock) have suggested measures of similarity of words in different languages so as to find extra clues for alignment of parallel texts. Cognate words, like ‘Parliament’ and ‘Parlement’, in English and French respectively, provide extra anchors that help to improve the quality of the alignment. In this paper, we will extend an alignment algorithm proposed by Ribeiro et al. using typical contiguous and non-contiguous sequences of characters extracted using a statistically sound method (Dias et al.). With these typical sequences, we are able to find more reliable correspondence points and improve the alignment quality without recurring to heuristics to identify cognates.
This paper reports the first results of an on-going research on evaluation of Machine Translation quality. The starting point for this work was the framework of ISLE (the International Standards for Language Engineering), which provides a classification for evaluation of Machine Translation. In order to make a quantitative evaluation of translation quality, we pursue a more consistent, fine-grained and comprehensive classification of possible translation errors and we propose metrics for sentence level errors, specifically lexical and syntactic errors.
A self-learning method of parallel texts alignment
António Ribeiro | Gabriel Lopes | João Mexia
Proceedings of the Fourth Conference of the Association for Machine Translation in the Americas: Technical Papers
This paper describes a language independent method for alignment of parallel texts that re-uses acquired knowledge. The system extracts word translation equivalents and re-uses them as correspondence points in order to enhance the alignment of parallel texts. Points that may cause misalignment are filtered using confidence bands of linear regression analysis instead of heuristics, which are not theoretically reliable. Homographs bootstrap the alignment process so as to build the primary word translation lexicon. At each step, the previously acquired lexicon is re-used so as to repeatedly make finer-grained alignments and produce more reliable translation lexicons.