Gábor Pohl
2006
Exploiting Parallel Corpora for Supervised Word-Sense Disambiguation in English-Hungarian Machine Translation
Márton Miháltz | Gábor Pohl
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
In this paper we present an experiment to automatically generate annotated training corpora for a supervised word sense disambiguation module operating in an English-Hungarian and a Hungarian-English machine translation system. Training examples for the WSD module of the MT system are produced by annotating ambiguous lexical items in the source language (words having several possible translations) with their proper target language translations. Since manually annotating training examples is very costly, we are experimenting with a method to produce examples automatically from parallel corpora. Our algorithm relies on monolingual and bilingual lexicons and dictionaries in addition to statistical methods in order to annotate examples extracted from a large English-Hungarian parallel corpus accurately aligned at sentence level. In the paper, we present an experiment with the English noun "state", where we categorized the different occurrences in the Hunglish parallel corpus. For this noun, most of the examples were covered by multiword lexical items originating from our lexical sources.
English-Hungarian NP Alignment in MetaMorpho™
Gábor Pohl
Proceedings of the 11th Annual Conference of the European Association for Machine Translation
2004
A New Approach to the Corpus-based Statistical Investigation of Hungarian Multi-word Lexemes
Balázs Kis | Begoña Villada | Gosse Bouma | Gábor Ugray | Tamás Bíró | Gábor Pohl | John Nerbonne
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)
Co-authors
- Márton Miháltz 1
- Balázs Kis 1
- Begoña Villada Moirón 1
- Gosse Bouma 1
- Gábor Ugray 1