Selection of Entries for a Bilingual Dictionary from Aligned Translation Equivalents using Support Vector Machines

Takeshi Kutsumi, Takehiko Yoshimi, Katsunori Kotani, Ichiko Sata, Hitoshi Isahara


Abstract
This paper argues that constructing a dictionary from bilingual pairs obtained from parallel corpora requires not only correct alignment of two noun phrases but also a judgment of whether each aligned pair is appropriate as an entry. It addresses the latter task, which has received little attention. It presents a method for selecting suitable entries using Support Vector Machines and proposes using as features the common and the different parts between a current translation and a new translation. Using experimental results, the paper examines how selection performance is affected by four ways of representing the common and the different parts: morphemes, parts of speech, semantic markers, and upper-level semantic markers. N-grams of the common and the different parts are also used for each of these four representations. In the experiments, representation by morphemes performed best, with an F-measure of 0.803.
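The feature idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes both translations are already tokenized into morphemes, uses Python's standard `difflib.SequenceMatcher` to separate the common and the different parts, and emits n-gram feature strings that could then be fed to any SVM library. The function names, the feature-string format, and the example phrases are all hypothetical.

```python
from difflib import SequenceMatcher


def split_common_and_different(current, new):
    """Split two token lists into shared tokens and tokens unique to each side."""
    matcher = SequenceMatcher(a=current, b=new, autojunk=False)
    common, only_current, only_new = [], [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            common.extend(current[i1:i2])
        else:
            # 'replace', 'delete' and 'insert' all mark differing material.
            only_current.extend(current[i1:i2])
            only_new.extend(new[j1:j2])
    return common, only_current, only_new


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as joined strings."""
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(current, new, max_n=2):
    """Build a set of n-gram features over the common and the different parts.

    Simplification (an assumption of this sketch): differing tokens from
    separate positions are concatenated before n-grams are taken.
    """
    common, only_current, only_new = split_common_and_different(current, new)
    parts = (("common", common), ("cur", only_current), ("new", only_new))
    features = set()
    for prefix, part in parts:
        for n in range(1, max_n + 1):
            for gram in ngrams(part, n):
                features.add(f"{prefix}:{n}:{gram}")
    return features


if __name__ == "__main__":
    current_translation = ["machine", "translation", "system"]
    new_translation = ["statistical", "machine", "translation"]
    for feature in sorted(extract_features(current_translation, new_translation)):
        print(feature)
```

The same functions would apply unchanged to the other three representations named in the abstract if each token were replaced by its part of speech, semantic marker, or upper-level semantic marker.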
Anthology ID:
2005.mtsummit-papers.2
Volume:
Proceedings of Machine Translation Summit X: Papers
Month:
September 13-15
Year:
2005
Address:
Phuket, Thailand
Venue:
MTSummit
Pages:
11–16
URL:
https://aclanthology.org/2005.mtsummit-papers.2
Cite (ACL):
Takeshi Kutsumi, Takehiko Yoshimi, Katsunori Kotani, Ichiko Sata, and Hitoshi Isahara. 2005. Selection of Entries for a Bilingual Dictionary from Aligned Translation Equivalents using Support Vector Machines. In Proceedings of Machine Translation Summit X: Papers, pages 11–16, Phuket, Thailand.
Cite (Informal):
Selection of Entries for a Bilingual Dictionary from Aligned Translation Equivalents using Support Vector Machines (Kutsumi et al., MTSummit 2005)
PDF:
https://aclanthology.org/2005.mtsummit-papers.2.pdf