Statistical approach for Korean analysis

Nari Kim


Abstract
In conventional approaches to Korean analysis, verb subcategorization has generally been used as lexical knowledge. A problem arises, however, when we are given long sentences in which two or more verbs of the same subcategorization are involved. In those sentences, a noun phrase may be taken as the constituent of more than one verb and cause an ambiguity. This paper presents an approach to solving this problem by using structural patterns acquired by a statistical method from corpora. Structural patterns can be the processing units for syntactic analysis and for translation into other languages as well. We have collected 10,686 unique structural patterns from a Korean corpus of 1.27 million words. We have analyzed 2,672 sentences and shown that structural patterns can improve the accuracy of Korean analysis.
Anthology ID:
1998.amta-papers.27
Volume:
Proceedings of the Third Conference of the Association for Machine Translation in the Americas: Technical Papers
Month:
October 28-31
Year:
1998
Address:
Langhorne, PA, USA
Venue:
AMTA
Publisher:
Springer
Pages:
308–317
URL:
https://link.springer.com/chapter/10.1007/3-540-49478-2_28