An empirical study of maximum entropy approach for part-of-speech tagging of Vietnamese texts

Phuong Le-Hong, Azim Roussanaly, Thi Minh Huyen Nguyen, Mathias Rossignol


Abstract
This paper presents an empirical study of the maximum entropy approach to part-of-speech tagging of Vietnamese, a language whose special characteristics largely distinguish it from occidental languages. Our best tagger explores and incorporates useful knowledge sources for tagging Vietnamese text and achieves an overall accuracy of 93.40% and an unknown-word accuracy of 80.69% on a test set of the Vietnamese treebank. Our tagger significantly outperforms the tagger currently used for building the Vietnamese treebank and, as far as we are aware, this is the best tagging result published to date for the Vietnamese language.
Anthology ID:
2010.jeptalnrecital-long.36
Volume:
Actes de la 17e conférence sur le Traitement Automatique des Langues Naturelles. Articles longs
Month:
July
Year:
2010
Address:
Montréal, Canada
Editors:
Philippe Langlais, Michel Gagnon
Venue:
JEP/TALN/RECITAL
Publisher:
ATALA
Pages:
351–362
URL:
https://aclanthology.org/2010.jeptalnrecital-long.36
Cite (ACL):
Phuong Le-Hong, Azim Roussanaly, Thi Minh Huyen Nguyen, and Mathias Rossignol. 2010. An empirical study of maximum entropy approach for part-of-speech tagging of Vietnamese texts. In Actes de la 17e conférence sur le Traitement Automatique des Langues Naturelles. Articles longs, pages 351–362, Montréal, Canada. ATALA.
Cite (Informal):
An empirical study of maximum entropy approach for part-of-speech tagging of Vietnamese texts (Le-Hong et al., JEP/TALN/RECITAL 2010)
PDF:
https://aclanthology.org/2010.jeptalnrecital-long.36.pdf