An Empirical Investigation of Error Types in Vietnamese Parsing
Quy Nguyen | Yusuke Miyao | Hiroshi Noji | Nhung Nguyen
Proceedings of the 27th International Conference on Computational Linguistics

Syntactic parsing plays a crucial role in improving the quality of natural language processing tasks. Although there have been several research projects on syntactic parsing in Vietnamese, parsing quality has remained far inferior to that reported for major languages such as English and Chinese. In this work, we evaluated representative constituency parsing models on a Vietnamese treebank to identify the most suitable parsing method for Vietnamese. We then combined automatic and manual analysis to investigate the errors produced by the evaluated parsers and to determine their causes. Our analysis focused on three possible sources of parsing errors: limited training data, part-of-speech (POS) tagging errors, and ambiguous constructions. We found that the last two sources, which appear frequently in Vietnamese text, contribute significantly to the poor performance of Vietnamese parsing.


Challenges and Solutions for Consistent Annotation of Vietnamese Treebank
Quy Nguyen | Yusuke Miyao | Ha Le | Ngan Nguyen
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Treebanks are important resources for researchers in natural language processing, speech recognition, theoretical linguistics, and related fields. To strengthen the automatic processing of the Vietnamese language, a Vietnamese treebank has been built. However, the quality of this treebank is not satisfactory and may be one cause of the low performance of Vietnamese language processing. We have been building a new treebank for Vietnamese with about 40,000 sentences annotated with three layers: word segmentation, part-of-speech tagging, and bracketing. In this paper, we describe several challenges posed by the Vietnamese language and how we address them in developing annotation guidelines. We also present our methods for improving the quality of the annotation guidelines and ensuring annotation accuracy and consistency. Experimental results show that both inter-annotator agreement and accuracy exceed 90%, which is satisfactory.


Utilizing State-of-the-art Parsers to Diagnose Problems in Treebank Annotation for a Less Resourced Language
Quy Nguyen | Ngan Nguyen | Yusuke Miyao
Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse


Comparing Different Criteria for Vietnamese Word Segmentation
Quy T. Nguyen | Ngan L.T. Nguyen | Yusuke Miyao
Proceedings of the 3rd Workshop on South and Southeast Asian Natural Language Processing