Oanh Tran


2022

pdf bib
Using Convolution Neural Network with BERT for Stance Detection in Vietnamese
Oanh Tran | Anh Cong Phung | Bach Xuan Ngo
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Stance detection is the task of automatically eliciting stance information towards a specific claim made by a primary author. While most studies have been done for high-resource languages, this work is dedicated to a low-resource language, namely Vietnamese. In this paper, we propose an architecture using transformers to detect stances in Vietnamese claims. This architecture exploits BERT to extract contextual word embeddings instead of using traditional word2vec models. Then, these embeddings are fed into CNN networks to extract local features to train the stance detection model. We performed extensive comparison experiments to show the effectiveness of the proposed method on a public dataset1 Experimental results show that this proposed model outperforms the previous methods by a large margin. It yielded an accuracy score of 75.57% averaged on four labels. This sets a new SOTA result for future research on this interesting problem in Vietnamese.

2020

pdf bib
Introducing a Large-Scale Dataset for Vietnamese POS Tagging on Conversational Texts
Oanh Tran | Tu Pham | Vu Dang | Bang Nguyen
Proceedings of the Twelfth Language Resources and Evaluation Conference

This paper introduces a large-scale human-labeled dataset for the Vietnamese POS tagging task on conversational texts. To this end, wepropose a new tagging scheme (with 36 POS tags) consisting of exclusive tags for special phenomena of conversational words, developthe annotation guideline and manually annotate 16.310K sentences using this guideline. Based on this corpus, a series of state-of-the-art tagging methods has been conducted to estimate their performances. Experimental results showed that the Conditional Random Fields model using both automatically learnt features from deep neural networks and handcrafted features yielded the best performance. Thismodel achieved 93.36% in the accuracy score which is 1.6% and 2.7% higher than the model using either handcrafted features orautomatically-learnt features, respectively. This result is also a little bit higher than the model of fine-tuning BERT by 0.94% in theaccuracy score. The performance measured on each POS tag is also very high with >90% in the F1 score for 20 POS tags and >80%in the F1 score for 11 POS tags. This work provides the public dataset and preliminary results for follow-up research on this interesting direction.