Thai-Hoang Pham


An Empirical Study on Fine-Grained Named Entity Recognition
Khai Mai | Thai-Hoang Pham | Minh Trung Nguyen | Tuan Duc Nguyen | Danushka Bollegala | Ryohei Sasano | Satoshi Sekine
Proceedings of the 27th International Conference on Computational Linguistics

Named entity recognition (NER) has attracted a substantial amount of research. Recently, several neural network-based models have been proposed and have achieved high performance. However, there is little research on fine-grained NER (FG-NER), in which hundreds of named entity categories must be recognized, especially for non-English languages. It is still an open question whether there is a model that is robust across various settings or whether the proper model varies depending on the language, the number of named entity categories, and the size of training datasets. This paper first presents an empirical comparison of FG-NER models for English and Japanese and demonstrates that LSTM+CNN+CRF (Ma and Hovy, 2016), one of the state-of-the-art methods for English NER, also works well for English FG-NER but does not work well for Japanese, a language that has a large number of character types. To tackle this problem, we propose a method to improve neural network-based Japanese FG-NER performance by removing the CNN layer and utilizing dictionary and category embeddings. Experimental results show that the proposed method improves the Japanese FG-NER F-score from 66.76% to 75.18%.

Building a Spoonerism Detection System for Vietnamese
Thai-Hoang Pham | Xuan-Khoai Pham
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation


NNVLP: A Neural Network-Based Vietnamese Language Processing Toolkit
Thai-Hoang Pham | Xuan-Khoai Pham | Tuan-Anh Nguyen | Phuong Le-Hong
Proceedings of the IJCNLP 2017, System Demonstrations

This paper demonstrates NNVLP, a neural network-based toolkit for essential Vietnamese language processing tasks, including part-of-speech (POS) tagging, chunking, and named entity recognition (NER). The toolkit combines a bidirectional Long Short-Term Memory (Bi-LSTM) network, a Convolutional Neural Network (CNN), and a Conditional Random Field (CRF), using pre-trained word embeddings as input, and outperforms previously published toolkits on these three tasks. We provide both an API and a web demo for the toolkit.

The Importance of Automatic Syntactic Features in Vietnamese Named Entity Recognition
Thai-Hoang Pham | Phuong Le-Hong
Proceedings of the 31st Pacific Asia Conference on Language, Information and Computation

Extended Named Entity Recognition API and Its Applications in Language Education
Tuan Duc Nguyen | Khai Mai | Thai-Hoang Pham | Minh Trung Nguyen | Truc-Vien T. Nguyen | Takashi Eguchi | Ryohei Sasano | Satoshi Sekine
Proceedings of ACL 2017, System Demonstrations