Thai-Hoang Pham

2018

Named entity recognition (NER) has attracted a substantial amount of research, and several recently proposed neural network-based models achieve high performance. However, there is little research on fine-grained NER (FG-NER), in which hundreds of named entity categories must be recognized, especially for non-English languages. It remains an open question whether a single model is robust across settings or whether the appropriate model depends on the language, the number of named entity categories, and the size of the training dataset. This paper first presents an empirical comparison of FG-NER models for English and Japanese. It demonstrates that LSTM+CNN+CRF (Ma and Hovy, 2016), one of the state-of-the-art methods for English NER, also works well for English FG-NER but does not work well for Japanese, a language with a large number of character types. To tackle this problem, we propose a method that improves neural network-based Japanese FG-NER performance by removing the CNN layer and utilizing dictionary and category embeddings. Experimental results show that the proposed method improves the Japanese FG-NER F-score from 66.76% to 75.18%.

Building a Spoonerism Detection System for Vietnamese
Thai-Hoang Pham | Xuan-Khoai Pham
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation

2017

The Importance of Automatic Syntactic Features in Vietnamese Named Entity Recognition
Thai-Hoang Pham | Phuong Le-Hong
Proceedings of the 31st Pacific Asia Conference on Language, Information and Computation

NNVLP: A Neural Network-Based Vietnamese Language Processing Toolkit
Thai-Hoang Pham | Xuan-Khoai Pham | Tuan-Anh Nguyen | Phuong Le-Hong
Proceedings of the IJCNLP 2017, System Demonstrations

This paper demonstrates NNVLP, a neural network-based toolkit for essential Vietnamese language processing tasks, including part-of-speech (POS) tagging, chunking, and named entity recognition (NER). The toolkit combines a bidirectional Long Short-Term Memory (Bi-LSTM) network, a Convolutional Neural Network (CNN), and a Conditional Random Field (CRF), using pre-trained word embeddings as input, and outperforms previously published toolkits on these three tasks. We provide both an API and a web demo for this toolkit.

Co-authors

Truc-Vien T. Nguyen 1

Tuan-Anh Nguyen 1
