Yu-Lun Hsieh

2022

Multifaceted Assessments of Traditional Chinese Word Segmentation Tool on Large Corpora
Wen-Chao Yeh | Yu-Lun Hsieh | Yung-Chun Chang | Wen-Lian Hsu
Proceedings of the 34th Conference on Computational Linguistics and Speech Processing (ROCLING 2022)

This study aims to evaluate the three most popular word segmentation tools for a large Traditional Chinese corpus in terms of their efficiency, resource consumption, and cost. Specifically, we compare the performance of Jieba, CKIP, and MONPA on word segmentation, part-of-speech tagging, and named entity recognition through extensive experiments. Experimental results show that MONPA using GPU for batch segmentation can greatly reduce the processing time of massive datasets. In addition, its features such as word segmentation, part-of-speech tagging, and named entity recognition are beneficial to downstream applications.

2019

This work examines the robustness of self-attentive neural networks against adversarial input perturbations. Specifically, we investigate the attention and feature extraction mechanisms of state-of-the-art recurrent neural networks and self-attentive architectures for sentiment analysis, entailment and machine translation under adversarial attacks. We also propose a novel attack algorithm for generating more natural adversarial examples that could mislead neural models but not humans. Experimental results show that, compared to recurrent neural models, self-attentive models are more robust against adversarial perturbation. In addition, we provide theoretical explanations for their superior robustness to support our claims.

MONPA:中文命名實體及斷詞與詞性同步標註系統(MONPA: A Multitask Chinese Segmentation, Named-entity and Part-of-speech Annotator)
Wen-Chao Yeh | Yu-Lun Hsieh | Yung-Chun Chang | Wen-Lian Hsu
Proceedings of the 31st Conference on Computational Linguistics and Speech Processing (ROCLING 2019)

2017

CIAL at IJCNLP-2017 Task 2: An Ensemble Valence-Arousal Analysis System for Chinese Words and Phrases
Zheng-Wen Lin | Yung-Chun Chang | Chen-Ann Wang | Yu-Lun Hsieh | Wen-Lian Hsu
Proceedings of the IJCNLP 2017, Shared Tasks

A sentiment lexicon is very helpful in dimensional sentiment applications. Because there are countless Chinese words, a method to predict unseen Chinese words is required. The proposed method can handle both words and phrases by using an ADVWeight List for word prediction, which in turn improves our performance at the phrase level. The evaluation results demonstrate that our system is effective in dimensional sentiment analysis for Chinese phrases. The Mean Absolute Error (MAE) and Pearson’s Correlation Coefficient (PCC) for Valence are 0.723 and 0.835, respectively, and those for Arousal are 0.914 and 0.756, respectively.

MONPA: Multi-objective Named-entity and Part-of-speech Annotator for Chinese using Recurrent Neural Network
Yu-Lun Hsieh | Yung-Chun Chang | Yi-Jie Huang | Shu-Hao Yeh | Chun-Hung Chen | Wen-Lian Hsu
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Part-of-speech (POS) tagging and named entity recognition (NER) are crucial steps in natural language processing. In addition, the difficulty of word segmentation places an additional burden on those who intend to deal with languages such as Chinese, and pipelined systems often suffer from error propagation. This work proposes an end-to-end model using a character-based recurrent neural network (RNN) to jointly accomplish segmentation, POS tagging, and NER of a Chinese sentence. Experiments on previous word segmentation and NER datasets show that a single model with the proposed architecture is comparable to those trained specifically for each task, and outperforms freely available software. Moreover, we provide a web-based interface for the public to easily access this resource.

Identifying Protein-protein Interactions in Biomedical Literature using Recurrent Neural Networks with Long Short-Term Memory
Yu-Lun Hsieh | Yung-Chun Chang | Nai-Wen Chang | Wen-Lian Hsu
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

In this paper, we propose a recurrent neural network model for identifying protein-protein interactions in biomedical literature. Experiments on the two largest public benchmark datasets, AIMed and BioInfer, demonstrate that our approach significantly surpasses state-of-the-art methods with relative improvements of 10% and 18%, respectively. Cross-corpus evaluation also demonstrates that the proposed model remains robust despite using different training data. These results suggest that RNNs can effectively capture semantic relationships among proteins and generalize over different corpora, without any feature engineering.