Lung-Hao Lee


2021

Classification of Tweets Self-reporting Adverse Pregnancy Outcomes and Potential COVID-19 Cases Using RoBERTa Transformers
Lung-Hao Lee | Man-Chen Hung | Chien-Huan Lu | Chang-Hao Chen | Po-Lei Lee | Kuo-Kai Shyu
Proceedings of the Sixth Social Media Mining for Health (#SMM4H) Workshop and Shared Task

This study describes our proposed model design for the SMM4H 2021 shared tasks. We fine-tune RoBERTa transformer language models and their connected classifiers to classify tweets self-reporting adverse pregnancy outcomes (Task 4) and potential COVID-19 cases (Task 5). The evaluation metric for both tasks is the F1-score of the positive class. For Task 4, our best score of 0.93 exceeded the mean score of 0.925. For Task 5, our best score of 0.75 exceeded the mean score of 0.745.

Proceedings of the 33rd Conference on Computational Linguistics and Speech Processing (ROCLING 2021)
Lung-Hao Lee | Chia-Hui Chang | Kuan-Yu Chen
Proceedings of the 33rd Conference on Computational Linguistics and Speech Processing (ROCLING 2021)

Multi-Label Classification of Chinese Humor Texts Using Hypergraph Attention Networks
Hao-Chuan Kao | Man-Chen Hung | Lung-Hao Lee | Yuen-Hsien Tseng
Proceedings of the 33rd Conference on Computational Linguistics and Speech Processing (ROCLING 2021)

We use Hypergraph Attention Networks (HyperGAT) to recognize multiple labels of Chinese humor texts. We first represent a joke as a hypergraph, using sequential and semantic hyperedge structures to construct its hyperedges. Then, attention mechanisms are adopted to aggregate context information embedded in nodes and hyperedges. Finally, we use the trained HyperGAT to complete the multi-label classification task. Experimental results on the Chinese humor multi-label dataset showed that the HyperGAT model outperforms previous sequence-based (CNN, BiLSTM, FastText) and graph-based (Graph-CNN, TextGCN, Text Level GNN) deep learning models.

Incorporating Domain Knowledge into Language Transformers for Multi-Label Classification of Chinese Medical Questions
Po-Han Chen | Yu-Xiang Zeng | Lung-Hao Lee
Proceedings of the 33rd Conference on Computational Linguistics and Speech Processing (ROCLING 2021)

In this paper, we propose a knowledge infusion mechanism to incorporate domain knowledge into language transformers. Weakly supervised data is regarded as the main source for knowledge acquisition. We pre-train the language models to capture masked knowledge of focuses and aspects and then fine-tune them to obtain better performance on the downstream tasks. Due to the lack of publicly available datasets for multi-label classification of Chinese medical questions, we crawled questions from medical question/answer forums and manually annotated them using eight predefined classes: persons and organizations, symptom, cause, examination, disease, information, ingredient, and treatment. The resulting collection contains a total of 1,814 questions with 2,340 labels, an average of 1.29 labels per question. We used Baidu Medical Encyclopedia as the knowledge resource. Two transformers, BERT and RoBERTa, were implemented to compare performance on our constructed datasets. Experimental results showed that our proposed model with the knowledge infusion mechanism achieves better performance regardless of which evaluation metric is considered: Macro F1, Micro F1, Weighted F1, or Subset Accuracy.

Generative Adversarial Networks based on Mixed-Attentions for Citation Intent Classification in Scientific Publications
Yuh-Shyang Wang | Chao-Yi Chen | Lung-Hao Lee
Proceedings of the 33rd Conference on Computational Linguistics and Speech Processing (ROCLING 2021)

We propose a mixed-attention-based Generative Adversarial Network (named maGAN) and apply it to citation intent classification in scientific publications. We select domain-specific training data, propose a mixed-attention mechanism, and employ a generative adversarial network architecture to pre-train the language model and fine-tune it on the downstream multi-class classification task. Experiments were conducted on the SciCite datasets to compare model performance. Our proposed maGAN model achieved the best Macro-F1 of 0.8532.

NCU-NLP at ROCLING-2021 Shared Task: Using MacBERT Transformers for Dimensional Sentiment Analysis
Man-Chen Hung | Chao-Yi Chen | Pin-Jung Chen | Lung-Hao Lee
Proceedings of the 33rd Conference on Computational Linguistics and Speech Processing (ROCLING 2021)

We fine-tune MacBERT transformers for the ROCLING-2021 shared tasks using the CVAT and CVAS data. We compare the performance of MacBERT with that of two other transformers, BERT and RoBERTa, in the valence and arousal dimensions, respectively. MAE and the correlation coefficient (r) were used as evaluation metrics. On the ROCLING-2021 test set, our MacBERT model achieves an MAE of 0.611 and an r of 0.904 in the valence dimension, and an MAE of 0.938 and an r of 0.549 in the arousal dimension.
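
For readers unfamiliar with the two metrics named in this abstract, the following is a minimal sketch in Python. It assumes that r denotes the Pearson correlation coefficient, which the abstract does not state explicitly. The function names and the toy ratings are hypothetical and are not taken from the CVAT or CVAS data.

```python
import math
from typing import Sequence


def mean_absolute_error(gold: Sequence[float], pred: Sequence[float]) -> float:
    """Average absolute difference between gold and predicted ratings (lower is better)."""
    return sum(abs(g - p) for g, p in zip(gold, pred)) / len(gold)


def pearson_r(gold: Sequence[float], pred: Sequence[float]) -> float:
    """Pearson correlation between gold and predicted ratings (higher is better).

    Assumes both sequences have non-zero variance; otherwise the
    denominator is zero and the coefficient is undefined.
    """
    n = len(gold)
    mean_g = sum(gold) / n
    mean_p = sum(pred) / n
    cov = sum((g - mean_g) * (p - mean_p) for g, p in zip(gold, pred))
    norm_g = math.sqrt(sum((g - mean_g) ** 2 for g in gold))
    norm_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred))
    return cov / (norm_g * norm_p)


# Hypothetical valence ratings: every prediction is shifted up by 0.5.
toy_gold = [1.0, 2.0, 3.0, 4.0]
toy_pred = [1.5, 2.5, 3.5, 4.5]

print("MAE:", mean_absolute_error(toy_gold, toy_pred))
print("r:", pearson_r(toy_gold, toy_pred))
```

The toy example shows why the two metrics are reported together: a constant offset leaves the correlation perfect while still incurring a non-zero MAE.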

NCUEE-NLP at MEDIQA 2021: Health Question Summarization Using PEGASUS Transformers
Lung-Hao Lee | Po-Han Chen | Yu-Xiang Zeng | Po-Lei Lee | Kuo-Kai Shyu
Proceedings of the 20th Workshop on Biomedical Language Processing

This study describes the model design of the NCUEE-NLP system for the MEDIQA challenge at the BioNLP 2021 workshop. We use PEGASUS transformers and fine-tune them on the downstream summarization task using our collected and processed datasets. A total of 22 teams participated in the consumer health question summarization task of MEDIQA 2021. Each participating team was allowed to submit a maximum of ten runs. Our best submission, achieving a ROUGE2-F1 score of 0.1597, ranked third among all 128 submissions.

2020

Gated Graph Sequence Neural Networks for Chinese Healthcare Named Entity Recognition
Yi Lu | Lung-Hao Lee
Proceedings of the 32nd Conference on Computational Linguistics and Speech Processing (ROCLING 2020)

Scientific Writing Evaluation Using Ensemble Multi-channel Neural Networks
Yuh-Shyang Wang | Lung-Hao Lee | Bo-Lin Lin | Liang-Chih Yu
Proceedings of the 32nd Conference on Computational Linguistics and Speech Processing (ROCLING 2020)

International Journal of Computational Linguistics & Chinese Language Processing, Volume 25, Number 2, December 2020
Lung-Hao Lee | Kuan-Yu Chen
International Journal of Computational Linguistics & Chinese Language Processing, Volume 25, Number 2, December 2020

基於圖神經網路之中文健康照護命名實體辨識 (Chinese Healthcare Named Entity Recognition Based on Graph Neural Networks)
Yi Lu | Lung-Hao Lee
International Journal of Computational Linguistics & Chinese Language Processing, Volume 25, Number 2, December 2020

Medication Mention Detection in Tweets Using ELECTRA Transformers and Decision Trees
Lung-Hao Lee | Po-Han Chen | Hao-Chuan Kao | Ting-Chun Hung | Po-Lei Lee | Kuo-Kai Shyu
Proceedings of the Fifth Social Media Mining for Health Applications Workshop & Shared Task

This study describes our proposed model design for the SMM4H 2020 Task 1. We fine-tune ELECTRA transformers using our trained SVM filter for data augmentation, along with decision trees, to detect medication mentions in tweets. Our best F1-score of 0.7578 exceeded the mean score of 0.6646 across all 15 submitting teams.

2019

NCUEE at MEDIQA 2019: Medical Text Inference Using Ensemble BERT-BiLSTM-Attention Model
Lung-Hao Lee | Yi Lu | Po-Han Chen | Po-Lei Lee | Kuo-Kai Shyu
Proceedings of the 18th BioNLP Workshop and Shared Task

This study describes the model design of the NCUEE system for the MEDIQA challenge at the ACL-BioNLP 2019 workshop. We use BERT (Bidirectional Encoder Representations from Transformers) as the word embedding method and integrate it with a BiLSTM (Bidirectional Long Short-Term Memory) network with an attention mechanism for medical text inference. A total of 42 teams participated in the natural language inference task at MEDIQA 2019. Our best accuracy score of 0.84 ranked in the top third of all submissions on the leaderboard.

2018

Building a TOCFL Learner Corpus for Chinese Grammatical Error Diagnosis
Lung-Hao Lee | Yuen-Hsien Tseng | Li-Ping Chang
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Multilingual Short Text Responses Clustering for Mobile Educational Activities: a Preliminary Exploration
Yuen-Hsien Tseng | Lung-Hao Lee | Yu-Ta Chien | Chun-Yen Chang | Tsung-Yen Li
Proceedings of the 5th Workshop on Natural Language Processing Techniques for Educational Applications

Text clustering is a powerful technique to detect topics from document corpora, so as to provide information browsing, analysis, and organization. On the other hand, the Instant Response System (IRS) has been widely used in recent years to enhance student engagement in class and thus improve their learning effectiveness. However, the lack of functions to process short text responses from the IRS prevents the further application of IRS in classes. Therefore, this study aims to propose a proper short text clustering module for the IRS, and demonstrate our implemented techniques through real-world examples, so as to provide experiences and insights for further study. In particular, we have compared three clustering methods and the result shows that theoretically better methods need not lead to better results, as there are various factors that may affect the final performance.

2017

Proceedings of the 4th Workshop on Natural Language Processing Techniques for Educational Applications (NLPTEA 2017)
Yuen-Hsien Tseng | Hsin-Hsi Chen | Lung-Hao Lee | Liang-Chih Yu
Proceedings of the 4th Workshop on Natural Language Processing Techniques for Educational Applications (NLPTEA 2017)

The NTNU System at SemEval-2017 Task 10: Extracting Keyphrases and Relations from Scientific Publications Using Multiple Conditional Random Fields
Lung-Hao Lee | Kuei-Ching Lee | Yuen-Hsien Tseng
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

This study describes the design of the NTNU system for the ScienceIE task at the SemEval 2017 workshop. We use self-defined feature templates and multiple conditional random fields with extracted features to identify keyphrases along with categorized labels and their relations from scientific publications. A total of 16 teams participated in evaluation scenario 1 (subtasks A, B, and C), with only 7 teams competing in all sub-tasks. Our best micro-averaging F1 across the three subtasks is 0.23, ranking in the middle among all 16 submissions.

IJCNLP-2017 Task 1: Chinese Grammatical Error Diagnosis
Gaoqi Rao | Baolin Zhang | Endong Xun | Lung-Hao Lee
Proceedings of the IJCNLP 2017, Shared Tasks

This paper presents the IJCNLP 2017 shared task for Chinese grammatical error diagnosis (CGED), which seeks to identify grammatical error types and their range of occurrence within sentences written by learners of Chinese as a foreign language. We describe the task definition, data preparation, performance metrics, and evaluation results. Of the 13 teams registered for this shared task, 5 teams developed systems and submitted a total of 13 runs. We expected that this evaluation campaign could lead to the development of more advanced NLP techniques for educational applications, especially for Chinese error detection. All data sets with gold standards and scoring scripts are made publicly available to researchers.

IJCNLP-2017 Task 2: Dimensional Sentiment Analysis for Chinese Phrases
Liang-Chih Yu | Lung-Hao Lee | Jin Wang | Kam-Fai Wong
Proceedings of the IJCNLP 2017, Shared Tasks

This paper presents the IJCNLP 2017 shared task on Dimensional Sentiment Analysis for Chinese Phrases (DSAP), which seeks to identify a real-valued sentiment score for Chinese single words and multi-word phrases in both the valence and arousal dimensions. Valence represents the degree of pleasant and unpleasant (or positive and negative) feelings, and arousal represents the degree of excitement and calm. Of the 19 teams registered for this shared task for two-dimensional sentiment analysis, 13 submitted results. We expected that this evaluation campaign could produce more advanced dimensional sentiment analysis techniques, especially for Chinese affective computing. All data sets with gold standards and the scoring script are made publicly available to researchers.

2016

The NTNU-YZU System in the AESW Shared Task: Automated Evaluation of Scientific Writing Using a Convolutional Neural Network
Lung-Hao Lee | Bo-Lin Lin | Liang-Chih Yu | Yuen-Hsien Tseng
Proceedings of the 11th Workshop on Innovative Use of NLP for Building Educational Applications

Overview of NLP-TEA 2016 Shared Task for Chinese Grammatical Error Diagnosis
Lung-Hao Lee | Gaoqi Rao | Liang-Chih Yu | Endong Xun | Baolin Zhang | Li-Ping Chang
Proceedings of the 3rd Workshop on Natural Language Processing Techniques for Educational Applications (NLPTEA2016)

This paper presents the NLP-TEA 2016 shared task for Chinese grammatical error diagnosis, which seeks to identify grammatical error types and their range of occurrence within sentences written by learners of Chinese as a foreign language. We describe the task definition, data preparation, performance metrics, and evaluation results. Of the 15 teams registered for this shared task, 9 teams developed systems and submitted a total of 36 runs. We expected that this evaluation campaign could lead to the development of more advanced NLP techniques for educational applications, especially for Chinese error detection. All data sets with gold standards and scoring scripts are made publicly available to researchers.

Building Chinese Affective Resources in Valence-Arousal Dimensions
Liang-Chih Yu | Lung-Hao Lee | Shuai Hao | Jin Wang | Yunchao He | Jun Hu | K. Robert Lai | Xuejie Zhang
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2015

International Journal of Computational Linguistics & Chinese Language Processing, Volume 20, Number 1, June 2015-Special Issue on Chinese as a Foreign Language
Lung-Hao Lee | Liang-Chih Yu | Li-Ping Chang
International Journal of Computational Linguistics & Chinese Language Processing, Volume 20, Number 1, June 2015-Special Issue on Chinese as a Foreign Language

Guest Editorial: Special Issue on Chinese as a Foreign Language
Lung-Hao Lee | Liang-Chih Yu | Li-Ping Chang
International Journal of Computational Linguistics & Chinese Language Processing, Volume 20, Number 1, June 2015-Special Issue on Chinese as a Foreign Language

Introduction to SIGHAN 2015 Bake-off for Chinese Spelling Check
Yuen-Hsien Tseng | Lung-Hao Lee | Li-Ping Chang | Hsin-Hsi Chen
Proceedings of the Eighth SIGHAN Workshop on Chinese Language Processing

Overview of the NLP-TEA 2015 Shared Task for Chinese Grammatical Error Diagnosis
Lung-Hao Lee | Liang-Chih Yu | Li-Ping Chang
Proceedings of the 2nd Workshop on Natural Language Processing Techniques for Educational Applications

2014

Chinese Open Relation Extraction for Knowledge Acquisition
Yuen-Hsien Tseng | Lung-Hao Lee | Shu-Yen Lin | Bo-Shun Liao | Mei-Jun Liu | Hsin-Hsi Chen | Oren Etzioni | Anthony Fader
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, volume 2: Short Papers

Overview of SIGHAN 2014 Bake-off for Chinese Spelling Check
Liang-Chih Yu | Lung-Hao Lee | Yuen-Hsien Tseng | Hsin-Hsi Chen
Proceedings of The Third CIPS-SIGHAN Joint Conference on Chinese Language Processing

A Sentence Judgment System for Grammatical Error Detection
Lung-Hao Lee | Liang-Chih Yu | Kuei-Ching Lee | Yuen-Hsien Tseng | Li-Ping Chang | Hsin-Hsi Chen
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: System Demonstrations

2013

Chinese Spelling Check Evaluation at SIGHAN Bake-off 2013
Shih-Hung Wu | Chao-Lin Liu | Lung-Hao Lee
Proceedings of the Seventh SIGHAN Workshop on Chinese Language Processing

2012

Traditional Chinese Parsing Evaluation at SIGHAN Bake-offs 2012
Yuen-Hsien Tseng | Lung-Hao Lee | Liang-Chih Yu
Proceedings of the Second CIPS-SIGHAN Joint Conference on Chinese Language Processing

NTUSocialRec: An Evaluation Dataset Constructed from Microblogs for Recommendation Applications in Social Networks
Chieh-Jen Wang | Shuk-Man Cheng | Lung-Hao Lee | Hsin-Hsi Chen | Wen-shen Liu | Pei-Wen Huang | Shih-Peng Lin
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper proposes a method to construct an evaluation dataset from microblogs for the development of recommendation systems. We extract the relationships among three main entities in a recommendation event, i.e., who recommends what to whom. User-to-user friend relationships and user-to-resource interest relationships in social media, together with resource-to-metadata descriptions in an external ontology, are employed. In the experiments, the resources are restricted to visual entertainment media, movies in particular. A sequence of ground truths varying with time is generated, which reflects the dynamics of the real world.

2009

Chinese WordNet Domains: Bootstrapping Chinese WordNet with Semantic Domain Labels
Lung-Hao Lee | Yu-Ting Yu | Chu-Ren Huang
Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation, Volume 1

CWN-LMF: Chinese WordNet in the Lexical Markup Framework
Lung-Hao Lee | Shu-Kai Hsieh | Chu-Ren Huang
Proceedings of the 7th Workshop on Asian Language Resources (ALR7)

2008

Quality Assurance of Automatic Annotation of Very Large Corpora: a Study based on heterogeneous Tagging System
Chu-Ren Huang | Lung-Hao Lee | Wei-guang Qu | Jia-Fei Hong | Shiwen Yu
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

We propose a set of heuristics for efficiently improving the annotation quality of very large corpora. The Xinhua News portion of the Chinese Gigaword Corpus was tagged independently with both the Peking University ICL tagset and the Academia Sinica CKIP tagset. The corpus-based mapping of POS tags will serve as the basis for possible contrasts between the grammatical systems of the PRC and Taiwan. It can also serve as the basic model for mapping between the CKIP and ICL tagging systems for any data.

Contrastive Approach towards Text Source Classification based on Top-Bag-of-Word Similarity
Chu-Ren Huang | Lung-Hao Lee
Proceedings of the 22nd Pacific Asia Conference on Language, Information and Computation