2023
A Novel Named Entity Recognition Model Applied to Specialized Sequence Labeling
Ruei-Cyuan Su | Tzu-En Su | Ming-Hsiang Su | Matus Pleva | Daniel Hladek
Proceedings of the 35th Conference on Computational Linguistics and Speech Processing (ROCLING 2023)
SCU-MESCLab at ROCLING-2023 Shared Task: Named Entity Recognition Using Multiple Classifier Model
Tzu-En Su | Ruei-Cyuan Su | Ming-Hsiang Su | Tsung-Hsien Yang
Proceedings of the 35th Conference on Computational Linguistics and Speech Processing (ROCLING 2023)
2022
RoBERTa-based Traditional Chinese Medicine Named Entity Recognition Model
Ming-Hsiang Su | Chin-Wei Lee | Chi-Lun Hsu | Ruei-Cyuan Su
Proceedings of the 34th Conference on Computational Linguistics and Speech Processing (ROCLING 2022)
This study builds a named entity recognition (NER) model to identify Chinese medicine names and disease names. The results can feed a human-machine dialogue system that gives people correct Chinese medicine medication reminders. First, web crawlers are used to compile web resources into a Chinese medicine named entity corpus of 1,097 articles, 1,412 disease names, and 38,714 Chinese medicine names. Each article is then annotated with the traditional Chinese medicine (TCM) names using the BIO tagging method. Finally, BERT, ALBERT, RoBERTa, and GPT2 are each trained and evaluated with BiLSTM and CRF. The experimental results show that the RoBERTa-based NER system combined with BiLSTM and CRF performs best, with a precision of 0.96, a recall of 0.96, and an F1-score of 0.96.
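The annotation step can be illustrated with a minimal sketch of dictionary-based BIO tagging. This is not the authors' pipeline: the function name `bio_tag`, the label `TCM`, the longest-match rule, and the example sentence are all assumptions made for illustration.

```python
def bio_tag(text, names, label):
    """Tag each character of `text` as B-<label>, I-<label>, or O.

    `names` is a collection of known entity strings (hypothetical lexicon).
    Longer names are tried first so a long name wins over a shorter one
    that starts at the same position.
    """
    tags = ["O"] * len(text)
    ordered = sorted((n for n in names if n), key=len, reverse=True)
    i = 0
    while i < len(text):
        # Find the longest lexicon entry that starts at position i, if any.
        match = next((n for n in ordered if text.startswith(n, i)), None)
        if match is None:
            i += 1
            continue
        tags[i] = f"B-{label}"                 # first character of the entity
        for j in range(i + 1, i + len(match)):
            tags[j] = f"I-{label}"             # remaining characters
        i += len(match)                        # resume after the entity
    return tags


if __name__ == "__main__":
    sentence = "服用甘草可止咳"
    for ch, tag in zip(sentence, bio_tag(sentence, {"甘草"}, "TCM")):
        print(ch, tag)
```

A lexicon lookup like this only produces training labels; the tagging models named in the abstract are trained on such labeled sequences and are not shown here.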
SCU-MESCLab at ROCLING-2022 Shared Task: Named Entity Recognition Using BERT Classifier
Tsung-Hsien Yang | Ruei-Cyuan Su | Tzu-En Su | Sing-Seong Chong | Ming-Hsiang Su
Proceedings of the 34th Conference on Computational Linguistics and Speech Processing (ROCLING 2022)
This study builds a named entity recognition system for the medical domain. Data is labeled in BIO format: for example, “muscle” is labeled with “B-BODY” followed by “I-BODY”, “cough” with “B-SYMP” followed by “I-SYMP”, and all words outside the entity categories are marked “O”. The Chinese HealthNER Corpus contains 30,692 sentences, of which 2,531 are set aside as the validation set (dev) for this evaluation; the conference provides a further 3,204 sentences as the test set (test). We submit three sets of predictions, from BLSTM_CRF, Roberta+BLSTM_CRF, and a BERT Classifier. The BERT Classifier system, submitted as RUN3, achieves the best prediction performance, with a precision of 80.18%, a recall of 78.3%, and an F1-score of 79.23.
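To make the BIO labels and the reported scores concrete, here is a minimal sketch that decodes BIO tags into entity spans and computes precision, recall, and F1 over exact span matches. The function names and the exact-match criterion are assumptions; the shared task's official scorer may differ.

```python
def bio_to_spans(tags):
    """Decode a BIO tag sequence into a set of (start, end, label) spans.

    `end` is exclusive. An I- tag that does not continue an open entity
    of the same label is ignored in this sketch.
    """
    spans = set()
    start, label = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing entity
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def precision_recall_f1(gold, pred):
    """Score predicted spans against gold spans by exact match."""
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = bio_to_spans(["B-BODY", "I-BODY", "O", "B-SYMP", "I-SYMP"])
    pred = bio_to_spans(["B-BODY", "I-BODY", "O", "O", "O"])
    print(precision_recall_f1(gold, pred))
```

F1 is the harmonic mean of precision and recall. Applying the same formula to the first two figures reported above, 2 × 80.18 × 78.3 / (80.18 + 78.3), gives approximately 79.23, which matches the reported F1-score.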
2021
SoochowDS at ROCLING-2021 Shared Task: Text Sentiment Analysis Using BERT and LSTM
Ruei-Cyuan Su | Sig-Seong Chong | Tzu-En Su | Ming-Hsiang Su
Proceedings of the 33rd Conference on Computational Linguistics and Speech Processing (ROCLING 2021)
For this shared task, this paper proposes combining a BERT-based word vector model with an LSTM prediction model to predict the Valence and Arousal values of a text. The BERT-based word vectors are 768-dimensional, and the word vectors of a sentence are fed sequentially into the LSTM model for prediction. The experimental results show that the proposed method outperforms the Lasso Regression model.
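The sequential feeding described above can be sketched in plain Python. Everything here is an assumption for illustration: the weights are random and untrained, the hidden size and the linear read-out from the final hidden state are hypothetical choices, and the input vectors stand in for real BERT outputs, which this sketch does not compute.

```python
import math
import random


def lstm_step(x, h, c, W, U, b):
    """One LSTM step. W is 4H x D, U is 4H x H, b has length 4H.
    Gate order in the stacked rows: input, forget, candidate, output."""
    H = len(h)
    z = [
        sum(w * xi for w, xi in zip(W[r], x))
        + sum(u * hi for u, hi in zip(U[r], h))
        + b[r]
        for r in range(4 * H)
    ]
    sig = lambda v: 1.0 / (1.0 + math.exp(-v))
    i = [sig(v) for v in z[0:H]]
    f = [sig(v) for v in z[H:2 * H]]
    g = [math.tanh(v) for v in z[2 * H:3 * H]]
    o = [sig(v) for v in z[3 * H:4 * H]]
    c_new = [f[k] * c[k] + i[k] * g[k] for k in range(H)]
    h_new = [o[k] * math.tanh(c_new[k]) for k in range(H)]
    return h_new, c_new


def init_params(D, H, seed=0):
    """Random, untrained parameters (hypothetical initialisation)."""
    rng = random.Random(seed)
    mat = lambda rows, cols: [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                              for _ in range(rows)]
    return mat(4 * H, D), mat(4 * H, H), [0.0] * (4 * H), mat(2, H), [0.0, 0.0]


def predict_valence_arousal(vectors, params):
    """Feed word vectors one by one, then map the last hidden state to
    two numbers (valence, arousal) with a linear layer."""
    W, U, b, V, d = params
    H = len(U[0])
    h, c = [0.0] * H, [0.0] * H
    for x in vectors:              # sequential feeding, one word vector per step
        h, c = lstm_step(x, h, c, W, U, b)
    return [sum(v * hk for v, hk in zip(V[r], h)) + d[r] for r in range(2)]


if __name__ == "__main__":
    params = init_params(D=768, H=4)
    sentence = [[0.01] * 768 for _ in range(3)]   # three stand-in word vectors
    print(predict_valence_arousal(sentence, params))
```

In practice such a model would be built with a deep learning framework and trained on labeled valence and arousal scores; the sketch only shows the data flow from 768-dimensional vectors to two predicted values.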