Jheng-Long Wu


2021

pdf bib
A Corpus for Dimensional Sentiment Classification on YouTube Streaming Service
Ching-Wen Hsu | Chun-Lin Chou | Hsuan Liu | Jheng-Long Wu
Proceedings of the 33rd Conference on Computational Linguistics and Speech Processing (ROCLING 2021)

The streaming service platform such as YouTube provides a discussion function for audiences worldwide to share comments. YouTubers who upload videos to the YouTube platform want to track the performance of these uploaded videos. However, the present analysis functions of YouTube only provide a few performance indicators such as average view duration, browsing history, variance in audience’s demographics, etc., and lack of sentiment analysis on the audience’s comments. Therefore, the paper proposes multi-dimensional sentiment indicators such as YouTuber preference, Video preferences, and Excitement level to capture comprehensive sentiment on audience comments for videos and YouTubers. To evaluate the performance of different classifiers, we experiment with deep learning-based, machine learning-based, and BERT-based classifiers to automatically detect three sentiment indicators of an audience’s comments. Experimental results indicate that the BERT-based classifier is a better classification model than other classifiers according to F1-score, and the sentiment indicator of Excitement level is quite an improvement. Therefore, the multiple sentiment detection tasks on the video streaming service platform can be solved by the proposed multi-dimensional sentiment indicators accompanied with BERT classifier to gain the best result.

pdf bib
Confiscation Detection of Criminal Judgment Using Text Classification Approach
Hsuan-Tzu Shih | Yu-Cheng Chiu | Hsiao-Shih Chen | Jheng-Long Wu
Proceedings of the 33rd Conference on Computational Linguistics and Speech Processing (ROCLING 2021)

As the system of confiscation becomes more and more perfect, grasping the distribution of the types of confiscations actually announced by the courts will enable you to understand changing of the trend. In addition to assisting legislators in formulating laws, it can also provide other people with an understanding of the actual operation of the confiscation system. In order to enable artificial intelligence technology to automatically identify the distribution of confiscation, and consumes a lot of manpower and time costs of manual judgment. The purpose of this research is to establish an automated confiscation identification model that can quickly and accurately identify the multiple label categories of confiscation, and provide the needs of all social circles for confiscation information, so as to facilitate subsequent law amendments or discretion. This research uses the first instance criminal cases as the main experimental data. According to the current laws, the confiscation is divided into three categories: contrabands, criminal tools and criminal proceeds, and perform multiple label identification. This research will use Term Frequency–Inverse Document Frequency (TF-IDF) and Word2Vec algorithm as the feature extraction algorithm, with random forest classifier, and CKIPlabBERT pretrained model for training and identification. The experimental results show that under the CKIPlabBERT pretrained model, the best identification effect can be obtained when only use sentences with confiscated words mentioned in the judgment. When the task is case confiscation, the Micro F1 Score can be as high as 96.2716%, and when the task is defendant confiscation, the Micro F1 Score is as high as 95.5478%.

pdf bib
SCUDS at ROCLING-2021 Shared Task: Using Pretrained Model for Dimensional Sentiment Analysis Based on Sample Expansion Method
Hsiao-Shih Chen | Pin-Chiung Chen | Shao-Cheng Huang | Yu-Cheng Chiu | Jheng-Long Wu
Proceedings of the 33rd Conference on Computational Linguistics and Speech Processing (ROCLING 2021)

Sentiment analysis has become a popular research issue in recent years, especially on educational texts which is an important problem. According to literature, the similar sentence generation can help the prediction performance of machine learning. Therefore, the process of controlled expansional samples is a key component to prediction models. The paper proposed a sample expansion method which combined part-of-speech filter and similar word finder of Word2Vec. The generate samples have high quality with similar sentiment representation. The DistilBERT pretrained model is used to learn and predict Valence-Arousal scores from the expansion samples. Experimental result displays that the using the expansion samples as training data into prediction model has outperforms original training data without expansion, and obtains 80% mean square error reducing and 28% pearson correlation coefficient increasing.

pdf bib
A Pretrained YouTuber Embeddings for Improving Sentiment Classification of YouTube Comments
Ching-Wen Hsu | Hsuan Liu | Jheng-Long Wu
International Journal of Computational Linguistics & Chinese Language Processing, Volume 26, Number 2, December 2021

2020

pdf bib
Building A Multi-Label Detection Model for Question classification of Auction Website
I-Ju Lin | Jheng-Long Wu
Proceedings of the 32nd Conference on Computational Linguistics and Speech Processing (ROCLING 2020)