Chenghao Huang
2022
zydhjh4593@SMM4H’22: A Generic Pre-trained BERT-based Framework for Social Media Health Text Classification
Chenghao Huang
|
Xiaolu Chen
|
Yuxi Chen
|
Yutong Wu
|
Weimin Yuan
|
Yan Wang
|
Yanru Zhang
Proceedings of The Seventh Workshop on Social Media Mining for Health Applications, Workshop & Shared Task
This paper describes our proposed framework for the 10 text classification tasks of Task 1a, 2a, 2b, 3a, 4, 5, 6, 7, 8, and 9, in the Social Media Mining for Health (SMM4H) 2022. According to the pre-trained BERT-based models, various techniques, including regularized dropout, focal loss, exponential moving average, 5-fold cross-validation, ensemble prediction, and pseudo-labeling, are applied for further formulating and improving the generalization performance of our framework. In the evaluation, the proposed framework achieves the 1st place in Task 3a with a 7% higher F1-score than the median, and obtains a 4% higher averaged F1-score than the median in all participating tasks except Task 1a.
2020
Ferryman at SemEval-2020 Task 7: Ensemble Model for Assessing Humor in Edited News Headlines
Weilong Chen
|
Jipeng Li
|
Chenghao Huang
|
Wei Bai
|
Yanru Zhang
|
Yan Wang
Proceedings of the Fourteenth Workshop on Semantic Evaluation
Natural language processing (NLP) has been applied to various fields including text classification and sentiment analysis. In the shared task of assessing the funniness of edited news headlines, which is a part of the SemEval 2020 competition, we preprocess datasets by replacing abbreviation, stemming words, then merge three models including Light Gradient Boosting Machine (LightGBM), Long Short-Term Memory (LSTM), and Bidirectional Encoder Representation from Transformer (BERT) by taking the average to perform the best. Our team Ferryman wins the 9th place in Sub-task 1 of Task 7 - Regression.
MeisterMorxrc at SemEval-2020 Task 9: Fine-Tune Bert and Multitask Learning for Sentiment Analysis of Code-Mixed Tweets
Qi Wu
|
Peng Wang
|
Chenghao Huang
Proceedings of the Fourteenth Workshop on Semantic Evaluation
Natural language processing (NLP) has been applied to various fields including text classification and sentiment analysis. In the shared task of sentiment analysis of code-mixed tweets, which is a part of the SemEval-2020 competition, we preprocess datasets by replacing emoji and deleting uncommon characters and so on, and then fine-tune the Bidirectional Encoder Representation from Transformers(BERT) to perform the best. After exhausting top3 submissions, Our team MeisterMorxrc achieves an averaged F1 score of 0.730 in this task, and and our codalab username is MeisterMorxrc