Junlong Wang
2024
DUTIR938 at SemEval-2024 Task 4: Semi-Supervised Learning and Model Ensemble for Persuasion Techniques Detection in Memes
Erchen Yu | Junlong Wang | Xuening Qiao | Jiewei Qi | Zhaoqing Li | Hongfei Lin | Linlin Zong | Bo Xu
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
The development of social platforms has facilitated the proliferation of disinformation, and memes have become one of the most popular forms of propaganda for spreading disinformation on the internet. Effectively detecting the persuasion techniques hidden within memes helps in understanding user-generated content and, in turn, in detecting disinformation online. This paper describes the approach proposed by Team DUTIR938 for Subtask 2b of SemEval-2024 Task 4. We propose a dual-channel model based on semi-supervised learning and model ensemble. We use CLIP to extract image features and several pretrained language models, further trained with task-adaptive pretraining, to extract text features. To improve the model's detection and generalization capabilities, we augment the training data with semi-supervised pseudo-labeling, introduce adversarial training strategies, and design a two-stage global model ensemble strategy. Our method outperforms the provided baseline, achieving Macro/Micro F1 scores of 0.80910/0.83667 on the English leaderboard. Our submission ranks 3rd out of 19 in Macro F1 and 1st out of 19 in Micro F1.
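The abstract reports Macro and Micro F1 separately, and the team's rank differs between the two. As a generic illustration (not the official SemEval scorer), the sketch below computes both metrics for single-label predictions: Macro F1 averages the per-class F1 scores equally, while Micro F1 pools true positives, false positives and false negatives across classes first. The function name `f1_scores` and the toy labels are hypothetical.

```python
from collections import Counter


def f1_scores(y_true, y_pred, labels):
    """Return (macro_f1, micro_f1) for single-label classification.

    Hypothetical helper for illustration; not the official task scorer.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # predicted p, but it was wrong
            fn[t] += 1          # missed the true class t

    def f1(tp_, fp_, fn_):
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Macro: every class counts equally, regardless of its frequency
    per_class = [f1(tp[c], fp[c], fn[c]) for c in labels]
    macro = sum(per_class) / len(per_class)
    # Micro: pool the counts over all classes, so frequent classes dominate
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


if __name__ == "__main__":
    y_true = [1, 1, 1, 1, 0, 0]
    y_pred = [1, 1, 1, 1, 1, 0]
    print(f1_scores(y_true, y_pred, labels=[0, 1]))
```

On this toy data the minority class 0 has F1 = 2/3 and class 1 has F1 = 8/9, so Macro F1 is 7/9 (about 0.778) while Micro F1 is 5/6 (about 0.833). A system that is weaker on a minority class therefore tends to rank better by Micro F1 than by Macro F1.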
CoT-based Data Augmentation Strategy for Persuasion Techniques Detection
Dailin Li | Chuhan Wang | Xin Zou | Junlong Wang | Peng Chen | Jian Wang | Liang Yang | Hongfei Lin
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
Detecting persuasive communication is an important topic in Natural Language Processing (NLP), as it can help identify fake information on social media. We have developed a system that identifies the persuasion techniques applied in text fragments across four languages: English, Bulgarian, North Macedonian, and Arabic. Our system uses data augmentation methods and an ensemble strategy that combines the strengths of RoBERTa and DeBERTa models. Due to limited resources, we concentrated solely on task 1, and our solution ranked first in the English track in the official evaluation. We also analyse the impact of architectural decisions, data construction and training strategies.
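The abstract does not specify how the RoBERTa and DeBERTa predictions are combined. One common option, shown here purely as an assumption, is soft voting for multi-label output: average each model's per-technique probabilities and keep every label whose average reaches a threshold. The function `soft_vote`, the 0.5 threshold and the toy probabilities are all hypothetical.

```python
def soft_vote(prob_lists, threshold=0.5):
    """Average per-label probabilities from several models and threshold them.

    Hypothetical multi-label soft-voting ensemble; not the authors' method.
    Each inner list holds one model's probability for every label.
    """
    if not prob_lists:
        raise ValueError("need at least one model's probabilities")
    n_labels = len(prob_lists[0])
    if any(len(p) != n_labels for p in prob_lists):
        raise ValueError("all models must score the same set of labels")
    n_models = len(prob_lists)
    # Mean probability per label across models
    avg = [sum(p[i] for p in prob_lists) / n_models for i in range(n_labels)]
    # Return indices of the labels predicted as present
    return [i for i, a in enumerate(avg) if a >= threshold]


if __name__ == "__main__":
    roberta_probs = [0.9, 0.4, 0.2]   # toy scores for three techniques
    deberta_probs = [0.7, 0.7, 0.1]
    print(soft_vote([roberta_probs, deberta_probs]))
```

Here the averaged probabilities are 0.8, 0.55 and 0.15, so the sketch prints `[0, 1]`. The second label is kept only because DeBERTa is confident enough to lift the average over the threshold, which is the kind of complementary behaviour an ensemble is meant to exploit.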
Co-authors
- Hongfei Lin (林鸿飞) 2
- Peng Chen 1
- Zhaoqing Li 1
- Dailin Li 1
- Jiewei Qi 1