2024
SAFARI: Cross-lingual Bias and Factuality Detection in News Media and News Articles
Dilshod Azizov, Zain Muhammad Mujahid, Hilal AlQuabeh, Preslav Nakov, Shangsong Liang
Findings of the Association for Computational Linguistics: EMNLP 2024
In an era where information is quickly shared across many cultural and language contexts, the neutrality and integrity of news media are essential. Ensuring that media content remains unbiased and factual is crucial for maintaining public trust. With this in mind, we introduce SAFARI (CroSs-lingual BiAs and Factuality Detection in News MediA and News ARtIcles), a novel corpus of news media and articles for predicting political bias and the factuality of reporting in a multilingual and cross-lingual setup. To the best of our knowledge, this corpus is unprecedented in its collection, and it provides political bias and factuality data for three tasks: (i) media-level, (ii) article-level, and (iii) joint modeling at the article level. At the media and article levels, we evaluate the cross-lingual ability of the models, whereas joint modeling is evaluated on English data. Our frameworks set a new benchmark in the cross-lingual evaluation of political bias and factuality. This is achieved through the use of various Multilingual Pre-trained Language Models (MPLMs) and Large Language Models (LLMs) coupled with ensemble learning methods.
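The abstract does not say which ensemble learning methods SAFARI uses. As a hedged illustration only, the sketch below shows soft voting, one common way to ensemble classifiers: average each model's class probabilities (optionally weighted) and take the argmax. The function name `soft_vote`, the bias labels, and the probabilities are all hypothetical, not taken from the paper.

```python
from typing import Optional, Sequence


def soft_vote(
    probs: Sequence[Sequence[Sequence[float]]],
    labels: Sequence[str],
    weights: Optional[Sequence[float]] = None,
) -> list[str]:
    """Soft-voting ensemble (illustrative sketch, not SAFARI's method).

    probs[m][i][k] is the probability model m assigns to labels[k]
    for example i. Per-model weights default to 1.0 each.
    """
    n_models = len(probs)
    if weights is None:
        weights = [1.0] * n_models
    if len(weights) != n_models:
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    predictions = []
    for i in range(len(probs[0])):
        # Weighted average probability of each label for example i.
        avg = [
            sum(w * probs[m][i][k] for m, w in enumerate(weights)) / total
            for k in range(len(labels))
        ]
        best = max(range(len(labels)), key=lambda k: avg[k])
        predictions.append(labels[best])
    return predictions


# Hypothetical bias labels and made-up probabilities for a single article.
labels = ["left", "center", "right"]
probs = [
    [[0.90, 0.05, 0.05]],  # model A: confident "left"
    [[0.30, 0.40, 0.30]],  # model B: weak "center"
    [[0.30, 0.40, 0.30]],  # model C: weak "center"
]
print(soft_vote(probs, labels))  # ['left']
```

Here a hard majority vote over each model's argmax would return "center", while soft voting lets one confident model outweigh two uncertain ones; which behavior is preferable depends on how well calibrated the models are.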
2023
Frank at ArAIEval Shared Task: Arabic Persuasion and Disinformation: The Power of Pretrained Models
Dilshod Azizov, Jiyong Li, Shangsong Liang
Proceedings of ArabicNLP 2023
In this work, we present our systems developed for the “ArAIEval” shared task of ArabicNLP 2023 (CITATION). We used an mBERT transformer for Subtask 1A, which targets persuasion in Arabic tweets, and the MARBERT transformer for Subtask 2A, which targets disinformation in Arabic tweets. Our persuasion detection system achieved a micro-F1 of 0.745, surpassing the baseline by 13.2%, and a macro-F1 of 0.717 based on leaderboard scores. Similarly, our disinformation system recorded a micro-F1 of 0.816, beating the naïve majority baseline by 6.7%, with a macro-F1 of 0.637. Furthermore, we present preliminary results for a variety of pre-trained models. In the overall ranking, our systems placed 7th out of 16 teams for Subtask 1A and 12th out of 17 teams for Subtask 2A.
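The gap between the reported micro-F1 and macro-F1 scores comes from how per-class counts are aggregated. The minimal sketch below computes both for single-label multiclass predictions; the labels and data are made up, and it is not the shared task's official scorer. It uses a majority-class predictor to show why such a baseline can look strong on micro-F1 but weak on macro-F1.

```python
from collections import Counter
from typing import Sequence


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> tuple[float, float]:
    """Return (micro-F1, macro-F1) for single-label multiclass predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # missed the true class t
    classes = sorted(set(y_true) | set(y_pred))
    # Micro: pool counts over all classes, then compute a single F1
    # (for single-label tasks this equals accuracy).
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per class, then take the unweighted mean.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in classes) / len(classes)
    return micro, macro


# Made-up labels: a majority-class predictor on an imbalanced set.
y_true = ["yes", "yes", "yes", "no"]
y_pred = ["yes", "yes", "yes", "yes"]
micro, macro = micro_macro_f1(y_true, y_pred)
print(round(micro, 3), round(macro, 3))  # 0.75 0.429
```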
Frank at NADI 2023 Shared Task: Trio-Based Ensemble Approach for Arabic Dialect Identification
Dilshod Azizov, Jiyong Li, Shangsong Liang
Proceedings of ArabicNLP 2023
We present our system for Subtask 1 of the NADI shared task on Arabic Dialect Identification, which is part of ArabicNLP 2023. Our approach uses three models, MARBERT, MARBERTv2 (A), and MARBERTv2 (B), which we then combine in a majority-voting ensemble. Using MARBERTv2 with different hyperparameters significantly improved the overall performance of the ensemble. Our system achieved a competitive F1 score of 84.76 and secured 5th place out of 16 participating teams.
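A majority-voting ensemble over three classifiers can be sketched as follows. This is a minimal illustration under stated assumptions: the dialect labels and predictions are made up, and the tie-breaking rule (when all three models disagree, keep the earliest model's prediction) is a hypothetical choice, since the abstract does not say how ties were handled.

```python
from collections import Counter
from typing import Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> list[str]:
    """Hard majority voting over per-model label predictions.

    predictions[m][i] is model m's predicted label for example i.
    Tie-breaking is an assumption: among the tied labels, keep the one
    predicted by the earliest model in the list.
    """
    n_examples = len(predictions[0])
    if any(len(p) != n_examples for p in predictions):
        raise ValueError("all models must predict every example")
    combined = []
    for i in range(n_examples):
        votes = Counter(p[i] for p in predictions)
        top = max(votes.values())
        tied = {label for label, count in votes.items() if count == top}
        # Walk the models in order; keep the first prediction tied for most votes.
        combined.append(next(p[i] for p in predictions if p[i] in tied))
    return combined


# Hypothetical dialect predictions from three models on three tweets.
preds = [
    ["Egypt", "Iraq", "Oman"],     # model 1
    ["Egypt", "Jordan", "Qatar"],  # model 2
    ["Syria", "Jordan", "Yemen"],  # model 3
]
print(majority_vote(preds))  # ['Egypt', 'Jordan', 'Oman']
```

With three voters, any label predicted by at least two models wins outright; the tie rule only matters when all three disagree, as in the third example.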
Lotus at WojoodNER Shared Task: Multilingual Transformers: Unveiling Flat and Nested Entity Recognition
Jiyong Li, Dilshod Azizov, Hilal AlQuabeh, Shangsong Liang
Proceedings of ArabicNLP 2023
We introduce our systems for two subtasks of the “Wojood” shared task on Arabic NER, part of ArabicNLP 2023. For Subtask 1, we employ the XLM-R model to predict flat NER labels for given tokens, using a single classifier capable of categorizing all labels. For Subtask 2, we build 21 individual classifiers on top of the XLM-R encoder; each classifier corresponds to a specific label and determines whether that label is present. Our systems achieved competitive micro-F1 scores of 0.83 for Subtask 1 and 0.76 for Subtask 2, according to the leaderboard scores.
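The difference between the two setups shows up at decoding time: a single classifier over all labels yields exactly one label per token, while independent per-label classifiers can assign several labels to the same token, which is what nested entities require. The sketch below contrasts the two, assuming made-up logits, three illustrative entity types instead of the 21 used in the paper, no BIO prefixes, and a hypothetical 0.5 sigmoid threshold for each binary head.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def decode_flat(logits: Sequence[Sequence[float]], labels: Sequence[str]) -> list[str]:
    """Flat NER: one classifier over all labels, so one label per token (argmax)."""
    return [labels[max(range(len(labels)), key=lambda k: row[k])] for row in logits]


def decode_nested(
    logits: Sequence[Sequence[float]], labels: Sequence[str], threshold: float = 0.5
) -> list[set[str]]:
    """Nested NER: one independent binary head per label, so a token may carry several labels."""
    return [
        {lab for lab, score in zip(labels, row) if sigmoid(score) >= threshold}
        for row in logits
    ]


# Made-up scores for two tokens over three illustrative entity types.
labels = ["ORG", "GPE", "PERS"]
logits = [
    [2.0, -1.0, -3.0],  # token inside an organization name only
    [1.5, 0.7, -2.0],   # token that is part of both an ORG and a GPE mention
]
print(decode_flat(logits, labels))  # ['ORG', 'ORG']
print([sorted(s) for s in decode_nested(logits, labels)])  # [['ORG'], ['GPE', 'ORG']]
```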
2022
Revisiting Parameter-Efficient Tuning: Are We Really There Yet?
Guanzheng Chen, Fangyu Liu, Zaiqiao Meng, Shangsong Liang
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Parameter-Efficient Tuning (PETuning) methods have been deemed by many as the new paradigm for using pretrained language models (PLMs). By tuning only a fraction of the parameters compared to full-model finetuning, PETuning methods claim to achieve performance on par with or even better than finetuning. In this work, we take a step back and re-examine these PETuning methods by conducting the first comprehensive investigation into their training and evaluation. We found that the problematic validation and testing practices in current studies, combined with the inherent instability of PETuning methods, have led to unreliable conclusions. When compared under a truly fair evaluation protocol, PETuning cannot yield consistently competitive performance, while finetuning remains the best-performing method in medium- and high-resource settings. We delve deeper into the cause of the instability and observe that the number of trainable parameters and the number of training iterations are the two main factors: reducing trainable parameters and prolonging training iterations may lead to higher stability in PETuning methods.
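The abstract does not spell out the fair evaluation protocol. The sketch below is therefore only a generic illustration of the kind of practice at issue, not the paper's protocol. For each random seed, it selects the checkpoint by validation score and reports the test score at that checkpoint, then aggregates across seeds. For contrast, it also picks checkpoints directly on the test set, which inflates results. All scores are made up, and the function names are hypothetical.

```python
import statistics
from typing import Sequence


def validated_test_scores(
    val_curves: Sequence[Sequence[float]], test_curves: Sequence[Sequence[float]]
) -> list[float]:
    """Per seed, choose the checkpoint with the best validation score and
    return the test score at that same checkpoint."""
    if len(val_curves) != len(test_curves):
        raise ValueError("need one validation and one test curve per seed")
    picked = []
    for val, test in zip(val_curves, test_curves):
        if len(val) != len(test):
            raise ValueError("curves must cover the same checkpoints")
        best = max(range(len(val)), key=lambda i: val[i])
        picked.append(test[best])
    return picked


def leaky_test_scores(test_curves: Sequence[Sequence[float]]) -> list[float]:
    """Problematic practice: choose each seed's checkpoint by its test score."""
    return [max(test) for test in test_curves]


# Made-up scores: two random seeds, three checkpoints each.
val_curves = [[0.70, 0.80, 0.75], [0.72, 0.74, 0.78]]
test_curves = [[0.71, 0.77, 0.82], [0.70, 0.73, 0.76]]

fair = validated_test_scores(val_curves, test_curves)
leaky = leaky_test_scores(test_curves)
print(round(statistics.mean(fair), 3), round(statistics.stdev(fair), 4))  # 0.765 0.0071
print(round(statistics.mean(leaky), 3))  # 0.79
```

Reporting the spread across seeds alongside the mean makes the instability discussed above visible instead of hiding it behind a single best run.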