Shangsong Liang

2025

MolRAG: Unlocking the Power of Large Language Models for Molecular Property Prediction
Ziting Xian | Jiawei Gu | Lingbo Li | Shangsong Liang
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Recent LLMs exhibit limited effectiveness on molecular property prediction task due to the semantic gap between molecular representations and natural language, as well as the lack of domain-specific knowledge. To address these challenges, we propose MolRAG, a Retrieval-Augmented Generation framework integrating Chain-of-Thought reasoning for molecular property prediction. MolRAG operates by retrieving structurally analogous molecules as contextual references to guide stepwise knowledge reasoning through chemical structure-property relationships. This dual mechanism synergizes molecular similarity analysis with structured inference, while generating human-interpretable rationales grounded in domain knowledge. Experimental results show MolRAG outperforms pre-trained LLMs on four datasets, and even matches supervised methods, achieving performance gains of 1.1%–45.7% over direct prediction approaches, demonstrating versatile effectiveness. Our code is available at https://github.com/AcaciaSin/MolRAG.

pdf bib abs

MGM: Global Understanding of Audience Overlap Graphs for Predicting the Factuality and the Bias of News Media
Muhammad Arslan Manzoor | Ruihong Zeng | Dilshod Azizov | Preslav Nakov | Shangsong Liang
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

In the current era of rapidly growing digital data, evaluating the political bias and factuality of news outlets has become more important for seeking reliable information online. In this work, we study the classification problem of profiling news media from the lens of political bias and factuality. Traditional profiling methods, such as Pre-trained Language Models (PLMs) and Graph Neural Networks (GNNs) have shown promising results, but they face notable challenges. PLMs focus solely on textual features, causing them to overlook the complex relationships between entities, while GNNs often struggle with media graphs containing disconnected components and insufficient labels. To address these limitations, we propose MediaGraphMind (MGM), an effective solution within a variational Expectation-Maximization (EM) framework. Instead of relying on limited neighboring nodes, MGM leverages features, structural patterns, and label information from globally similar nodes. Such a framework not only enables GNNs to capture long-range dependencies for learning expressive node representations but also enhances PLMs by integrating structural information and therefore improving the performance of both models. The extensive experiments demonstrate the effectiveness of the proposed framework and achieve new state-of-the-art results. Further, we share our repository which contains the dataset, code, and documentation.

pdf bib abs

L4: Mutual Learning Helps Lifelong Language Learning
Jiyong Li | Dilshod Azizov | Shangsong Liang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track

Adapting language models to learn continuously from data streams while retaining previous knowledge is a key challenge in artificial intelligence (AI), particularly in lifelong language learning. Existing distillation methods are based on offline techniques, limiting their ability to update in real-time and adapt to dynamic environments. To address this, we propose online dynamic mutual distillation - a novel framework that enables continuous mutual learning from task streams without relying on domain-specific teachers. To our knowledge, this is the first application of mutual learning in lifelong language learning, providing dynamic knowledge transfer without domain-specific teachers. Moreover, our extensive experiments demonstrate that the proposed method reduces catastrophic forgetting, while improving task performance on various benchmark datasets making it suitable for real-world, dynamic natural language processing (NLP) applications such as adaptive chatbots and personalized language systems. We will make our code publicly available upon acceptance.

pdf bib abs

Speculative Reward Model Boosts Decision Making Ability of LLMs Cost-Effectively
Jiawei Gu | Shangsong Liang
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track)

Effective decision-making in Large Language Models (LLMs) is essential for handling intricate tasks. However, existing approaches prioritize performance but often overlook the balance between effectiveness and computational cost. To address this, we first introduce the 3E Criteria to systematically assess the cost-effectiveness of search strategies, revealing that existing methods often trade significant efficiency for marginal performance gains. To improve LLM decision-making while maintaining efficiency, we propose the Speculative Reward Model (SRM), a plug-and-play framework that seamlessly integrates with existing search strategies. Specifically, SRM employs an external reward assigner to predict optimal actions, reducing reliance on LLMs’ internal self-evaluation. And a speculative verification mechanism is used to prune suboptimal choices and guide the search toward more promising steps. We evaluate SRM on several complex decision-making tasks including mathematical reasoning, planning and numerical reasoning in specialized domains. Experimental results show that SRM reduces costs to 1/10 of the original search framework on average while maintaining effectiveness.

2024

pdf bib abs

SAFARI: Cross-lingual Bias and Factuality Detection in News Media and News Articles
Dilshod Azizov | Zain Muhammad Mujahid | Hilal AlQuabeh | Preslav Nakov | Shangsong Liang
Findings of the Association for Computational Linguistics: EMNLP 2024

In an era where information is quickly shared across many cultural and language contexts, the neutrality and integrity of news media are essential. Ensuring that media content remains unbiased and factual is crucial for maintaining public trust. With this in mind, we introduce SAFARI (CroSs-lingual BiAs and Factuality Detection in News MediA and News ARtIcles), a novel corpus of news media and articles for predicting political bias and the factuality of reporting in a multilingual and cross-lingual setup. To the best of our knowledge, this corpus is unprecedented in its collection and introduces a dataset for political bias and factuality for three tasks: (i) media-level, (ii) article-level, and (iii) joint modeling at the article-level. At the media and article levels, we evaluate the cross-lingual ability of the models; however, in joint modeling, we evaluate on English data. Our frameworks set a new benchmark in the cross-lingual evaluation of political bias and factuality. This is achieved through the use of various Multilingual Pre-trained Language Models (MPLMs) and Large Language Models (LLMs) coupled with ensemble learning methods.

2023

pdf bib abs

Frank at NADI 2023 Shared Task: Trio-Based Ensemble Approach for Arabic Dialect Identification
Dilshod Azizov | Jiyong Li | Shangsong Liang
Proceedings of ArabicNLP 2023

We present our system designed for Subtask 1 in the shared task NADI on Arabic Dialect Identification, which is part of ArabicNLP 2023. In our approach, we utilized models such as: MARBERT, MARBERTv2 (A) and MARBERTv2 (B). Subsequently, we created a majority voting ensemble of these models. We used MARBERTv2 with different hyperparameters, which significantly improved the overall performance of the ensemble model. In terms of performance, our systems achieved a competitive an F1 score of 84.76. Overall, our system secured the 5th position out of 16 participating teams.

pdf bib abs

Lotus at WojoodNER Shared Task: Multilingual Transformers: Unveiling Flat and Nested Entity Recognition
Jiyong Li | Dilshod Azizov | Hilal AlQuabeh | Shangsong Liang
Proceedings of ArabicNLP 2023

We introduce our systems developed for two subtasks in the shared task “Wojood” on Arabic NER detection, part of ArabicNLP 2023. For Subtask 1, we employ the XLM-R model to predict Flat NER labels for given tokens using a single classifier capable of categorizing all labels. For Subtask 2, we use the XLM-R encoder by building 21 individual classifiers. Each classifier corresponds to a specific label and is designed to determine the presence of its respective label. In terms of performance, our systems achieved competitive micro-F1 scores of 0.83 for Subtask 1 and 0.76 for Subtask 2, according to the leaderboard scores.

pdf bib abs

Frank at ArAIEval Shared Task: Arabic Persuasion and Disinformation: The Power of Pretrained Models
Dilshod Azizov | Jiyong Li | Shangsong Liang
Proceedings of ArabicNLP 2023

In this work, we present our systems developed for “ArAIEval” shared task of ArabicNLP 2023 (CITATION). We used an mBERT transformer for Subtask 1A, which targets persuasion in Arabic tweets, and we used the MARBERT transformer for Subtask 2A to identify disinformation in Arabic tweets. Our persuasion detection system achieved micro-F1 of 0.745 by surpassing the baseline by 13.2%, and registered a macro-F1 of 0.717 based on leaderboard scores. Similarly, our disinformation system recorded a micro-F1 of 0.816, besting the naïve majority by 6.7%, with a macro-F1 of 0.637. Furthermore, we present our preliminary results on a variety of pre-trained models. In terms of overall ranking, our systems placed 7th out of 16 and 12th out of 17 teams for Subtasks 1A and 2A, respectively.

2022

pdf bib abs

Revisiting Parameter-Efficient Tuning: Are We Really There Yet?
Guanzheng Chen | Fangyu Liu | Zaiqiao Meng | Shangsong Liang
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Parameter-Efficient Tuning (PETuning) methods have been deemed by many as the new paradigm for using pretrained language models (PLMs). By tuning just a fraction amount of parameters comparing to full model finetuning, PETuning methods claim to have achieved performance on par with or even better than finetuning. In this work, we take a step back and re-examine these PETuning methods by conducting the first comprehensive investigation into the training and evaluation of them. We found the problematic validation and testing practice in current studies, when accompanied by the instability nature of PETuning methods, has led to unreliable conclusions. When being compared under a truly fair evaluation protocol, PETuning cannot yield consistently competitive performance while finetuning remains to be the best-performing method in medium- and high-resource settings. We delve deeper into the cause of the instability and observed that the number of trainable parameters and training iterations are two main factors: reducing trainable parameters and prolonging training iterations may lead to higher stability in PETuning methods.

Co-authors

Guanzheng Chen 1

Lingbo Li 1

Fangyu Liu 1

Muhammad Arslan Manzoor 1

Zaiqiao Meng 1

Zain Muhammad Mujahid 1

Ziting Xian 1

Ruihong Zeng 1

Venues

NAACL1

Fix author