Suparna De

2025

PMPO: A Self-Optimizing Framework for Creating High-Fidelity Measurement Tools for Social Bias in Large Language Models
Zeqiang Wang | Yuqi Wang | Xinyue Wu | Chenxi Li | Yiran Liu | Linghan Ge | Zhan Yu | Jiaxin Shi | Suparna De
Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics

The potential of Large Language Models (LLMs) as instruments for measuring social phenomena is constrained by the methodological limitations of current probing techniques. Prevailing methods rely on static, handcrafted probe sets whose quality is highly dependent on their authors’ subjective expertise. This results in measurement tools with inconsistent statistical reliability that defy systematic optimization. Such an “artisanal” approach, akin to using an “uneven ruler,” undermines the scientific rigor of its findings and severely limits the applicability of LLMs in the social sciences. To elevate bias measurement from a craft to a science, we introduce the Psychometric-driven Probe Optimization (PMPO) framework. This framework treats a probe set as an optimizable scientific instrument and, for the first time, utilizes a Neural Genetic Algorithm that leverages a powerful LLM as a “neural genetic operator.” Through a hybrid strategy of gradient-guided mutation and creative rephrasing, PMPO automatically enhances the probe set’s reliability, sensitivity, and diversity. We first establish the external validity of our foundational measurement method (PLC), demonstrating a high correlation between its measurement of occupational gender bias and real-world U.S. Bureau of Labor Statistics data (average Pearson’s r=0.83, p<.001). Building on this, we show that the PMPO framework can elevate a standard probe set’s internal consistency (Cronbach’s Alpha) from 0.90 to an exceptional 0.96 within 10 generations. Critically, in a rigorous, double-blind “Turing Test,” probes evolved by PMPO from non-expert seeds were judged by sociology experts to have achieved a level of quality, sophistication, and nuance that is comparable to, and even indistinguishable from, those handcrafted by domain experts. This work provides a systematic pathway to upgrade LLM measurement tools from artisanal artifacts to automated scientific instruments, offering an unprecedented and trustworthy tool for AI safety auditing and computational social science.
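
For readers unfamiliar with the internal-consistency statistic cited in this abstract, the following is a minimal sketch of the standard Cronbach's alpha formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). It is not the PMPO implementation. The function name `cronbach_alpha` and the layout of the score matrix (one row per measured subject, one column per probe) are assumptions made for illustration only.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Standard Cronbach's alpha for a score matrix.

    scores: list of rows; each row holds the k item (probe) scores for one
    measured subject. This row/column layout is an illustrative assumption,
    not something specified in the abstract.
    """
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least 2 items and 2 rows")
    # Sample variance of each item (column) across subjects.
    item_var_sum = sum(variance(col) for col in zip(*scores))
    # Sample variance of each subject's total score.
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


if __name__ == "__main__":
    # Hypothetical toy data: 4 subjects scored on 3 probes.
    toy = [[1, 2, 1], [2, 2, 3], [3, 4, 3], [4, 5, 5]]
    print(round(cronbach_alpha(toy), 3))
```

Values closer to 1 indicate that the items vary together, which is the sense in which the abstract reports an improvement from 0.90 to 0.96.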

DLIR: Spherical Adaptation for Cross-Lingual Knowledge Transfer of Sociological Concepts Alignment
Zeqiang Wang | Jon Johnson | Suparna De
Findings of the Association for Computational Linguistics: EMNLP 2025

Cross-lingual alignment of nuanced sociological concepts is crucial for comparative cross-cultural research, harmonising longitudinal studies, and leveraging knowledge from social science taxonomies (e.g., ELSST). However, aligning these concepts is challenging due to cultural context-dependency, linguistic variation, and data scarcity, particularly for low-resource languages. Existing methods often fail to capture domain-specific subtleties or require extensive parallel data. Grounded in a Vector Decomposition Hypothesis—positing separable domain and language components within embeddings, supported by observed language-pair specific geometric structures—we propose DLIR (Dual-Branch LoRA for Invariant Representation). DLIR employs parallel Low-Rank Adaptation (LoRA) branches: one captures core sociological semantics (trained primarily on English data structured by the ELSST hierarchy), while the other learns language invariance by counteracting specific language perturbations. These perturbations are modeled by Gaussian Mixture Models (GMMs) fitted on minimal parallel concept data using spherical geometry. DLIR significantly outperforms strong baselines on cross-lingual sociological concept retrieval across 10 languages. Demonstrating powerful zero-shot knowledge transfer, English-trained DLIR substantially surpasses target-language (French/German) LoRA fine-tuning even in monolingual tasks. DLIR learns disentangled, language-robust representations, advancing resource-efficient multilingual understanding and enabling reliable cross-lingual comparison of sociological constructs.

2024

Social media is recognized as an important source for deriving insights into public opinion dynamics and social impacts due to the vast textual data generated daily and the ‘unconstrained’ behavior of people interacting on these platforms. However, such analyses prove challenging due to the semantic shift phenomenon, where word meanings evolve over time. This paper proposes an unsupervised dynamic word embedding method to capture longitudinal semantic shifts in social media data without predefined anchor words. The method leverages word co-occurrence statistics and dynamic updating to adapt embeddings over time, addressing the challenges of data sparseness, imbalanced distributions, and synergistic semantic effects. Evaluated on a large COVID-19 Twitter dataset, the method reveals semantic evolution patterns of vaccine- and symptom-related entities across different pandemic stages, and their potential correlations with real-world statistics. Our key contributions include the dynamic embedding technique, empirical analysis of COVID-19 semantic shifts, and discussions on enhancing semantic shift modeling for computational social science research. This study enables capturing longitudinal semantic dynamics on social media to understand public discourse and collective phenomena.

DKE-Research at SemEval-2024 Task 2: Incorporating Data Augmentation with Generative Models and Biomedical Knowledge to Enhance Inference Robustness
Yuqi Wang | Zeqiang Wang | Wei Wang | Qi Chen | Kaizhu Huang | Anh Nguyen | Suparna De
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

Safe and reliable natural language inference is critical for extracting insights from clinical trial reports but poses challenges due to biases in large pre-trained language models. This paper presents a novel data augmentation technique to improve model robustness for biomedical natural language inference in clinical trials. By generating synthetic examples through semantic perturbations and domain-specific vocabulary replacement and adding a new task for numerical and quantitative reasoning, we introduce greater diversity and reduce shortcut learning. Our approach, combined with multi-task learning and the DeBERTa architecture, achieved significant performance gains on the NLI4CT 2024 benchmark compared to the original language models. Ablation studies validate the contribution of each augmentation method in improving robustness. Our best-performing model ranked 12th in terms of faithfulness and 8th in terms of consistency, respectively, out of the 32 participants.

2023

Prompt-based Zero-shot Text Classification with Conceptual Knowledge
Yuqi Wang | Wei Wang | Qi Chen | Kaizhu Huang | Anh Nguyen | Suparna De
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop)

In recent years, pre-trained language models have garnered significant attention due to their effectiveness, which stems from the rich knowledge acquired during pre-training. To mitigate the inconsistency issues between pre-training tasks and downstream tasks and to facilitate the resolution of language-related issues, prompt-based approaches have been introduced, which are particularly useful in low-resource scenarios. However, existing approaches mostly rely on verbalizers to translate the predicted vocabulary to task-specific labels. The major limitations of this approach are that it ignores potentially relevant domain-specific words and is biased by the pre-training data. To address these limitations, we propose a framework that incorporates conceptual knowledge for text classification in the extreme zero-shot setting. The framework includes prompt-based keyword extraction, weight assignment to each prompt keyword, and final representation estimation in the knowledge graph embedding space. We evaluated the method on four widely used datasets for sentiment analysis and topic detection, demonstrating that it consistently outperforms recently developed prompt-based approaches in the same experimental settings.

2022

Privacy Pitfalls of Online Service Terms and Conditions: a Hybrid Approach for Classification and Summarization
Emilia Lukose | Suparna De | Jon Johnson
Proceedings of the Natural Legal Language Processing Workshop 2022

Verbose and complicated legal terminology in online service terms and conditions (T&C) means that users typically don’t read these documents before accepting the terms of such unilateral service contracts. With such services becoming part of mainstream digital life, highlighting Terms of Service (ToS) clauses that affect the collection and use of user data and privacy is an important concern. Advances in text summarization can help to create informative and concise summaries of the terms, but existing approaches geared towards news and microblogging corpora are not directly applicable to the ToS domain, which is hindered by a lack of T&C-relevant resources for training and evaluation. This paper presents a ToS model, developing a hybrid extractive-classifier-abstractive pipeline that highlights the privacy and data collection/use-related sections in a ToS document and paraphrases these into concise and informative sentences. Relying on significantly less training data (4313 training pairs) than previous representative works (287,226 pairs), our model outperforms extractive baselines by at least 50% in ROUGE-1 score and 54% in METEOR score. The paper also contributes to existing community efforts by curating a dataset of online service T&C, through a developed web scraping tool.
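
As a rough illustration of the ROUGE-1 metric named in this abstract, the sketch below computes unigram-overlap precision, recall, and F1 between a candidate summary and a reference. It assumes lowercased whitespace tokenization with no stemming, so its numbers will not necessarily match those of the scoring tool used in the paper; the function name `rouge1` and the example sentences are hypothetical.

```python
from collections import Counter


def rouge1(candidate, reference):
    """Unigram-overlap precision, recall, and F1 (simplified ROUGE-1).

    Assumption: tokens are lowercased whitespace-separated words; published
    ROUGE tooling may tokenize and stem differently.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: each word counts at most as often as it appears in both.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical summary and reference sentences.
    p, r, f = rouge1(
        "the service collects user data",
        "the service collects and shares user data",
    )
    print(round(p, 3), round(r, 3), round(f, 3))
```

The clipped count prevents a candidate from earning credit by repeating a word more often than the reference contains it.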
