Abhishek Kumar Mishra


2026

Large Language Models (LLMs) are increasingly deployed in multilingual settings that process sensitive data, yet their scale and linguistic variability can amplify privacy risks. While prior privacy evaluations focus predominantly on English, we investigate how language structure shapes privacy leakage in LLMs trained on English, Spanish, French, and Italian medical corpora. We quantify six corpus-level linguistic indicators and evaluate vulnerability under three attack families: extraction, counterfactual memorization, and membership inference. Across languages, we find that leakage systematically tracks structural properties: Italian exhibits the strongest exposure, consistent with its highest redundancy and longer lexical units, whereas English shows the clearest membership separability, aligning with its higher syntactic entropy and stronger surface-identifiable cues. In contrast, French and Spanish remain comparatively more resilient overall, aided by higher morphological complexity. These results provide quantitative evidence that language matters for privacy leakage, motivating language-aware and structure-adaptive privacy-preserving mechanisms for multilingual LLM deployments.
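To make the membership-inference evaluation concrete, here is a minimal illustrative sketch of a loss-threshold membership inference attack, the simplest attack of that family: a sample is predicted to be a training-set member when the model's per-example loss falls below a calibrated threshold. The function name, toy loss values, and threshold are illustrative assumptions, not details from the paper.

```python
# Hypothetical sketch of a loss-threshold membership inference attack.
# Intuition: models tend to assign lower loss to memorized training samples,
# so a low loss is evidence of membership.

def loss_threshold_mia(losses, threshold):
    """Return membership predictions (True = predicted training member)."""
    return [loss < threshold for loss in losses]

# Toy per-example losses (illustrative numbers only).
member_losses = [0.8, 1.1, 0.9]      # samples seen during training
nonmember_losses = [2.4, 3.0, 2.7]   # held-out samples

preds = loss_threshold_mia(member_losses + nonmember_losses, threshold=1.5)
# → [True, True, True, False, False, False]
```

The "membership separability" reported for English corresponds, under this framing, to how cleanly a single threshold divides member from non-member loss distributions.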

2021

The need to deploy large-scale pre-trained models on edge devices under limited computational resources has driven substantial research on compressing these large models. However, less attention has been paid to compressing task-specific models. In this work, we investigate different unstructured pruning methods on task-specific models for Aspect-based Sentiment Analysis (ABSA) tasks. Specifically, we analyze differences in the learning dynamics of pruned models, using standard pruning techniques to obtain high-performing sparse networks. We develop a hypothesis demonstrating the effectiveness of local pruning over global pruning on a simple CNN model. We then use this hypothesis to demonstrate the efficacy of the pruned state-of-the-art model over its over-parameterized counterpart in two settings: first, on the baselines for the same task used to generate the hypothesis, i.e., aspect extraction; and second, on a different task, i.e., sentiment analysis. We also discuss the generalization of the pruning hypothesis.
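The local-vs-global distinction at the heart of the pruning comparison can be sketched with plain magnitude pruning. This is an illustrative toy implementation, not the paper's method: function names, the flat-list weight representation, and the 50% sparsity level in the example are assumptions. The key contrast it shows is that local pruning applies a separate magnitude threshold per layer, while global pruning ranks all weights jointly under one threshold.

```python
# Hedged sketch of unstructured magnitude pruning.
# Each "layer" is a flat list of weights; sparsity is the fraction to zero out.

def local_prune(layers, sparsity):
    """Zero the smallest-magnitude weights within each layer separately."""
    pruned = []
    for w in layers:
        k = int(len(w) * sparsity)             # weights to remove in this layer
        cutoff = sorted(abs(x) for x in w)[k]  # per-layer magnitude threshold
        pruned.append([0.0 if abs(x) < cutoff else x for x in w])
    return pruned

def global_prune(layers, sparsity):
    """Zero the smallest-magnitude weights ranked across all layers jointly."""
    flat = sorted(abs(x) for w in layers for x in w)
    k = int(len(flat) * sparsity)
    cutoff = flat[k]                           # single global threshold
    return [[0.0 if abs(x) < cutoff else x for x in w] for w in layers]

layers = [[0.1, -0.5, 2.0, -3.0],       # layer with large weights
          [0.01, 0.02, 0.03, 0.04]]     # layer with uniformly small weights

local_prune(layers, 0.5)
# → [[0.0, 0.0, 2.0, -3.0], [0.0, 0.0, 0.03, 0.04]]  (each layer keeps half)
global_prune(layers, 0.5)
# → [[0.1, -0.5, 2.0, -3.0], [0.0, 0.0, 0.0, 0.0]]   (small layer wiped out)
```

The toy run illustrates why the two can diverge: global pruning may remove an entire small-magnitude layer, whereas local pruning preserves every layer's strongest connections.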