Viet Anh Nguyen

2025

Distributional Surgery for Language Model Activations
Bao Nguyen | Binh Nguyen | Duy Nguyen | Viet Anh Nguyen
Findings of the Association for Computational Linguistics: EMNLP 2025

Language models, while capable of generating remarkably coherent and seemingly accurate text, can occasionally produce undesirable content, including harmful or toxic outputs. In this paper, we present a new two-stage approach to detect and mitigate undesirable content generations by rectifying activations. First, we train an ensemble of layerwise classifiers to detect undesirable content using activations by minimizing a smooth surrogate of the risk-aware score. Then, for content detected as undesirable, we propose layerwise distributional steering policies that transform the attention heads. These policies are computed through principled semidefinite programming, which aims to minimally perturb the attention distribution while probabilistically guaranteeing the effectiveness of the edits. Empirical evaluations across multiple language models and datasets show that our method outperforms baselines in reducing the generation of undesirable output.

Probe-Free Low-Rank Activation Intervention
Chonghe Jiang | Bao Nguyen | Anthony Man-Cho So | Viet Anh Nguyen
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Language models (LMs) can produce texts that appear accurate and coherent but contain untruthful or toxic content. Inference-time interventions that edit the hidden activations have shown promising results in steering the LMs towards desirable generations. Existing activation intervention methods often comprise an activation probe to detect undesirable generation, triggering the activation modification to steer subsequent generation. This paper proposes a probe-free intervention method FLORAIN for all attention heads in a specific activation layer. It eliminates the need to train classifiers for probing purposes. The intervention function is parametrized by a sample-wise nonlinear low-rank mapping, which is trained by minimizing the distance between the modified activations and their projection onto the manifold of desirable content. Under specific constructions of the manifold and projection distance, we show that the intervention strategy can be computed efficiently by solving a smooth optimization problem. The empirical results, benchmarked on multiple base models, demonstrate that FLORAIN consistently outperforms several baseline methods in enhancing model truthfulness and quality across generation and multiple-choice tasks. Our implementation can be found at https://github.com/nguyenngocbaocmt02/EFI.

Advances in Large Language Models (LLMs) have paved the way for their emerging applications in various domains, such as human behavior simulations, where LLMs could augment human-generated data in social science research and machine learning model training. However, pretrained LLMs often fail to capture the behavioral diversity of target populations due to the inherent variability across individuals and groups. To address this, we propose Mixture of Personas (MoP), a probabilistic prompting method that aligns LLM responses with the target population. MoP is a contextual mixture model, where each component is an LM agent characterized by a persona and an exemplar that represents the behaviors of a subpopulation. The persona and the exemplar are randomly chosen according to the learned mixing weights to elicit diverse LLM responses during simulation. MoP is flexible, does not require model fine-tuning, and is transferable between base models. Experiments for synthetic data generation show that MoP outperforms competing methods in alignment and diversity metrics.

Task-driven Layerwise Additive Activation Intervention
Hieu Trung Nguyen | Bao Nguyen | Binh Nguyen | Viet Anh Nguyen
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers)

Modern language models (LMs) have significantly advanced generative modeling in natural language processing (NLP). Despite their success, LMs often struggle with adaptation to new contexts in real-time applications. A promising approach to task adaptation is activation intervention, which steers the LMs’ generation process by identifying and manipulating the activations. However, existing interventions rely heavily on heuristic rules or require many prompt inputs to determine effective interventions. In this paper, we propose a layer-wise additive activation intervention framework that optimizes the intervention process, thereby enhancing sample efficiency. We evaluate our framework on various datasets, demonstrating improvements in accuracy over pretrained LMs and competing intervention baselines.

Structured Pruning for Diverse Best-of-N Reasoning Optimization
Hieu Trung Nguyen | Bao Nguyen | Viet Anh Nguyen
Findings of the Association for Computational Linguistics: ACL 2025

Model pruning in transformer-based language models, traditionally seen as a means of computational savings, can enhance the model’s reasoning capabilities. In this work, we uncover the surprising phenomenon that the selective pruning of certain attention heads leads to improvements in reasoning performance, particularly on challenging tasks. Motivated by this observation, we propose SPRINT, a novel contrastive learning framework that dynamically selects the optimal head and layer to prune during inference. By aligning question embeddings with head embeddings, our approach identifies those pruned-head configurations that result in more accurate reasoning. Extensive experiments demonstrate that our method significantly outperforms traditional best-of-N and random head selection strategies on the MATH500 and GSM8K datasets.

2021

S-NLP at SemEval-2021 Task 5: An Analysis of Dual Networks for Sequence Tagging
Viet Anh Nguyen | Tam Minh Nguyen | Huy Quang Dao | Quang Huu Pham
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

SemEval-2021 Task 5: Toxic Spans Detection is the task of identifying the spans of a text that are considered toxic, which provides a valuable, automatic tool for moderating online content. This paper presents the second-place method for the task, an ensemble of two approaches. One approach relies on combining different embedding methods to extract diverse semantic and syntactic representations of words in context, while the other utilizes extra data with a slightly customized self-training procedure, a semi-supervised learning technique, for sequence tagging problems. Both of our architectures take advantage of a strong language model, which was fine-tuned on a toxic classification task. Although experimental evidence indicates that the first approach is more effective than the second, combining them leads to our best result, an F1-score of 70.77 on the test dataset.

2020

SunBear at WNUT-2020 Task 2: Improving BERT-Based Noisy Text Classification with Knowledge of the Data domain
Linh Doan Bao | Viet Anh Nguyen | Quang Pham Huu
Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)

This paper proposes an improved custom model for WNUT-2020 Task 2: Identification of Informative COVID-19 English Tweets. We experiment with the effectiveness of fine-tuning methodologies for the state-of-the-art language model RoBERTa. We make a preliminary instantiation of this formal model for the text classification approaches. With appropriate training techniques, our model achieves an F1-score of 0.9218 on the public validation set, and the ensemble version places in the top 9 for F1-score (0.9005) and the top 2 for recall (0.9301) on the private test set.
