Karthikeyan Natesan Ramamurthy
2024
Value Alignment from Unstructured Text
Inkit Padhi | Karthikeyan Natesan Ramamurthy | Prasanna Sattigeri | Manish Nagireddy | Pierre Dognin | Kush R. Varshney
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track
Aligning large language models (LLMs) to value systems has emerged as a significant area of research within the fields of AI and NLP. Currently, this alignment process relies on the availability of high-quality supervised and preference data, which can be both time-consuming and expensive to curate or annotate. In this paper, we introduce a systematic end-to-end methodology for aligning LLMs to the implicit and explicit values represented in unstructured text data. Our proposed approach leverages scalable synthetic data generation techniques to effectively align the model to the values present in the unstructured data. Through two distinct use-cases, we demonstrate the efficiency of our methodology on the Mistral-7B-Instruct model. Our approach credibly aligns LLMs to the values embedded within documents and shows improved performance against other approaches, as quantified through the use of automatic metrics and win rates.
Ranking Large Language Models without Ground Truth
Amit Dhurandhar | Rahul Nair | Moninder Singh | Elizabeth Daly | Karthikeyan Natesan Ramamurthy
Findings of the Association for Computational Linguistics: ACL 2024
Evaluation and ranking of large language models (LLMs) have become an important problem with the proliferation of these models and their impact. Evaluation methods either require human responses, which are expensive to acquire, or use pairs of LLMs to evaluate each other, which can be unreliable. In this paper, we provide a novel perspective where, given a dataset of prompts (viz. questions, instructions, etc.) and a set of LLMs, we rank them without access to any ground truth or reference responses. Inspired by real life, where both an expert and a knowledgeable person can identify a novice, our main idea is to consider triplets of models, where each one of them evaluates the other two, correctly identifying the worst model in the triplet with high probability. We also analyze our idea and provide sufficient conditions for it to succeed. Applying this idea repeatedly, we propose two methods to rank LLMs. In experiments on different generative tasks (summarization, multiple-choice, and dialog), our methods reliably recover true rankings without reference data. This points to a viable low-resource mechanism for practical use.
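The triplet idea above lends itself to a short illustration. The sketch below is a minimal, hypothetical rendering of repeated triplet evaluation, not the authors' actual algorithm: the `judge(evaluator, response_a, response_b)` function is a placeholder for prompting an evaluator LLM to compare two responses, the single-prompt setting and the loss-counting aggregation are simplifying assumptions, and the two specific ranking methods from the paper are not reproduced.

```python
import itertools
import random
from collections import defaultdict

def judge(evaluator, response_a, response_b):
    """Hypothetical stand-in: the evaluator model compares two responses
    and returns 0 if it prefers response_a, 1 if it prefers response_b.
    A real system would prompt `evaluator` with both responses."""
    # Placeholder: random preference; replace with an actual LLM call.
    return random.randint(0, 1)

def rank_by_triplets(models, responses):
    """Rank models by forming every triplet (a, b, c): each member judges
    the other two, and models are ordered by how often they were judged
    the worse of a pair across all triplets (fewer losses = better).

    `models` is a list of model names; `responses[m]` is the response
    produced by model m for a fixed prompt (single-prompt sketch)."""
    losses = defaultdict(int)
    for a, b, c in itertools.combinations(models, 3):
        for evaluator, (x, y) in ((a, (b, c)), (b, (a, c)), (c, (a, b))):
            # The evaluator prefers one of the other two; the other loses.
            worse = y if judge(evaluator, responses[x], responses[y]) == 0 else x
            losses[worse] += 1
    return sorted(models, key=lambda m: losses[m])

# Usage: three or more candidate models and their responses to one prompt.
models = ["model_a", "model_b", "model_c", "model_d"]
responses = {m: f"answer from {m}" for m in models}
print(rank_by_triplets(models, responses))
```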
2022
Your fairness may vary: Pretrained language model fairness in toxic text classification
Ioana Baldini | Dennis Wei | Karthikeyan Natesan Ramamurthy | Moninder Singh | Mikhail Yurochkin
Findings of the Association for Computational Linguistics: ACL 2022
The popularity of pretrained language models in natural language processing systems calls for a careful evaluation of such models in downstream tasks, which have a higher potential for societal impact. The evaluation of such systems usually focuses on accuracy measures. Our findings in this paper call for attention to be paid to fairness measures as well. Through the analysis of more than a dozen pretrained language models of varying sizes on two toxic text classification tasks (English), we demonstrate that focusing on accuracy measures alone can lead to models with wide variation in fairness characteristics. Specifically, we observe that fairness can vary even more than accuracy with increasing training data size and different random initializations. At the same time, we find that little of the fairness variation is explained by model size, despite claims in the literature. To improve model fairness without retraining, we show that two post-processing methods developed for structured, tabular data can be successfully applied to a range of pretrained language models. Warning: This paper contains samples of offensive text.