2024
Towards Aligning Language Models with Textual Feedback
Saüc Abadal Lloret | Shehzaad Dhuliawala | Keerthiram Murugesan | Mrinmaya Sachan
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
We present ALT (ALignment with Textual feedback), an approach that aligns language models with user preferences expressed in text. We argue that text offers greater expressiveness than simple comparative preferences, enabling users to provide richer feedback, and that this richer feedback can lead to more efficient and effective alignment. ALT aligns the model by conditioning its generation on the textual feedback. Our method relies solely on language modeling techniques and requires minimal hyper-parameter tuning, yet it retains the main benefits of RL-based algorithms and can effectively learn from textual feedback. We explore the efficacy and efficiency of textual feedback across different tasks such as toxicity reduction, summarization, and dialog response generation. We find that ALT outperforms PPO on toxicity reduction and matches its summarization performance with only 20% of the samples. We also explore how ALT can be used with feedback provided by an existing LLM.
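Below is a minimal sketch of the conditioning idea described in the abstract: feedback text is prepended to the prompt so the model learns to generate responses consistent with that feedback, and the desired feedback is then supplied at inference time. The "feedback / prompt / response" template and the feedback strings are illustrative assumptions, not the paper's exact format.

```python
# A minimal sketch of feedback-conditioned training data, assuming a simple
# "feedback / prompt / response" template (the paper's exact format may differ).

def make_example(feedback: str, prompt: str, response: str) -> str:
    # The model is trained with a standard LM objective on text of this form,
    # so it learns p(response | feedback, prompt).
    return f"feedback: {feedback}\nprompt: {prompt}\nresponse: {response}"

train_examples = [
    make_example("The response is polite and non-toxic.",
                 "Reply to: 'Your idea is terrible.'",
                 "I see it differently, but I'd like to hear more about your concerns."),
]

# At inference, condition on the target feedback and let the model complete the response.
inference_prefix = "feedback: The response is polite and non-toxic.\nprompt: <user prompt>\nresponse:"
print(train_examples[0])
```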
Language Guided Exploration for RL Agents in Text Environments
Hitesh Golchha | Sahil Yerawar | Dhruvesh Patel | Soham Dan | Keerthiram Murugesan
Findings of the Association for Computational Linguistics: NAACL 2024
STARLING: Self-supervised Training of Text-based Reinforcement Learning Agent with Large Language Models
Shreyas Basavatia | Keerthiram Murugesan | Shivam Ratnakar
Findings of the Association for Computational Linguistics: ACL 2024
Interactive fiction games have emerged as an important application for improving the generalization capabilities of language-based reinforcement learning (RL) agents. Existing environments for interactive fiction games are domain-specific or time-consuming to generate, and they do not train RL agents to master a specific set of skills. In this work, we introduce STARLING, an interactive environment for self-supervised text-based RL that bootstraps agents with automatically generated games (based on a seed set of game ideas) to boost their performance and their ability to generalize to a target environment. These games let agents hone their skills on a predefined set of tasks. We create and test an environment of 100 games generated with this automated framework, which combines a large language model (GPT-3) with an interactive fiction game engine (based on Inform7) so that users can generate more games with minimal human supervision. Experimental results from both human participants and baseline text-based RL agents reveal that current state-of-the-art text-based RL agents cannot apply previously learned skills in new situations at the level humans can. These results reinforce STARLING's potential to serve as a sandbox environment for further research in self-supervised text-based RL.
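A rough sketch of the bootstrapping loop described above follows; `llm_generate` and `compile_inform7` are hypothetical stand-ins for a GPT-3-style completion call and an Inform7-based game engine, and are not names from the paper.

```python
# Hypothetical helpers: plug in a real LLM client and game compiler here.
def llm_generate(prompt: str) -> str:
    return "Rooms: kitchen, pantry. Objects: pan, potato. Goal: cook dinner."  # canned stub

def compile_inform7(spec: str) -> str:
    return f"[playable game built from: {spec}]"  # canned stub

SEED_IDEAS = ["cook a meal in a farmhouse kitchen", "repair a bicycle in a garage"]

def generate_game(seed_idea: str) -> str:
    prompt = ("Write an interactive-fiction game specification (rooms, objects, goal) "
              f"for this idea: {seed_idea}")
    return compile_inform7(llm_generate(prompt))

games = [generate_game(idea) for idea in SEED_IDEAS]
# A text-based RL agent would be pre-trained on `games` before tackling the target environment.
print(games[0])
```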
EXPLORER: Exploration-guided Reasoning for Textual Reinforcement Learning
Kinjal Basu | Keerthiram Murugesan | Subhajit Chaudhury | Murray Campbell | Kartik Talamadupula | Tim Klinger
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Text-based games (TBGs) have emerged as an important collection of NLP tasks, requiring reinforcement learning (RL) agents to combine natural language understanding with reasoning. A key challenge for agents attempting to solve such tasks is to generalize across multiple games and demonstrate good performance on both seen and unseen objects. Purely deep-RL-based approaches may perform well on seen objects; however, they fail to achieve the same performance on unseen objects. Commonsense-infused deep-RL agents may work better on unseen data; unfortunately, their policies are often not interpretable or easily transferable. To tackle these issues, in this paper we present EXPLORER, an exploration-guided reasoning agent for textual reinforcement learning. EXPLORER is neuro-symbolic in nature, as it relies on a neural module for exploration and a symbolic module for exploitation. It can also learn generalized symbolic policies and perform well on unseen data. Our experiments show that EXPLORER outperforms the baseline agents on Text-World Cooking (TW-Cooking) and Text-World Commonsense (TWC) games.
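The sketch below illustrates the exploration/exploitation split in spirit only: a symbolic rule base is consulted first, and a neural policy is used to explore when no rule fires. The rule format, the `epsilon` switch, and the `neural_policy` scorer are assumptions for illustration, not the paper's exact mechanism.

```python
import random

def select_action(observation: str, admissible_actions: list[str],
                  rules: dict[str, str], neural_policy, epsilon: float = 0.1) -> str:
    # Symbolic exploitation: follow a learned rule when one matches the observation.
    for condition, action in rules.items():
        if condition in observation and action in admissible_actions:
            if random.random() > epsilon:
                return action
    # Neural exploration: otherwise score actions with the neural module.
    scores = neural_policy(observation, admissible_actions)
    return max(zip(admissible_actions, scores), key=lambda pair: pair[1])[0]

# Toy stand-ins so the sketch runs end to end.
toy_policy = lambda obs, actions: [len(a) for a in actions]
print(select_action("you see a dirty plate on the table",
                    ["take plate", "go north"],
                    {"dirty plate": "take plate"}, toy_policy))
```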
2023
Learning Symbolic Rules over Abstract Meaning Representations for Textual Reinforcement Learning
Subhajit Chaudhury | Sarathkrishna Swaminathan | Daiki Kimura | Prithviraj Sen | Keerthiram Murugesan | Rosario Uceda-Sosa | Michiaki Tatsubori | Achille Fokoue | Pavan Kapanipathi | Asim Munawar | Alexander Gray
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Text-based reinforcement learning agents have predominantly been neural network-based models with embedding-based representations, learning uninterpretable policies that often do not generalize well to unseen games. On the other hand, neuro-symbolic methods, specifically those that leverage an intermediate formal representation, are gaining significant attention in language understanding tasks because of advantages such as inherent interpretability, lower training-data requirements, and better generalization to unseen data. Therefore, in this paper, we propose a modular NEuro-Symbolic Textual Agent (NESTA) that combines a generic semantic parser with a rule induction system to learn abstract interpretable rules as policies. Our experiments on established text-based game benchmarks show that NESTA outperforms deep reinforcement learning-based techniques by achieving better generalization to unseen test games and learning from fewer training interactions.
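As a toy illustration of inducing rules from rewarded trajectories, the snippet below counts fact-action co-occurrences; NESTA itself parses observations into AMR-based facts and uses a proper rule-induction system, so the predicate strings and counting heuristic here are only stand-ins.

```python
from collections import Counter

def induce_rules(trajectories, min_support=3):
    # trajectories: (facts, action, reward) triples, with `facts` a set of predicate strings
    counts = Counter()
    for facts, action, reward in trajectories:
        if reward > 0:
            for fact in facts:
                counts[(fact, action)] += 1
    # Keep fact -> action rules that were rewarded at least `min_support` times.
    return {fact: action for (fact, action), n in counts.items() if n >= min_support}

rules = induce_rules([({"dirty(apple)", "empty(sink)"}, "wash apple", 1.0)] * 3)
print(rules)  # e.g. {'dirty(apple)': 'wash apple', 'empty(sink)': 'wash apple'}
```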
MISMATCH: Fine-grained Evaluation of Machine-generated Text with Mismatch Error Types
Keerthiram Murugesan | Sarathkrishna Swaminathan | Soham Dan | Subhajit Chaudhury | Chulaka Gunasekara | Maxwell Crouse | Diwakar Mahajan | Ibrahim Abdelaziz | Achille Fokoue | Pavan Kapanipathi | Salim Roukos | Alexander Gray
Findings of the Association for Computational Linguistics: ACL 2023
With the growing interest in large language models, the need to evaluate the quality of machine-generated text against reference (typically human-generated) text has become a focal point of attention. Most recent work focuses either on task-specific evaluation metrics or on studying the properties of machine-generated text captured by existing metrics. In this work, we propose a new evaluation scheme that models human judgments across 7 NLP tasks based on fine-grained mismatches between a pair of texts. Inspired by recent efforts toward fine-grained evaluation in several NLP tasks, we introduce a set of 13 mismatch error types, such as spatial/geographic errors and entity errors, to guide the model toward better prediction of human judgments. We propose a neural framework for evaluating machine text that uses these mismatch error types as auxiliary tasks and re-purposes existing single-number evaluation metrics as additional scalar features, alongside textual features extracted from the machine and reference texts. Our experiments reveal key insights about the existing metrics via the mismatch errors. We show that the mismatch errors between sentence pairs on held-out datasets from the 7 NLP tasks align well with human evaluation.
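A bare-bones sketch of the multi-task setup follows: a shared sentence-pair encoding feeds an auxiliary head over the 13 mismatch error types and a judgment head that also consumes existing scalar metrics. The encoder, dimensions, and loss choices are placeholders, not the paper's architecture.

```python
import torch
import torch.nn as nn

class MismatchScorer(nn.Module):
    def __init__(self, enc_dim=768, n_error_types=13, n_metrics=4):
        super().__init__()
        self.error_head = nn.Linear(enc_dim, n_error_types)      # auxiliary task
        self.judgment_head = nn.Linear(enc_dim + n_metrics, 1)   # main task

    def forward(self, pair_encoding, metric_scores):
        error_logits = self.error_head(pair_encoding)
        judgment = self.judgment_head(torch.cat([pair_encoding, metric_scores], dim=-1))
        return error_logits, judgment

model = MismatchScorer()
enc = torch.randn(2, 768)          # stand-in for a sentence-pair encoder output
metrics = torch.randn(2, 4)        # stand-in for BLEU/ROUGE-style scalar features
error_logits, judgment = model(enc, metrics)
# Training would combine a classification loss on error_logits with a regression loss on judgment.
```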
2022
X-FACTOR: A Cross-metric Evaluation of Factual Correctness in Abstractive Summarization
Subhajit Chaudhury | Sarathkrishna Swaminathan | Chulaka Gunasekara | Maxwell Crouse | Srinivas Ravishankar | Daiki Kimura | Keerthiram Murugesan | Ramón Fernandez Astudillo | Tahira Naseem | Pavan Kapanipathi | Alexander Gray
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Abstractive summarization models often produce factually inconsistent summaries that are not supported by the original article. Recently, a number of factual-consistency evaluation techniques have been proposed to address this issue; however, a detailed analysis of how these metrics agree with one another has yet to be conducted. In this paper, we present X-FACTOR, a cross-evaluation of three high-performing fact-aware abstractive summarization methods. First, we show that summarization models are often fine-tuned on datasets that contain factually inconsistent summaries, and we propose a fact-aware filtering mechanism that improves the quality of training data and, consequently, the factuality of these models. Second, we propose a corrector module that can be used to improve the factual consistency of generated summaries. Third, we present a re-ranking technique that samples summary instances from the output distribution of a summarization model and re-ranks the sampled instances based on their factuality. Finally, we provide a detailed cross-metric agreement analysis that shows how tuning a model to output summaries based on a particular factuality metric influences factuality as determined by the other metrics. Our goal in this work is to facilitate research that improves the factuality and faithfulness of abstractive summarization models.
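The re-ranking step can be sketched as below; `sample_summary` and `factuality_score` are hypothetical stand-ins for a summarization model's sampler and a factuality metric, not functions from the paper.

```python
def rerank(article: str, sample_summary, factuality_score, k: int = 8) -> str:
    # Sample k candidates from the summarizer and keep the most factual one.
    candidates = [sample_summary(article) for _ in range(k)]
    return max(candidates, key=lambda s: factuality_score(article, s))

# Toy stand-ins so the sketch runs end to end.
toy_sampler = lambda art: art.split(".")[0] + "."
toy_scorer = lambda art, summ: sum(word in art for word in summ.split())
print(rerank("The plant opened in 1990. It employs 200 people.", toy_sampler, toy_scorer))
```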
2021
Efficient Text-based Reinforcement Learning by Jointly Leveraging State and Commonsense Graph Representations
Keerthiram Murugesan | Mattia Atzeni | Pavan Kapanipathi | Kartik Talamadupula | Mrinmaya Sachan | Murray Campbell
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)
Text-based games (TBGs) have emerged as useful benchmarks for evaluating progress at the intersection of grounded language understanding and reinforcement learning (RL). Recent work has proposed the use of external knowledge to improve the efficiency of RL agents for TBGs. In this paper, we posit that to act efficiently in TBGs, an agent must be able to track the state of the game while retrieving and using relevant commonsense knowledge. Thus, we propose an agent for TBGs that induces a graph representation of the game state and jointly grounds it with a graph of commonsense knowledge from ConceptNet. This combination is achieved through bidirectional knowledge graph attention between the two symbolic representations. We show that agents that incorporate commonsense into the game state graph outperform baseline agents.
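A small sketch of the bidirectional attention idea, assuming pre-computed node embeddings for the game-state graph and the ConceptNet subgraph; the single attention layer and the dimensions are illustrative and do not reproduce the paper's graph encoders.

```python
import torch
import torch.nn as nn

class BiGraphAttention(nn.Module):
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.state_to_cs = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cs_to_state = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, state_nodes, commonsense_nodes):
        # State-graph nodes attend over commonsense nodes, and vice versa.
        s, _ = self.state_to_cs(state_nodes, commonsense_nodes, commonsense_nodes)
        c, _ = self.cs_to_state(commonsense_nodes, state_nodes, state_nodes)
        return torch.cat([s.mean(dim=1), c.mean(dim=1)], dim=-1)  # pooled joint representation

layer = BiGraphAttention()
state = torch.randn(1, 6, 128)      # 6 game-state graph nodes
concept = torch.randn(1, 10, 128)   # 10 ConceptNet subgraph nodes
print(layer(state, concept).shape)  # torch.Size([1, 256])
```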