2024
WPO: Enhancing RLHF with Weighted Preference Optimization
Wenxuan Zhou | Ravi Agrawal | Shujian Zhang | Sathish Reddy Indurthi | Sanqiang Zhao | Kaiqiang Song | Silei Xu | Chenguang Zhu
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Reinforcement learning from human feedback (RLHF) is a promising solution to align large language models (LLMs) more closely with human values. Off-policy preference optimization, where the preference data is obtained from other models, is widely adopted due to its cost efficiency and scalability. However, off-policy preference optimization often suffers from a distributional gap between the policy used for data collection and the target policy, leading to suboptimal optimization. In this paper, we propose a novel strategy to mitigate this problem by simulating on-policy learning with off-policy preference data. Our Weighted Preference Optimization (WPO) method adapts off-policy data to resemble on-policy data more closely by reweighting preference pairs according to their probability under the current policy. This method not only addresses the distributional gap problem but also enhances the optimization process without incurring additional costs. We validate our method on instruction-following benchmarks including Alpaca Eval 2 and MT-bench. WPO not only outperforms Direct Preference Optimization (DPO) by up to 5.6% on Alpaca Eval 2 but also achieves a remarkable length-controlled win rate of 76.7% against GPT-4-turbo when based on Gemma-2-9b-it. We release the code and models at https://github.com/wzhouad/WPO.
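A minimal sketch of the reweighting idea, assuming a DPO-style objective over sequence-level log-probabilities; the softmax-based weighting below is an illustrative assumption, not necessarily the exact WPO formulation from the paper.

```python
import torch
import torch.nn.functional as F

def wpo_style_loss(policy_chosen_logps, policy_rejected_logps,
                   ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Standard DPO logits: policy-vs-reference log-ratio of the chosen
    # response minus that of the rejected response, per preference pair.
    logits = (policy_chosen_logps - ref_chosen_logps) \
             - (policy_rejected_logps - ref_rejected_logps)
    per_pair_loss = -F.logsigmoid(beta * logits)

    # Weight each pair by how probable its responses are under the current
    # policy (detached, so the weights act as constants); pairs the policy
    # could plausibly have generated itself contribute more, simulating
    # on-policy learning with off-policy preference data.
    weights = torch.softmax(
        (policy_chosen_logps + policy_rejected_logps).detach(), dim=0)
    return (weights * per_pair_loss).sum()
```

In practice one would likely length-normalize the sequence log-probabilities so that shorter responses do not dominate the weights.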
Improving Multilingual Instruction Finetuning via Linguistically Natural and Diverse Datasets
Sathish Reddy Indurthi | Wenxuan Zhou | Shamil Chollampatt | Ravi Agrawal | Kaiqiang Song | Lingxiao Zhao | Chenguang Zhu
Findings of the Association for Computational Linguistics: EMNLP 2024
Advancements in Large Language Models (LLMs) have significantly enhanced instruction-following capabilities. However, most Instruction Fine-Tuning (IFT) datasets are predominantly in English, limiting model performance in other languages. Traditional methods for creating multilingual IFT datasets—such as translating existing English IFT datasets or converting existing NLP datasets into IFT datasets by templating—struggle to capture linguistic nuances and ensure prompt (instruction) diversity. To address this issue, we propose a novel method for collecting multilingual IFT datasets that preserves linguistic naturalness and ensures prompt diversity. This approach leverages English-focused LLMs, monolingual corpora, and a scoring function to create high-quality, diversified IFT datasets in multiple languages. Experiments demonstrate that LLMs finetuned using these IFT datasets show notable improvements in both generative and discriminative tasks, indicating enhanced language comprehension by LLMs in non-English contexts. Specifically, on the multilingual summarization task, LLMs using our IFT dataset achieved 17.57% and 15.23% improvements over LLMs fine-tuned with translation-based and template-based datasets, respectively.
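A small sketch of how a scoring function could trade quality against prompt diversity when assembling such a dataset; the `quality` and `embed` callables and the similarity margin are placeholder assumptions, not the paper's actual scoring function.

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def select_diverse_prompts(candidates, quality, embed, budget, sim_margin=0.8):
    """Greedily keep high-quality prompts that are not too similar to
    anything already selected (quality and embed are assumed callables)."""
    ranked = sorted(candidates, key=quality, reverse=True)
    selected, selected_vecs = [], []
    for prompt in ranked:
        vec = embed(prompt)
        if all(cosine(vec, v) < sim_margin for v in selected_vecs):
            selected.append(prompt)
            selected_vecs.append(vec)
        if len(selected) >= budget:
            break
    return selected
```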
2023
CLAD-ST: Contrastive Learning with Adversarial Data for Robust Speech Translation
Sathish Indurthi | Shamil Chollampatt | Ravi Agrawal | Marco Turchi
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
The cascaded approach continues to be the most popular choice for speech translation (ST). This approach consists of an automatic speech recognition (ASR) model and a machine translation (MT) model that are used in a pipeline to translate speech in one language to text in another language. MT models are often trained on well-formed text and therefore lack robustness when translating noisy ASR outputs in the cascaded approach, degrading the overall translation quality significantly. We address this robustness problem in downstream MT models by forcing the MT encoder to bring the representations of a noisy input closer to its clean version in the semantic space. This is achieved by introducing a contrastive learning method that leverages adversarial examples in the form of ASR outputs paired with their corresponding human transcripts to optimize the network parameters. In addition, a curriculum learning strategy is used to stabilize the training by alternating the standard MT log-likelihood loss and the contrastive losses. Our approach achieves significant gains of up to 3 BLEU points in English-German and English-French speech translation without hurting the translation quality on clean text.
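An illustrative sketch of the contrastive objective described above: the MT encoder representation of a noisy ASR hypothesis is pulled toward that of its clean human transcript and pushed away from the other transcripts in the batch. Mean-pooled sentence representations and the in-batch-negative scheme are assumptions made for the sketch.

```python
import torch
import torch.nn.functional as F

def contrastive_encoder_loss(clean_repr, noisy_repr, temperature=0.1):
    # clean_repr, noisy_repr: (batch, dim) pooled MT-encoder representations
    # of human transcripts and of their paired (adversarial) ASR outputs.
    clean = F.normalize(clean_repr, dim=-1)
    noisy = F.normalize(noisy_repr, dim=-1)
    logits = noisy @ clean.t() / temperature          # similarity matrix
    targets = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, targets)           # positive pair = same row
```

During training this loss would be alternated with the usual MT log-likelihood loss, following the curriculum mentioned in the abstract.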
Select, Prompt, Filter: Distilling Large Language Models for Summarizing Conversations
Minh-Quang Pham | Sathish Indurthi | Shamil Chollampatt | Marco Turchi
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Large language models (LLMs) like ChatGPT can be expensive to train, deploy, and use for specific natural language generation tasks such as text summarization and for certain domains. A promising alternative is to fine-tune relatively smaller language models (LMs) on a particular task using high-quality, in-domain datasets. However, it can be prohibitively expensive to get such high-quality training data. This issue has been mitigated by generating weakly supervised data via knowledge distillation (KD) of LLMs. We propose a three-step approach to distill ChatGPT and fine-tune smaller LMs for summarizing forum conversations. More specifically, we design a method to selectively sample a large unannotated corpus of forum conversations using a semantic similarity metric. Then, we use the same metric to retrieve suitable prompts for ChatGPT from a small annotated validation set in the same domain. The generated dataset is then filtered to remove low-quality instances. Our proposed select-prompt-filter KD approach leads to significant improvements of up to 6.6 ROUGE-2 points by leveraging sufficient in-domain pseudo-labeled data over a standard KD approach given the same size of training data.
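A condensed sketch of the select-prompt-filter pipeline; the `embed`, `teacher_summarize`, and `quality_ok` callables, the similarity threshold, and the number of demonstrations are placeholder assumptions rather than the paper's exact choices.

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def select_prompt_filter(unlabeled, validation, embed, teacher_summarize,
                         quality_ok, select_threshold=0.5, n_demos=2):
    pseudo_labeled = []
    val_vecs = [(embed(ex["conversation"]), ex) for ex in validation]
    for conv in unlabeled:
        vec = embed(conv)
        scored = sorted(val_vecs, key=lambda p: cosine(vec, p[0]), reverse=True)
        # Step 1 (select): keep only conversations close to the target domain.
        if cosine(vec, scored[0][0]) < select_threshold:
            continue
        # Step 2 (prompt): use the most similar validation examples as demos
        # when asking the teacher LLM for a summary.
        demos = [ex for _, ex in scored[:n_demos]]
        summary = teacher_summarize(conv, demos)
        # Step 3 (filter): drop low-quality teacher outputs.
        if quality_ok(conv, summary):
            pseudo_labeled.append({"conversation": conv, "summary": summary})
    return pseudo_labeled
```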
2022
Language Model Augmented Monotonic Attention for Simultaneous Translation
Sathish Reddy Indurthi | Mohd Abbas Zaidi | Beomseok Lee | Nikhil Kumar Lakumarapu | Sangha Kim
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
The state-of-the-art adaptive policies for Simultaneous Neural Machine Translation (SNMT) use monotonic attention to perform read/write decisions based on the partial source and target sequences. The lack of sufficient information might cause the monotonic attention to take poor read/write decisions, which in turn negatively affects the performance of the SNMT model. On the other hand, human translators make better read/write decisions since they can anticipate the immediate future words using linguistic information and domain knowledge. In this work, we propose a framework to aid monotonic attention with an external language model to improve its decisions. Experiments on MuST-C English-German and English-French speech-to-text translation tasks show that the future information from the language model further improves the state-of-the-art monotonic multi-head attention model.
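A very rough sketch of the intuition: blend the monotonic attention's selection probability with the external language model's confidence in the next target token when making the read/write decision. The linear interpolation and fixed threshold here are assumptions made for illustration; the paper's actual integration is more involved.

```python
import torch

def read_or_write(p_choose, lm_next_token_probs, alpha=0.5, threshold=0.5):
    # p_choose: (batch,) monotonic-attention probability of stopping at the
    #           current source position (i.e., being ready to write).
    # lm_next_token_probs: (batch, vocab) external LM distribution over the
    #           next target token given the partial translation.
    lm_confidence = lm_next_token_probs.max(dim=-1).values
    score = alpha * p_choose + (1.0 - alpha) * lm_confidence
    return score > threshold   # True -> WRITE a target token, else READ more
```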
2020
End-to-End Simultaneous Translation System for IWSLT2020 Using Modality Agnostic Meta-Learning
Hou Jeung Han | Mohd Abbas Zaidi | Sathish Reddy Indurthi | Nikhil Kumar Lakumarapu | Beomseok Lee | Sangha Kim
Proceedings of the 17th International Conference on Spoken Language Translation
In this paper, we describe end-to-end simultaneous speech-to-text and text-to-text translation systems submitted to the IWSLT 2020 online translation challenge. The systems are built by adding wait-k and meta-learning approaches to the Transformer architecture. The systems are evaluated under different latency regimes. The simultaneous text-to-text translation achieved a BLEU score of 26.38 compared to the competition baseline score of 14.17 in the low-latency regime (Average Latency ≤ 3). The simultaneous speech-to-text system improves the BLEU score by 7.7 points over the competition baseline in the low-latency regime (Average Latency ≤ 1000).
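The wait-k part of the system can be stated concretely: the decoder lags k positions behind the encoder. A minimal sketch of the resulting read/write schedule, in the text-to-text view (for speech, positions would correspond to fixed-size audio segments):

```python
def wait_k_actions(k, src_len, tgt_len):
    """Action sequence of a wait-k policy: read k source tokens first,
    then alternate writes and reads until the source is exhausted,
    and finally write the remaining target tokens."""
    actions, read, written = [], 0, 0
    while written < tgt_len:
        if read < min(k + written, src_len):
            actions.append("READ")
            read += 1
        else:
            actions.append("WRITE")
            written += 1
    return actions

# Example: wait_k_actions(3, src_len=5, tgt_len=4) yields
# ['READ', 'READ', 'READ', 'WRITE', 'READ', 'WRITE', 'READ', 'WRITE', 'WRITE']
```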
End-to-End Offline Speech Translation System for IWSLT 2020 using Modality Agnostic Meta-Learning
Nikhil Kumar Lakumarapu | Beomseok Lee | Sathish Reddy Indurthi | Hou Jeung Han | Mohd Abbas Zaidi | Sangha Kim
Proceedings of the 17th International Conference on Spoken Language Translation
In this paper, we describe the system submitted to the IWSLT 2020 Offline Speech Translation Task. We adopt the Transformer architecture coupled with the meta-learning approach to build our end-to-end Speech-to-Text Translation (ST) system. Our meta-learning approach tackles the data scarcity of the ST task by leveraging the data available from Automatic Speech Recognition (ASR) and Machine Translation (MT) tasks. The meta-learning approach combined with synthetic data augmentation techniques improves the model performance significantly and achieves BLEU scores of 24.58, 27.51, and 27.61 on the IWSLT 2015, MuST-C, and Europarl-ST test sets, respectively.
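As a stand-in for the modality-agnostic meta-learning described above, here is a first-order (Reptile-style) sketch: adapt a copy of the model on batches from one source task (ASR or MT) and nudge the meta-parameters toward the adapted weights. The exact algorithm used in the paper may differ; `loss_fn` is an assumed task-specific loss callable.

```python
import copy
import torch

def meta_update(model, task_loader, loss_fn, inner_lr=1e-3,
                inner_steps=3, meta_lr=0.1):
    adapted = copy.deepcopy(model)
    inner_opt = torch.optim.SGD(adapted.parameters(), lr=inner_lr)
    data = iter(task_loader)
    for _ in range(inner_steps):                 # adapt on one source task
        batch = next(data)
        inner_opt.zero_grad()
        loss_fn(adapted, batch).backward()
        inner_opt.step()
    with torch.no_grad():                        # move meta-parameters toward
        for p, q in zip(model.parameters(), adapted.parameters()):
            p.add_(q - p, alpha=meta_lr)         # the task-adapted parameters
    return model
```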
2019
Look Harder: A Neural Machine Translation Model with Hard Attention
Sathish Reddy Indurthi | Insoo Chung | Sangha Kim
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
Soft-attention based Neural Machine Translation (NMT) models have achieved promising results on several translation tasks. These models attend to all the words in the source sequence for each target token, which makes them ineffective for long sequence translation. In this work, we propose a hard-attention based NMT model which selects a subset of source tokens for each target token to effectively handle long sequence translation. Due to the discrete nature of the hard-attention mechanism, we design a reinforcement learning algorithm coupled with a reward shaping strategy to efficiently train it. Experimental results show that the proposed model performs better on long sequences and thereby achieves significant BLEU score improvements on English-German (EN-DE) and English-French (EN-FR) translation tasks compared to the soft-attention based NMT model.
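A small sketch of the hard-attention idea at inference time: keep only a subset of source positions per target token instead of soft-attending to all of them. Top-k selection stands in for the learned discrete policy here, and the REINFORCE training with reward shaping described in the abstract is omitted.

```python
import torch

def hard_attention_context(scores, values, top_k=4):
    # scores: (tgt_len, src_len) attention scores; values: (src_len, dim)
    probs = torch.softmax(scores, dim=-1)
    idx = probs.topk(top_k, dim=-1).indices           # chosen source subset
    mask = torch.zeros_like(probs).scatter_(-1, idx, 1.0)
    hard = probs * mask
    hard = hard / hard.sum(dim=-1, keepdim=True)      # renormalize over subset
    return hard @ values                              # (tgt_len, dim) context
```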
2018
A Multi-Stage Memory Augmented Neural Network for Machine Reading Comprehension
Seunghak Yu | Sathish Reddy Indurthi | Seohyun Back | Haejun Lee
Proceedings of the Workshop on Machine Reading for Question Answering
Reading Comprehension (RC) of text is one of the fundamental tasks in natural language processing. In recent years, several end-to-end neural network models have been proposed to solve RC tasks. However, most of these models struggle with reasoning over long documents. In this work, we propose a novel Memory Augmented Machine Comprehension Network (MAMCN) to address long-range dependencies present in machine reading comprehension. We perform extensive experiments to evaluate the proposed method on renowned benchmark datasets such as SQuAD, QUASAR-T, and TriviaQA. We achieve state-of-the-art performance on both the document-level (QUASAR-T, TriviaQA) and paragraph-level (SQuAD) datasets compared to all previously published approaches.
Cut to the Chase: A Context Zoom-in Network for Reading Comprehension
Sathish Reddy Indurthi | Seunghak Yu | Seohyun Back | Heriberto Cuayáhuitl
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
In recent years many deep neural networks have been proposed to solve Reading Comprehension (RC) tasks. Most of these models struggle with reasoning over long documents and do not trivially generalize to cases where the answer is not present as a span in a given document. We present a novel neural-based architecture that is capable of extracting relevant regions based on a given question-document pair and generating a well-formed answer. To show the effectiveness of our architecture, we conducted several experiments on the recently proposed and challenging RC dataset ‘NarrativeQA’. The proposed architecture outperforms the state-of-the-art results by a 12.62% relative improvement in ROUGE-L.
MemoReader: Large-Scale Reading Comprehension through Neural Memory Controller
Seohyun Back | Seunghak Yu | Sathish Reddy Indurthi | Jihie Kim | Jaegul Choo
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
Machine reading comprehension helps machines learn to utilize most of the human knowledge written in the form of text. Existing approaches have made significant progress, reaching performance comparable to human level, but they are still limited to understanding at most a few paragraphs and fail to properly comprehend lengthy documents. In this paper, we propose a novel deep neural network architecture to handle long-range dependencies in RC tasks. In detail, our method has two novel aspects: (1) an advanced memory-augmented architecture and (2) an expanded gated recurrent unit with dense connections that mitigate potential information distortion occurring in the memory. Our proposed architecture is widely applicable to other models. We have performed extensive experiments with well-known benchmark datasets such as TriviaQA, QUASAR-T, and SQuAD. The experimental results demonstrate that the proposed method outperforms existing methods, especially for lengthy documents.
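A compact sketch of the "GRU with dense connections" ingredient: each stacked recurrent layer receives the concatenation of the input and all previous layers' outputs, which helps preserve information over long documents. Layer sizes and depth are illustrative assumptions; the memory controller itself is not shown.

```python
import torch
import torch.nn as nn

class DenselyConnectedGRU(nn.Module):
    """Stacked GRU whose layers see the concatenation of the input and all
    earlier layers' outputs (dense connections), in the spirit of the
    expanded GRU described above."""
    def __init__(self, input_size, hidden_size, num_layers=3):
        super().__init__()
        layers, in_size = [], input_size
        for _ in range(num_layers):
            layers.append(nn.GRU(in_size, hidden_size, batch_first=True))
            in_size += hidden_size          # next layer also sees this output
        self.layers = nn.ModuleList(layers)

    def forward(self, x):                   # x: (batch, seq_len, input_size)
        features = [x]
        for gru in self.layers:
            out, _ = gru(torch.cat(features, dim=-1))
            features.append(out)
        return torch.cat(features[1:], dim=-1)   # all layer outputs, concatenated
```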
2017
Generating Natural Language Question-Answer Pairs from a Knowledge Graph Using a RNN Based Question Generation Model
Sathish Reddy | Dinesh Raghu | Mitesh M. Khapra | Sachindra Joshi
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers
In recent years, knowledge graphs such as Freebase that capture facts about entities and relationships between them have been used actively for answering factoid questions. In this paper, we explore the problem of automatically generating question answer pairs from a given knowledge graph. The generated question answer (QA) pairs can be used in several downstream applications. For example, they could be used for training better QA systems. To generate such QA pairs, we first extract a set of keywords from entities and relationships expressed in a triple stored in the knowledge graph. From each such set, we use a subset of keywords to generate a natural language question that has a unique answer. We treat this subset of keywords as a sequence and propose a sequence-to-sequence model using an RNN to generate a natural language question from it. Our RNN-based model generates QA pairs with an accuracy of 33.61 percent and performs 110.47 percent (relative) better than a state-of-the-art template-based method for generating natural language questions from keywords. We also perform an extrinsic evaluation by using the generated QA pairs to train a QA system and observe that the F1-score of the QA system improves by 5.5 percent (relative) when using automatically generated QA pairs in addition to manually generated QA pairs available for training.
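A toy sketch of the keyword-extraction step described above: from a triple, form keyword sets that each leave out one entity as the answer; the resulting keyword sequence would then be fed to the RNN encoder-decoder to generate the natural-language question. The direction handling and tokenization here are simplifications, not the paper's exact procedure.

```python
def qa_keywords_from_triple(subject, relation, obj):
    clean = lambda s: s.replace("_", " ")
    rel_words = clean(relation).split()
    # Leave one entity out of the keywords; it becomes the answer.
    return [
        {"keywords": [clean(subject), *rel_words], "answer": clean(obj)},
        {"keywords": [clean(obj), *rel_words], "answer": clean(subject)},
    ]

# Example: the first keyword set from ("Barack_Obama", "place_of_birth",
# "Honolulu") could drive a question like "Where was Barack Obama born?"
# with the unique answer "Honolulu".
print(qa_keywords_from_triple("Barack_Obama", "place_of_birth", "Honolulu"))
```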
2015
A statistical approach for Non-Sentential Utterance Resolution for Interactive QA System
Dinesh Raghu | Sathish Indurthi | Jitendra Ajmera | Sachindra Joshi
Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue