Alexandros Papangelis

Also published as: Alex Papangelis


2024

pdf bib
FANTAstic SEquences and Where to Find Them: Faithful and Efficient API Call Generation through State-tracked Constrained Decoding and Reranking
Zhuoer Wang | Leonardo F. R. Ribeiro | Alexandros Papangelis | Rohan Mukherjee | Tzu-Yen Wang | Xinyan Zhao | Arijit Biswas | James Caverlee | Angeliki Metallinou
Findings of the Association for Computational Linguistics: EMNLP 2024

API call generation is the cornerstone of large language models’ tool-using ability that provides access to the larger world. However, existing supervised and in-context learning approaches suffer from high training costs, poor data efficiency, and generated API calls that can be unfaithful to the API documentation and the user’s request. To address these limitations, we propose an output-side optimization approach called FANTASE. Two of the unique contributions of FANTASE are its State-Tracked Constrained Decoding (SCD) and Reranking components. SCD dynamically incorporates appropriate API constraints in the form of Token Search Trie for efficient and guaranteed generation faithfulness with respect to the API documentation. The Reranking component efficiently brings in the supervised signal by leveraging a lightweight model as the discriminator to rerank the beam-searched candidate generations of the large language model. We demonstrate the superior performance of FANTASE in API call generation accuracy, inference efficiency, and context efficiency with DSTC8 and API Bank datasets.

pdf bib
Semi-Supervised Reward Modeling via Iterative Self-Training
Yifei He | Haoxiang Wang | Ziyan Jiang | Alexandros Papangelis | Han Zhao
Findings of the Association for Computational Linguistics: EMNLP 2024

Reward models (RM) capture the values and preferences of humans and play a central role in Reinforcement Learning with Human Feedback (RLHF) to align pretrained large language models (LLMs). Traditionally, training these models relies on extensive human-annotated preference data, which poses significant challenges in terms of scalability and cost. To overcome these limitations, we propose Semi-Supervised Reward Modeling (SSRM), an approach that enhances RM training using unlabeled data. Given an unlabeled dataset, SSRM involves three key iterative steps: pseudo-labeling unlabeled examples, selecting high-confidence examples through a confidence threshold, and supervised finetuning on the refined dataset. Across extensive experiments on various model configurations, we demonstrate that SSRM significantly improves reward models without incurring additional labeling costs. Notably, SSRM can achieve performance comparable to models trained entirely on labeled data of equivalent volumes. Overall, SSRM substantially reduces the dependency on large volumes of human-annotated data, thereby decreasing the overall cost and time involved in training effective reward models.

pdf bib
Proceedings of the 6th Workshop on NLP for Conversational AI (NLP4ConvAI 2024)
Elnaz Nouri | Abhinav Rastogi | Georgios Spithourakis | Bing Liu | Yun-Nung Chen | Yu Li | Alon Albalak | Hiromi Wakaki | Alexandros Papangelis
Proceedings of the 6th Workshop on NLP for Conversational AI (NLP4ConvAI 2024)

2023

pdf bib
Task-Oriented Conversational Modeling with Subjective Knowledge Track in DSTC11
Seokhwan Kim | Spandana Gella | Chao Zhao | Di Jin | Alexandros Papangelis | Behnam Hedayatnia | Yang Liu | Dilek Z Hakkani-Tur
Proceedings of The Eleventh Dialog System Technology Challenge

Conventional Task-oriented Dialogue (TOD) Systems rely on domain-specific APIs/DBs or external factual knowledge to create responses. In DSTC11 track 5, we aims to provide a new challenging task to accommodate subjective user requests (e.g.,”Is the WIFI reliable?” or “Does the restaurant have a good atmosphere?” into TOD. We release a benchmark dataset, which contains subjective knowledge-seeking dialogue contexts and manually annotated responses that are grounded in subjective knowledge sources. The challenge track received a total of 48 entries from 14 participating teams.

pdf bib
Selective In-Context Data Augmentation for Intent Detection using Pointwise V-Information
Yen-Ting Lin | Alexandros Papangelis | Seokhwan Kim | Sungjin Lee | Devamanyu Hazarika | Mahdi Namazifar | Di Jin | Yang Liu | Dilek Hakkani-Tur
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

This work focuses on in-context data augmentation for intent detection. Having found that augmentation via in-context prompting of large pre-trained language models (PLMs) alone does not improve performance, we introduce a novel approach based on PLMs and pointwise V-information (PVI), a metric that can measure the usefulness of a datapoint for training a model. Our method first fine-tunes a PLM on a small seed of training data and then synthesizes new datapoints - utterances that correspond to given intents. It then employs intent-aware filtering, based on PVI, to remove datapoints that are not helpful to the downstream intent classifier. Our method is thus able to leverage the expressive power of large language models to produce diverse training data. Empirical results demonstrate that our method can produce synthetic training data that achieve state-of-the-art performance on three challenging intent detection datasets under few-shot settings (1.28% absolute improvement in 5-shot and 1.18% absolute in 10-shot, on average) and perform on par with the state-of-the-art in full-shot settings (within 0.01% absolute, on average).

pdf bib
PLACES: Prompting Language Models for Social Conversation Synthesis
Maximillian Chen | Alexandros Papangelis | Chenyang Tao | Seokhwan Kim | Andy Rosenbaum | Yang Liu | Zhou Yu | Dilek Hakkani-Tur
Findings of the Association for Computational Linguistics: EACL 2023

Collecting high quality conversational data can be very expensive for most applications and infeasible for others due to privacy, ethical, or similar concerns. A promising direction to tackle this problem is to generate synthetic dialogues by prompting large language models. In this work, we use a small set of expert-written conversations as in-context examples to synthesize a social conversation dataset using prompting. We perform several thorough evaluations of our synthetic conversations compared to human-collected conversations. This includes various dimensions of conversation quality with human evaluation directly on the synthesized conversations, and interactive human evaluation of chatbots fine-tuned on the synthetically generated dataset. We additionally demonstrate that this prompting approach is generalizable to multi-party conversations, providing potential to create new synthetic data for multi-party tasks. Our synthetic multi-party conversations were rated more favorably across all measured dimensions compared to conversation excerpts sampled from a human-collected multi-party dataset.

pdf bib
“What do others think?”: Task-Oriented Conversational Modeling with Subjective Knowledge
Chao Zhao | Spandana Gella | Seokhwan Kim | Di Jin | Devamanyu Hazarika | Alexandros Papangelis | Behnam Hedayatnia | Mahdi Namazifar | Yang Liu | Dilek Hakkani-Tur
Proceedings of the 24th Annual Meeting of the Special Interest Group on Discourse and Dialogue

Task-oriented Dialogue (TOD) Systems aim to build dialogue systems that assist users in accomplishing specific goals, such as booking a hotel or a restaurant. Traditional TODs rely on domain-specific APIs/DBs or external factual knowledge to generate responses, which cannot accommodate subjective user requests (e.g.,”Is the WIFI reliable?” or “Does the restaurant have a good atmosphere?”). To address this issue, we propose a novel task of subjective-knowledge-based TOD (SK-TOD). We also propose the first corresponding dataset, which contains subjective knowledge-seeking dialogue contexts and manually annotated responses grounded in subjective knowledge sources. When evaluated with existing TOD approaches, we find that this task poses new challenges such as aggregating diverse opinions from multiple knowledge snippets. We hope this task and dataset can promote further research on TOD and subjective content understanding. The code and the dataset are available at https://github.com/alexa/dstc11-track5.

2022

pdf bib
GEMv2: Multilingual NLG Benchmarking in a Single Line of Code
Sebastian Gehrmann | Abhik Bhattacharjee | Abinaya Mahendiran | Alex Wang | Alexandros Papangelis | Aman Madaan | Angelina Mcmillan-major | Anna Shvets | Ashish Upadhyay | Bernd Bohnet | Bingsheng Yao | Bryan Wilie | Chandra Bhagavatula | Chaobin You | Craig Thomson | Cristina Garbacea | Dakuo Wang | Daniel Deutsch | Deyi Xiong | Di Jin | Dimitra Gkatzia | Dragomir Radev | Elizabeth Clark | Esin Durmus | Faisal Ladhak | Filip Ginter | Genta Indra Winata | Hendrik Strobelt | Hiroaki Hayashi | Jekaterina Novikova | Jenna Kanerva | Jenny Chim | Jiawei Zhou | Jordan Clive | Joshua Maynez | João Sedoc | Juraj Juraska | Kaustubh Dhole | Khyathi Raghavi Chandu | Laura Perez Beltrachini | Leonardo F . R. Ribeiro | Lewis Tunstall | Li Zhang | Mahim Pushkarna | Mathias Creutz | Michael White | Mihir Sanjay Kale | Moussa Kamal Eddine | Nico Daheim | Nishant Subramani | Ondrej Dusek | Paul Pu Liang | Pawan Sasanka Ammanamanchi | Qi Zhu | Ratish Puduppully | Reno Kriz | Rifat Shahriyar | Ronald Cardenas | Saad Mahamood | Salomey Osei | Samuel Cahyawijaya | Sanja Štajner | Sebastien Montella | Shailza Jolly | Simon Mille | Tahmid Hasan | Tianhao Shen | Tosin Adewumi | Vikas Raunak | Vipul Raheja | Vitaly Nikolaev | Vivian Tsai | Yacine Jernite | Ying Xu | Yisi Sang | Yixin Liu | Yufang Hou
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

Evaluations in machine learning rarely use the latest metrics, datasets, or human evaluation in favor of remaining compatible with prior work. The compatibility, often facilitated through leaderboards, thus leads to outdated but standardized evaluation practices. We pose that the standardization is taking place in the wrong spot. Evaluation infrastructure should enable researchers to use the latest methods and what should be standardized instead is how to incorporate these new evaluation advances. We introduce GEMv2, the new version of the Generation, Evaluation, and Metrics Benchmark which uses a modular infrastructure for dataset, model, and metric developers to benefit from each other’s work. GEMv2 supports 40 documented datasets in 51 languages, ongoing online evaluation for all datasets, and our interactive tools make it easier to add new datasets to the living benchmark.

pdf bib
What is wrong with you?: Leveraging User Sentiment for Automatic Dialog Evaluation
Sarik Ghazarian | Behnam Hedayatnia | Alexandros Papangelis | Yang Liu | Dilek Hakkani-Tur
Findings of the Association for Computational Linguistics: ACL 2022

Accurate automatic evaluation metrics for open-domain dialogs are in high demand. Existing model-based metrics for system response evaluation are trained on human annotated data, which is cumbersome to collect. In this work, we propose to use information that can be automatically extracted from the next user utterance, such as its sentiment or whether the user explicitly ends the conversation, as a proxy to measure the quality of the previous system response. This allows us to train on a massive set of dialogs with weak supervision, without requiring manual system turn quality annotations. Experiments show that our model is comparable to models trained on human annotated data. Furthermore, our model generalizes across both spoken and written open-domain dialog corpora collected from real and paid users.

pdf bib
Proceedings of the 4th Workshop on NLP for Conversational AI
Bing Liu | Alexandros Papangelis | Stefan Ultes | Abhinav Rastogi | Yun-Nung Chen | Georgios Spithourakis | Elnaz Nouri | Weiyan Shi
Proceedings of the 4th Workshop on NLP for Conversational AI

pdf bib
Knowledge-Grounded Conversational Data Augmentation with Generative Conversational Networks
Yen Ting Lin | Alexandros Papangelis | Seokhwan Kim | Dilek Hakkani-Tur
Proceedings of the 23rd Annual Meeting of the Special Interest Group on Discourse and Dialogue

While rich, open-domain textual data are generally available and may include interesting phenomena (humor, sarcasm, empathy, etc.) most are designed for language processing tasks, and are usually in a non-conversational format. In this work, we take a step towards automatically generating conversational data using Generative Conversational Networks, aiming to benefit from the breadth of available language and knowledge data, and train open domain social conversational agents. We evaluate our approach on conversations with and without knowledge on the Topical Chat dataset using automatic metrics and human evaluators. Our results show that for conversations without knowledge grounding, GCN can generalize from the seed data, producing novel conversations that are less relevant but more engaging and for knowledge-grounded conversations, it can produce more knowledge-focused, fluent, and engaging conversations. Specifically, we show that for open-domain conversations with 10% of seed data, our approach performs close to the baseline that uses 100% of the data, while for knowledge-grounded conversations, it achieves the same using only 1% of the data, on human ratings of engagingness, fluency, and relevance.

2021

pdf bib
Proceedings of the 3rd Workshop on Natural Language Processing for Conversational AI
Alexandros Papangelis | Paweł Budzianowski | Bing Liu | Elnaz Nouri | Abhinav Rastogi | Yun-Nung Chen
Proceedings of the 3rd Workshop on Natural Language Processing for Conversational AI

pdf bib
Generative Conversational Networks
Alexandros Papangelis | Karthik Gopalakrishnan | Aishwarya Padmakumar | Seokhwan Kim | Gokhan Tur | Dilek Hakkani-Tur
Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue

Inspired by recent work in meta-learning and generative teaching networks, we propose a framework called Generative Conversational Networks, in which conversational agents learn to generate their own labelled training data (given some seed data) and then train themselves from that data to perform a given task. We use reinforcement learning to optimize the data generation process where the reward signal is the agent’s performance on the task. The task can be any language-related task, from intent detection to full task-oriented conversations. In this work, we show that our approach is able to generalise from seed data and performs well in limited data and limited computation settings, with significant gains for intent detection and slot tagging across multiple datasets: ATIS, TOD, SNIPS, and Restaurants8k. We show an average improvement of 35% in intent detection and 21% in slot tagging over a baseline model trained from the seed data. We also conduct an analysis of the novelty of the generated data and provide generated examples for intent detection, slot tagging, and non-goal oriented conversations.

2020

pdf bib
Controllable Text Generation with Focused Variation
Lei Shu | Alexandros Papangelis | Yi-Chia Wang | Gokhan Tur | Hu Xu | Zhaleh Feizollahi | Bing Liu | Piero Molino
Findings of the Association for Computational Linguistics: EMNLP 2020

This work introduces Focused-Variation Network (FVN), a novel model to control language generation. The main problems in previous controlled language generation models range from the difficulty of generating text according to the given attributes, to the lack of diversity of the generated texts. FVN addresses these issues by learning disjoint discrete latent spaces for each attribute inside codebooks, which allows for both controllability and diversity, while at the same time generating fluent text. We evaluate FVN on two text generation datasets with annotated content and style, and show state-of-the-art performance as assessed by automatic and human evaluations.

pdf bib
Proceedings of the 2nd Workshop on Natural Language Processing for Conversational AI
Tsung-Hsien Wen | Asli Celikyilmaz | Zhou Yu | Alexandros Papangelis | Mihail Eric | Anuj Kumar | Iñigo Casanueva | Rushin Shah
Proceedings of the 2nd Workshop on Natural Language Processing for Conversational AI

2019

pdf bib
Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue
Satoshi Nakamura | Milica Gasic | Ingrid Zukerman | Gabriel Skantze | Mikio Nakano | Alexandros Papangelis | Stefan Ultes | Koichiro Yoshino
Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue

pdf bib
Collaborative Multi-Agent Dialogue Model Training Via Reinforcement Learning
Alexandros Papangelis | Yi-Chia Wang | Piero Molino | Gokhan Tur
Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue

Some of the major challenges in training conversational agents include the lack of large-scale data of real-world complexity, defining appropriate evaluation measures, and managing meaningful conversations across many topics over long periods of time. Moreover, most works tend to assume that the conversational agent’s environment is stationary, a somewhat strong assumption. To remove this assumption and overcome the lack of data, we take a step away from the traditional training pipeline and model the conversation as a stochastic collaborative game. Each agent (player) has a role (“assistant”, “tourist”, “eater”, etc.) and their own objectives, and can only interact via language they generate. Each agent, therefore, needs to learn to operate optimally in an environment with multiple sources of uncertainty (its own LU and LG, the other agent’s LU, Policy, and LG). In this work, we present the first complete attempt at concurrently training conversational agents that communicate only via self-generated language and show that they outperform supervised and deep learning baselines.

2018

pdf bib
Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue
Kazunori Komatani | Diane Litman | Kai Yu | Alex Papangelis | Lawrence Cavedon | Mikio Nakano
Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue

pdf bib
Spoken Dialogue for Information Navigation
Alexandros Papangelis | Panagiotis Papadakos | Yannis Stylianou | Yannis Tzitzikas
Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue

Aiming to expand the current research paradigm for training conversational AI agents that can address real-world challenges, we take a step away from traditional slot-filling goal-oriented spoken dialogue systems (SDS) and model the dialogue in a way that allows users to be more expressive in describing their needs. The goal is to help users make informed decisions rather than being fed matching items. To this end, we describe the Linked-Data SDS (LD-SDS), a system that exploits semantic knowledge bases that connect to linked data, and supports complex constraints and preferences. We describe the required changes in language understanding and state tracking, and the need for mined features, and we report the promising results (in terms of semantic errors, effort, etc) of a preliminary evaluation after training two statistical dialogue managers in various conditions.

2016

pdf bib
Special Session - The Future Directions of Dialogue-Based Intelligent Personal Assistants
Yoichi Matsuyama | Alexandros Papangelis
Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue

2015

pdf bib
Reinforcement Learning of Multi-Issue Negotiation Dialogue Policies
Alexandros Papangelis | Kallirroi Georgila
Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue

2012

pdf bib
A Comparative Study of Reinforcement Learning Techniques on Dialogue Management
Alexandros Papangelis
Proceedings of the Student Research Workshop at the 13th Conference of the European Chapter of the Association for Computational Linguistics

pdf bib
Evaluation of Online Dialogue Policy Learning Techniques
Alexandros Papangelis | Vangelis Karkaletsis | Fillia Makedon
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The number of applied Dialogue Systems is ever increasing in several service providing and other applications as a way to efficiently and inexpensively serve large numbers of customers. A DS that employs some form of adaptation to the environment and its users is called an Adaptive Dialogue System (ADS). A significant part of the research community has lately focused on ADS and many existing or novel techniques are being applied to this problem. One of the most promising techniques is Reinforcement Learning (RL) and especially online RL. This paper focuses on online RL techniques used to achieve adaptation in Dialogue Management and provides an evaluation of various such methods in an effort to aid the designers of ADS in deciding which method to use. To the best of our knowledge there is no other work to compare online RL techniques on the dialogue management problem.
Search
Co-authors
Fix data