Gokhan Tur - ACL Anthology

Gokhan Tur

Also published as: Gokhan Tür, G. Tur, Gökhan Tür

2026

SpeakRL: Synergizing Reasoning, Speaking, and Acting in Language Models with Reinforcement Learning
Emre Can Acikgoz | Jinoh Oh | Jie Hao | Joo Hyuk Jeon | Heng Ji | Dilek Hakkani-Tur | Gokhan Tur | Xiang Li | Chengyuan Ma | Xing Fan
Proceedings of the 16th International Workshop on Spoken Dialogue System Technology

Effective human-agent collaboration is increasingly prevalent in real-world applications. Current trends in such collaborations are predominantly unidirectional, with users providing instructions or posing questions to agents, where agents respond directly without seeking necessary clarifications or confirmations. However, the evolving capabilities of these agents require more proactive engagement, where agents should dynamically participate in conversations to clarify user intents, resolve ambiguities, and adapt to changing circumstances. Prior work underutilizes the conversational capabilities of language models (LMs), thereby optimizing agents as better followers rather than effective speakers. In this work, we introduce SpeakRL, a reinforcement learning (RL) method that enhances agents’ conversational capabilities by rewarding proactive interactions with users, such as asking the right clarification questions when necessary. To support this, we curate SpeakER, a synthetic dataset that includes diverse scenarios from task-oriented dialogues, where tasks are resolved through interactive clarification questions. We present a systematic analysis of reward design for conversational proactivity and propose a principled reward formulation for teaching agents to balance asking with acting. Empirical evaluations demonstrate that our approach achieves a 20.14% absolute improvement in task completion over base models without increasing conversation turns, even surpassing much larger proprietary models, demonstrating the promise of clarification-centric user-agent interactions.

MAC: A Multi-Agent Framework for Interactive User Clarification in Multi-turn Conversations
Emre Can Acikgoz | Jinoh Oh | Joo Hyuk Jeon | Jie Hao | Heng Ji | Dilek Hakkani-Tur | Gokhan Tur | Xiang Li | Chengyuan Ma | Xing Fan
Proceedings of the 16th International Workshop on Spoken Dialogue System Technology

Conversational agents often encounter ambiguous user requests, requiring effective clarification to successfully complete tasks. While recent advancements in real-world applications favor multi-agent architectures to manage complex conversational scenarios efficiently, ambiguity resolution remains a critical and underexplored challenge, particularly due to the difficulty of determining which agent should initiate a clarification and how agents should coordinate their actions when faced with uncertain or incomplete user input. The fundamental questions of when to interrupt a user and how to formulate the optimal clarification query within optimal multi-agent settings remain open. In this paper, we propose MAC (Multi-Agent Clarification), an interactive multi-agent framework specifically optimized to resolve user ambiguities by strategically managing clarification dialogues. We first introduce a novel taxonomy categorizing user ambiguities to systematically guide clarification strategies. Then, we present MAC, which autonomously coordinates multiple agents to interact synergistically with users. Empirical evaluations on MultiWOZ 2.4 demonstrate that enabling clarification at both levels increases task success rate by 7.8% absolute (54.5 → 62.3) and reduces the average number of dialogue turns (6.53 → 4.86) by eliciting all required user information up front and minimizing repetition. Our findings highlight the importance of active user interaction and role-aware clarification for more reliable human–agent communication.

TT-SI: Self-Improving LLM Agents with Test-Time Training
Emre Can Acikgoz | Cheng Qian | Heng Ji | Dilek Hakkani-Tür | Gokhan Tur
Findings of the Association for Computational Linguistics: ACL 2026

One paradigm of language model (LM) fine-tuning relies on creating large training datasets, under the assumption that high quantity and diversity will enable models to generalize to novel tasks after post-training. In practice, gathering large sets of data is inefficient, and training on them is prohibitively expensive; worse, there is no guarantee that the resulting model will handle complex scenarios or generalize better. Moreover, existing techniques rarely assess whether a training sample provides novel information, resulting in unnecessary costs. In this work, we explore a new Test-Time Self-Improvement (TT-SI) algorithm to create more effective and generalizable agentic LMs on-the-fly. TT-SI can be summarized in three steps: (i) first, it identifies the samples that the model struggles with (self-awareness), (ii) then it generates similar examples from the detected uncertain samples (self-data augmentation), and (iii) finally, it uses these newly generated samples for test-time training (self-improvement). We further explore Test-Time Distillation (TT-D), which leverages a stronger supervisor for targeted data generation. Empirical evaluations across different agent benchmarks demonstrate that TT-SI improves performance with a +5.48% absolute accuracy gain on average across all benchmarks and surpasses other standard learning methods more efficiently. Our findings highlight the promise of TT-SI, demonstrating the potential of self-improvement algorithms at test-time as a new paradigm for building more capable agents toward self-evolution.
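
The three steps in this abstract can be sketched as a small control loop. This is only an illustration under stated assumptions: the abstract does not say how uncertainty is measured, so the softmax-margin test below is a stand-in, and `score`, `augment`, `adapt`, and `base_predict` are hypothetical callables, not names from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def is_uncertain(logits, threshold=0.2):
    """Step (i), assumed criterion: small gap between the top two
    probabilities. Requires at least two logits."""
    probs = sorted(softmax(logits), reverse=True)
    return (probs[0] - probs[1]) < threshold


def tt_si_predict(sample, score, augment, adapt, base_predict, threshold=0.2):
    """Return (prediction, number of synthetic samples used).

    score(sample)        -> list of logits (hypothetical)
    augment(sample)      -> list of similar synthetic samples (hypothetical)
    adapt(samples)       -> temporarily adapted predict function (hypothetical)
    base_predict(sample) -> prediction from the unmodified model (hypothetical)
    """
    if not is_uncertain(score(sample), threshold):
        return base_predict(sample), 0
    synthetic = augment(sample)          # step (ii): self-data augmentation
    adapted_predict = adapt(synthetic)   # step (iii): test-time training
    return adapted_predict(sample), len(synthetic)
```

Confident samples bypass augmentation and adaptation entirely, which is where the efficiency argument in the abstract would come from in a sketch like this.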

Prior Beliefs Prejudice LLM-as-Judge: Evidence from Persuasion Evaluation
Pardis Sadat Zahraei | Xiaoning Wang | Nimet Beyza Bozdag | Gokhan Tur | Dilek Hakkani-Tür
Findings of the Association for Computational Linguistics: ACL 2026

Large Language Models (LLMs) are increasingly used as judges to evaluate text quality, moderate content, and assess arguments. We investigate whether alignment-instilled prior beliefs bias LLM judgments, using persuasion evaluation as a representative task. We find a systematic failure: models conflate their trained beliefs with rhetorical quality, rating identical claims differently based on belief alignment rather than argumentative merit. A bare assertion aligned with training receives higher scores than a well-crafted counter-argument, even when explicitly instructed to judge rhetoric alone. We introduce ConvinceQA, a dataset of 27,756 persuasive arguments with controlled stance variation across subjective, harmful, and misinformation domains, and demonstrate this prior prejudice across models. We exploit this failure through persuasion-based probing: evaluating minimal pairs that differ only in the subject token bypasses learned refusals and reveals hidden biases. Analysis identifies three failure modes, with belief-conditioned rating inflation accounting for 88% of cases. Cross-task validation on essay quality assessment and debate judging confirms this is a pervasive limitation.

Current Agents Fail to Leverage World Model as Tool for Foresight
Cheng Qian | Emre Can Acikgoz | Bingxuan Li | Xiusi Chen | Yuji Zhang | Bingxiang He | Qinyu Luo | Gokhan Tur | Dilek Hakkani-Tür | Yunzhu Li | Heng Ji
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Agents built on vision-language models increasingly face tasks that demand anticipating future states rather than relying on short-horizon reasoning. Generative world models offer a promising remedy: agents could use them as external simulators to foresee outcomes before acting. This paper empirically examines whether current agents can leverage such world models as tools to enhance their cognition. Across diverse agentic and visual question answering tasks, we observe that some agents rarely invoke simulation (fewer than 1%), frequently misuse predicted rollouts (approximately 15%), and often exhibit inconsistent or even degraded performance (up to 5%) when simulation is available or enforced. Attribution analysis further indicates that the primary bottleneck lies in the agents’ capacity to decide when to simulate, how to interpret predicted outcomes, and how to integrate foresight into downstream reasoning. These findings underscore the need for mechanisms that foster calibrated, strategic interaction with world models, paving the way toward more reliable anticipatory cognition in future agent systems.

2025

TD-EVAL: Revisiting Task-Oriented Dialogue Evaluation by Combining Turn-Level Precision with Dialogue-Level Comparisons
Emre Can Acikgoz | Carl Guo | Suvodip Dey | Akul Datta | Takyoung Kim | Gokhan Tur | Dilek Hakkani-Tur
Proceedings of the 26th Annual Meeting of the Special Interest Group on Discourse and Dialogue

Task-oriented dialogue (TOD) systems are experiencing a revolution driven by Large Language Models (LLMs), yet the evaluation methodologies for these systems remain insufficient for their growing sophistication. While traditional automatic metrics effectively assessed earlier modular systems, they focus solely on the dialogue level and cannot detect critical intermediate errors that can arise during user-agent interactions. In this paper, we introduce **TD-EVAL** (**T**urn and **D**ialogue-level **Eval**uation), a two-step evaluation framework that unifies fine-grained turn-level analysis with holistic dialogue-level comparisons. At the turn level, we assess each response along three TOD-specific dimensions: *conversation cohesion*, *backend knowledge consistency*, and *policy compliance*. Meanwhile, we design **TOD Agent Arena**, which uses pairwise comparisons to provide a measure of dialogue-level quality. Through experiments on MultiWOZ 2.4 and Tau-Bench, we demonstrate that TD-EVAL effectively identifies the conversational errors that conventional metrics miss. Furthermore, TD-EVAL exhibits better alignment with human judgments than traditional and LLM-based metrics. These findings demonstrate that TD-EVAL introduces a new paradigm for TOD system evaluation, efficiently assessing both turn and system levels with an easily reproducible framework for future research.

ReSpAct: Harmonizing Reasoning, Speaking, and Acting Towards Building Large Language Model-Based Conversational AI Agents
Vardhan Dongre | Xiaocheng Yang | Emre Can Acikgoz | Suvodip Dey | Gokhan Tur | Dilek Hakkani-Tur
Proceedings of the 15th International Workshop on Spoken Dialogue Systems Technology

Large language model (LLM)-based agents have been increasingly used to interact with external environments (e.g., games, APIs, etc.) and solve tasks. However, current frameworks do not enable these agents to work with users and interact with them to align on the details of their tasks and reach user-defined goals; instead, in ambiguous situations, these agents may make decisions based on assumptions. This work introduces ReSpAct (Reason, Speak, and Act), a novel framework that synergistically combines the essential skills for building task-oriented “conversational” agents. ReSpAct addresses this need for agents, expanding on the ReAct approach. The ReSpAct framework enables agents to interpret user instructions, reason about complex tasks, execute appropriate actions, and engage in dynamic dialogue to seek guidance, clarify ambiguities, understand user preferences, resolve problems, and use the intermediate feedback and responses of users to update their plans. We evaluated ReSpAct with GPT-4 in environments supporting user interaction, such as task-oriented dialogue (MultiWOZ) and interactive decision-making (AlfWorld, WebShop). ReSpAct is flexible enough to incorporate dynamic user feedback and addresses prevalent issues like error propagation and agents getting stuck in reasoning loops. This results in more interpretable, human-like task-solving trajectories than baselines relying solely on reasoning traces. In two interactive decision-making benchmarks, AlfWorld and WebShop, ReSpAct outperforms the strong reasoning-only method ReAct by an absolute success rate of 6% and 4%, respectively. In the task-oriented dialogue benchmark MultiWOZ, ReSpAct improved Inform and Success scores by 5.5% and 3%, respectively.

SMART: Self-Aware Agent for Tool Overuse Mitigation
Cheng Qian | Emre Can Acikgoz | Hongru Wang | Xiusi Chen | Avirup Sil | Dilek Hakkani-Tür | Gokhan Tur | Heng Ji
Findings of the Association for Computational Linguistics: ACL 2025

Current Large Language Model (LLM) agents demonstrate strong reasoning and tool use capabilities, but often lack self-awareness, failing to balance these approaches effectively. This imbalance leads to **Tool Overuse**, where models unnecessarily rely on external tools for tasks solvable with parametric knowledge, increasing computational overhead. Inspired by human metacognition, we introduce **SMART** (Strategic Model-Aware Reasoning with Tools), a paradigm that enhances an agent’s self-awareness to optimize task handling and reduce tool overuse. To support this paradigm, we introduce **SMART-ER**, a dataset spanning three domains, where reasoning alternates between parametric knowledge and tool-dependent steps, with each step enriched by rationales explaining when tools are necessary. Through supervised training, we develop **SMARTAgent**, a family of models that dynamically balance parametric knowledge and tool use. Evaluations show that SMARTAgent reduces tool use by 24% while improving performance by over 37%, enabling 7B-scale models to match their 70B counterparts and GPT-4. Additionally, SMARTAgent generalizes to out-of-distribution test data like GSM8K and MINTQA, maintaining accuracy with just one-fifth the tool calls. These results highlight the potential of strategic tool use to enhance reasoning, mitigate overuse, and bridge the gap between model size and performance, advancing intelligent and resource-efficient agent designs.

Can a Single Model Master Both Multi-turn Conversations and Tool Use? CoALM: A Unified Conversational Agentic Language Model
Emre Can Acikgoz | Jeremiah Greer | Akul Datta | Ze Yang | William Zeng | Oussama Elachqar | Emmanouil Koukoumidis | Dilek Hakkani-Tür | Gokhan Tur
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Large Language Models (LLMs) with API-calling capabilities have enabled building effective Language Agents (LA), while also revolutionizing the conventional task-oriented dialogue (TOD) paradigm. However, current approaches face a critical dilemma: TOD systems are often trained on a limited set of target APIs, requiring new data to maintain their quality when interfacing with new services, while LAs are not trained to maintain user intent over multi-turn conversations. Because both robust multi-turn management and advanced function calling are crucial for effective conversational agents, we evaluate these skills on three popular benchmarks: MultiWOZ 2.4 (TOD), BFCL V3 (LA), and API-Bank (LA). Our analyses reveal that specialized approaches excel in one domain but underperform in the other. To bridge this chasm, we introduce CoALM (Conversational Agentic Language Model), a unified approach that integrates both conversational and agentic capabilities. We created CoALM-IT, a carefully constructed multi-task dataset that interleaves multi-turn ReAct reasoning with complex API usage. Using CoALM-IT, we train three models, CoALM 8B, CoALM 70B, and CoALM 405B, which outperform top domain-specific models, including GPT-4o, across all three benchmarks. This demonstrates the feasibility of a single model approach for both TOD and LA, setting a new standard for conversational agents.

Know Your Mistakes: Towards Preventing Overreliance on Task-Oriented Conversational AI Through Accountability Modeling
Suvodip Dey | Yi-Jyun Sun | Gokhan Tur | Dilek Hakkani-Tür
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Recent LLMs have enabled significant advancements for conversational agents. However, they are also well known to hallucinate, producing responses that seem plausible but are factually incorrect. On the other hand, users tend to over-rely on LLM-based AI agents, accepting the AI’s suggestion even when it is wrong. Adding positive friction, such as explanations or getting user confirmations, has been proposed as a mitigation in AI-supported decision-making systems. In this paper, we propose an accountability model for LLM-based task-oriented dialogue agents to address user overreliance via friction turns in cases of model uncertainty and errors associated with dialogue state tracking (DST). The accountability model is an augmented LLM with an additional accountability head that functions as a binary classifier to predict the relevant slots of the dialogue state mentioned in the conversation. We perform our experiments with multiple backbone LLMs on two established benchmarks (MultiWOZ and Snips). Our empirical findings demonstrate that the proposed approach not only enables reliable estimation of AI agent errors but also guides the decoder in generating more accurate actions. We observe around 3% absolute improvement in joint goal accuracy (JGA) of DST output by incorporating accountability heads into modern LLMs. Self-correcting the detected errors further increases the JGA from 67.13 to 70.51, achieving state-of-the-art DST performance. Finally, we show that error correction through user confirmations (friction turn) achieves a similar performance gain, highlighting its potential to reduce user overreliance.
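
For readers unfamiliar with the JGA metric quoted in this abstract: as commonly defined in DST work, it is the share of turns whose full predicted dialogue state exactly matches the gold state. A minimal sketch follows, assuming each state is a plain slot-to-value dictionary; the function name is hypothetical, and the paper's own evaluation script may normalize slot values differently.

```python
def joint_goal_accuracy(predicted_states, gold_states):
    """Fraction of turns where the predicted state equals the gold state.

    Each state is a dict mapping slot name -> value. A turn counts as
    correct only if every slot matches and no extra slot is predicted.
    """
    if len(predicted_states) != len(gold_states):
        raise ValueError("predicted and gold must cover the same turns")
    if not gold_states:
        return 0.0
    correct = sum(1 for p, g in zip(predicted_states, gold_states) if p == g)
    return correct / len(gold_states)


# Toy usage: the second turn has one wrong slot value, so it counts as wrong.
gold = [
    {"hotel-area": "north"},
    {"hotel-area": "north", "hotel-stars": "4"},
    {"hotel-area": "north", "hotel-stars": "4", "hotel-parking": "yes"},
]
pred = [
    {"hotel-area": "north"},
    {"hotel-area": "north", "hotel-stars": "3"},
    {"hotel-area": "north", "hotel-stars": "4", "hotel-parking": "yes"},
]
jga = joint_goal_accuracy(pred, gold)
```

Because a single wrong slot invalidates the whole turn, a friction turn that corrects one slot can flip an entire turn from wrong to right.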

2024

Dialog Flow Induction for Constrainable LLM-Based Chatbots
Stuti Agrawal | Pranav Pillai | Nishi Uppuluri | Revanth Gangi Reddy | Sha Li | Gokhan Tur | Dilek Hakkani-Tur | Heng Ji
Proceedings of the 25th Annual Meeting of the Special Interest Group on Discourse and Dialogue

LLM-driven dialog systems are used in a diverse set of applications, ranging from healthcare to customer service. However, given their generalization capability, it is difficult to ensure that these chatbots stay within the boundaries of the specialized domains, potentially resulting in inaccurate information and irrelevant responses. This paper introduces an unsupervised approach for automatically inducing domain-specific dialog flows that can be used to constrain LLM-based chatbots. We introduce two variants of dialog flow based on the availability of in-domain conversation instances. Through human and automatic evaluation over 24 dialog domains, we demonstrate that our high-quality data-guided dialog flows achieve better domain coverage, thereby overcoming the need for extensive manual crafting of such flows.

SG-RAG: Multi-Hop Question Answering With Large Language Models Through Knowledge Graphs
Ahmmad O. M. Saleh | Gokhan Tur | Yucel Saygin
Proceedings of the 7th International Conference on Natural Language and Speech Processing (ICNLSP 2024)

2023

We present the MASSIVE dataset–Multilingual Amazon Slu resource package (SLURP) for Slot-filling, Intent classification, and Virtual assistant Evaluation. MASSIVE contains 1M realistic, parallel, labeled virtual assistant utterances spanning 51 languages, 18 domains, 60 intents, and 55 slots. MASSIVE was created by tasking professional translators to localize the English-only SLURP dataset into 50 typologically diverse languages from 29 genera. We also present modeling results on XLM-R and mT5, including exact match accuracy, intent classification accuracy, and slot-filling F1 score. We have released our dataset, modeling code, and models publicly.

2022

VISITRON: Visual Semantics-Aligned Interactively Trained Object-Navigator
Ayush Shrivastava | Karthik Gopalakrishnan | Yang Liu | Robinson Piramuthu | Gokhan Tur | Devi Parikh | Dilek Hakkani-Tur
Findings of the Association for Computational Linguistics: ACL 2022

Interactive robots navigating photo-realistic environments need to be trained to effectively leverage and handle the dynamic nature of dialogue in addition to the challenges underlying vision-and-language navigation (VLN). In this paper, we present VISITRON, a multi-modal Transformer-based navigator better suited to the interactive regime inherent to Cooperative Vision-and-Dialog Navigation (CVDN). VISITRON is trained to: i) identify and associate object-level concepts and semantics between the environment and dialogue history, ii) identify when to interact vs. navigate via imitation learning of a binary classification head. We perform extensive pre-training and fine-tuning ablations with VISITRON to gain empirical insights and improve performance on CVDN. VISITRON’s ability to identify when to interact leads to a natural generalization of the game-play mode introduced by Roman et al. (2020) for enabling the use of such models in different environments. VISITRON is competitive with models on the static CVDN leaderboard and attains state-of-the-art performance on the Success weighted by Path Length (SPL) metric.
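
The SPL metric named in this abstract is usually computed as the mean over episodes of `success * shortest / max(taken, shortest)`, where `shortest` is the shortest-path length to the goal and `taken` is the length of the path the agent followed. A minimal sketch, assuming episodes are given as `(success, shortest, taken)` tuples; the function name and input layout are illustrative, not taken from the VISITRON code.

```python
def success_weighted_path_length(episodes):
    """Mean of success * shortest / max(taken, shortest) over episodes.

    episodes: iterable of (success: bool, shortest: float, taken: float).
    Assumes shortest > 0 for every episode.
    """
    episodes = list(episodes)
    if not episodes:
        raise ValueError("need at least one episode")
    total = 0.0
    for success, shortest, taken in episodes:
        if shortest <= 0:
            raise ValueError("shortest path length must be positive")
        if success:
            total += shortest / max(taken, shortest)
    return total / len(episodes)


# Toy usage: an optimal success, a success on a path twice as long as
# necessary, and a failure.
spl = success_weighted_path_length([
    (True, 10.0, 10.0),
    (True, 10.0, 20.0),
    (False, 10.0, 5.0),
])
```

Failed episodes contribute zero regardless of path length, so the metric rewards reaching the goal efficiently rather than simply moving little.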

CGF: Constrained Generation Framework for Query Rewriting in Conversational AI
Jie Hao | Yang Liu | Xing Fan | Saurabh Gupta | Saleh Soltan | Rakesh Chada | Pradeep Natarajan | Chenlei Guo | Gokhan Tur
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track

In conversational AI agents, Query Rewriting (QR) plays a crucial role in reducing user frictions and satisfying users’ daily demands. User frictions arise for various reasons, such as errors in the conversational AI system, users’ accents, or their abridged language. In this work, we present a novel Constrained Generation Framework (CGF) for query rewriting at both global and personalized levels. It is based on the encoder-decoder framework, where the encoder takes the query and its previous dialogue turns as the input to form a context-enhanced representation, and the decoder uses constrained decoding to generate the rewrites based on the pre-defined global or personalized constrained decoding space. Extensive offline and online A/B experiments show that the proposed CGF significantly boosts the query rewriting performance.
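
One common way to realize constrained decoding over a pre-defined space is a prefix trie of allowed token sequences, with the decoder choosing only among the trie's children at each step. The sketch below illustrates that general idea with greedy search; it is not the CGF implementation, and `score` is a hypothetical stand-in for a decoder's next-token scores.

```python
END = "</s>"  # assumed end-of-sequence marker


def build_trie(sequences):
    """Build a nested-dict trie from allowed token sequences."""
    root = {}
    for seq in sequences:
        node = root
        for tok in seq:
            node = node.setdefault(tok, {})
        node[END] = {}  # mark that a complete rewrite ends here
    return root


def constrained_greedy_decode(trie, score):
    """Greedily pick the best-scoring token among those the trie allows.

    score(prefix, token) -> float is a hypothetical scoring function.
    """
    if not trie:
        raise ValueError("the constrained decoding space is empty")
    node, output = trie, []
    while True:
        best = max(node, key=lambda tok: score(output, tok))
        if best == END:
            return output
        output.append(best)
        node = node[best]


# Toy usage: two allowed rewrites; the scorer prefers "rock" over "jazz".
allowed = [["play", "jazz"], ["play", "rock", "music"]]
toy_scores = {"rock": 2.0, "music": 1.0}
rewrite = constrained_greedy_decode(
    build_trie(allowed), lambda prefix, tok: toy_scores.get(tok, 0.0)
)
```

Swapping the trie (for example, a global one versus a per-user one) changes the set of reachable rewrites without touching the scoring model, which mirrors the global and personalized levels described above.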

2021

Generative Conversational Networks
Alexandros Papangelis | Karthik Gopalakrishnan | Aishwarya Padmakumar | Seokhwan Kim | Gokhan Tur | Dilek Hakkani-Tur
Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue

Inspired by recent work in meta-learning and generative teaching networks, we propose a framework called Generative Conversational Networks, in which conversational agents learn to generate their own labelled training data (given some seed data) and then train themselves from that data to perform a given task. We use reinforcement learning to optimize the data generation process where the reward signal is the agent’s performance on the task. The task can be any language-related task, from intent detection to full task-oriented conversations. In this work, we show that our approach is able to generalise from seed data and performs well in limited data and limited computation settings, with significant gains for intent detection and slot tagging across multiple datasets: ATIS, TOD, SNIPS, and Restaurants8k. We show an average improvement of 35% in intent detection and 21% in slot tagging over a baseline model trained from the seed data. We also conduct an analysis of the novelty of the generated data and provide generated examples for intent detection, slot tagging, and non-goal oriented conversations.

2020

Controllable Text Generation with Focused Variation
Lei Shu | Alexandros Papangelis | Yi-Chia Wang | Gokhan Tur | Hu Xu | Zhaleh Feizollahi | Bing Liu | Piero Molino
Findings of the Association for Computational Linguistics: EMNLP 2020

This work introduces Focused-Variation Network (FVN), a novel model to control language generation. The main problems in previous controlled language generation models range from the difficulty of generating text according to the given attributes, to the lack of diversity of the generated texts. FVN addresses these issues by learning disjoint discrete latent spaces for each attribute inside codebooks, which allows for both controllability and diversity, while at the same time generating fluent text. We evaluate FVN on two text generation datasets with annotated content and style, and show state-of-the-art performance as assessed by automatic and human evaluations.

2019

Flexibly-Structured Model for Task-Oriented Dialogues
Lei Shu | Piero Molino | Mahdi Namazifar | Hu Xu | Bing Liu | Huaixiu Zheng | Gokhan Tur
Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue

This paper proposes a novel end-to-end architecture for task-oriented dialogue systems. It is based on a simple and practical yet very effective sequence-to-sequence approach, where language understanding and state tracking tasks are modeled jointly with a structured copy-augmented sequential decoder and a multi-label decoder for each slot. The policy engine and language generation tasks are modeled jointly following that. The copy-augmented sequential decoder deals with new or unknown values in the conversation, while the multi-label decoder combined with the sequential decoder ensures the explicit assignment of values to slots. On the generation part, slot binary classifiers are used to improve performance. This architecture is scalable to real-world scenarios and is shown through an empirical evaluation to achieve state-of-the-art performance on both the Cambridge Restaurant dataset and the Stanford in-car assistant dataset.

Collaborative Multi-Agent Dialogue Model Training Via Reinforcement Learning
Alexandros Papangelis | Yi-Chia Wang | Piero Molino | Gokhan Tur
Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue

Some of the major challenges in training conversational agents include the lack of large-scale data of real-world complexity, defining appropriate evaluation measures, and managing meaningful conversations across many topics over long periods of time. Moreover, most works tend to assume that the conversational agent’s environment is stationary, a somewhat strong assumption. To remove this assumption and overcome the lack of data, we take a step away from the traditional training pipeline and model the conversation as a stochastic collaborative game. Each agent (player) has a role (“assistant”, “tourist”, “eater”, etc.) and their own objectives, and can only interact via language they generate. Each agent, therefore, needs to learn to operate optimally in an environment with multiple sources of uncertainty (its own LU and LG, the other agent’s LU, Policy, and LG). In this work, we present the first complete attempt at concurrently training conversational agents that communicate only via self-generated language and show that they outperform supervised and deep learning baselines.

2018

Bootstrapping a Neural Conversational Agent with Dialogue Self-Play, Crowdsourcing and On-Line Reinforcement Learning
Pararth Shah | Dilek Hakkani-Tür | Bing Liu | Gokhan Tür
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 3 (Industry Papers)

End-to-end neural models show great promise towards building conversational agents that are trained from data and on-line experience using supervised and reinforcement learning. However, these models require a large corpus of dialogues to learn effectively. For goal-oriented dialogues, such datasets are expensive to collect and annotate, since each task involves a separate schema and database of entities. Further, the Wizard-of-Oz approach commonly used for dialogue collection does not provide sufficient coverage of salient dialogue flows, which is critical for guaranteeing an acceptable task completion rate in consumer-facing conversational agents. In this paper, we study a recently proposed approach for building an agent for arbitrary tasks by combining dialogue self-play and crowd-sourcing to generate fully-annotated dialogues with diverse and natural utterances. We discuss the advantages of this approach for industry applications of conversational agents, wherein an agent can be rapidly bootstrapped to deploy in front of users and further optimized via interactive learning from actual users of the system.

Dialogue Learning with Human Teaching and Feedback in End-to-End Trainable Task-Oriented Dialogue Systems
Bing Liu | Gokhan Tür | Dilek Hakkani-Tür | Pararth Shah | Larry Heck
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

In this work, we present a hybrid learning method for training task-oriented dialogue systems through online user interactions. Popular methods for learning task-oriented dialogues include applying reinforcement learning with user feedback on supervised pre-training models. The efficiency of such a learning method may suffer from the mismatch of dialogue state distribution between offline training and online interactive learning stages. To address this challenge, we propose a hybrid imitation and reinforcement learning method, with which a dialogue agent can effectively learn from its interaction with users by learning from human teaching and feedback. We design a neural network based task-oriented dialogue agent that can be optimized end-to-end with the proposed learning method. Experimental results show that our end-to-end dialogue agent can learn effectively from the mistakes it makes via imitation learning from user teaching. Applying reinforcement learning with user feedback after the imitation learning stage further improves the agent’s capability in successfully completing a task.

2017

Sequential Dialogue Context Modeling for Spoken Language Understanding
Ankur Bapna | Gokhan Tür | Dilek Hakkani-Tür | Larry Heck
Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue

Spoken Language Understanding (SLU) is a key component of goal-oriented dialogue systems that parses user utterances into semantic frame representations. Traditionally, SLU does not utilize the dialogue history beyond the previous system turn, and contextual ambiguities are resolved by the downstream components. In this paper, we explore novel approaches for modeling dialogue context in a recurrent neural network (RNN) based language understanding system. We propose the Sequential Dialogue Encoder Network, which allows encoding context from the dialogue history in chronological order. We compare the performance of our proposed architecture with two context models, one that uses just the previous turn context and another that encodes dialogue context in a memory network but loses the order of utterances in the dialogue history. Experiments with a multi-domain dialogue dataset demonstrate that the proposed architecture results in reduced semantic frame error rates.

2013

Semi-Supervised Semantic Tagging of Conversational Understanding using Markov Topic Regression
Asli Celikyilmaz | Dilek Hakkani-Tur | Gokhan Tur | Ruhi Sarikaya
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2012

Mining Search Query Logs for Spoken Language Understanding
Dilek Hakkani-Tür | Gokhan Tür | Asli Celikyilmaz
NAACL-HLT Workshop on Future directions and needs in the Spoken Dialog Community: Tools and Data (SDCTD 2012)

2010

LDA Based Similarity Modeling for Question Answering
Asli Celikyilmaz | Dilek Hakkani-Tur | Gokhan Tur
Proceedings of the NAACL HLT 2010 Workshop on Semantic Search

NAACL HLT 2010 Tutorial Abstracts
Jason Baldridge | Peter Clark | Gokhan Tur
NAACL HLT 2010 Tutorial Abstracts

2009

Who, What, When, Where, Why? Comparing Multiple Approaches to the Cross-Lingual 5W Task
Kristen Parton | Kathleen R. McKeown | Bob Coyne | Mona T. Diab | Ralph Grishman | Dilek Hakkani-Tür | Mary Harper | Heng Ji | Wei Yun Ma | Adam Meyers | Sara Stolbach | Ang Sun | Gokhan Tur | Wei Xu | Sibel Yaman
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

Anchored Speech Recognition for Question Answering
Sibel Yaman | Gokhan Tur | Dimitra Vergyri | Dilek Hakkani-Tur | Mary Harper | Wen Wang
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers

2007

Proceedings of the Workshop on Bridging the Gap: Academic and Industrial Research in Dialog Technologies
Fuliang Weng | Ye-Yi Wang | Gokhan Tur
Proceedings of the Workshop on Bridging the Gap: Academic and Industrial Research in Dialog Technologies

2005

Using Semantic and Syntactic Graphs for Call Classification
Dilek Hakkani-Tür | Gokhan Tur | Ananlada Chotimongkol
Proceedings of the ACL Workshop on Feature Engineering for Machine Learning in Natural Language Processing

2004

Bootstrapping Spoken Dialog Systems with Data Reuse
Guiseppe Di Fabbrizio | Gokhan Tur | Dilek Hakkani-Tür
Proceedings of the 5th SIGdial Workshop on Discourse and Dialogue at HLT-NAACL 2004

2001

Integrating Prosodic and Lexical Cues for Automatic Topic Segmentation
G. Tur | D. Hakkani-Tur | A. Stolcke | E. Shriberg
Computational Linguistics, Volume 27, Number 1, March 2001

2000

Statistical Morphological Disambiguation for Agglutinative Languages
Dilek Z. Hakkani-Tür | Kemal Oflazer | Gökhan Tür
COLING 2000 Volume 1: The 18th International Conference on Computational Linguistics

1998

Implementing Voting Constraints with Finite State Transducers
Kemal Oflazer | Gokhan Tur
Finite State Methods in Natural Language Processing

Tagging English by Path Voting Constraints
Gökhan Tür | Kemal Oflazer
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 2

Tagging English by Path Voting Constraints
Gokhan Tur | Kemal Oflazer
COLING 1998 Volume 2: The 17th International Conference on Computational Linguistics

1997

Morphological Disambiguation by Voting Constraints
Kemal Oflazer | Gokhan Tur
35th Annual Meeting of the Association for Computational Linguistics and 8th Conference of the European Chapter of the Association for Computational Linguistics

1996

Combining Hand-crafted Rules and Unsupervised Learning in Constraint-based Morphological Disambiguation
Kemal Oflazer | Gokhan Tur
Conference on Empirical Methods in Natural Language Processing

Co-authors

Alexandros Papangelis 4

Asli Celikyilmaz 3

Karthik Gopalakrishnan 2

Joo Hyuk Jeon 2

Yang Liu (刘扬) 2

Stuti Agrawal 1

Jason Baldwin 1

Frederic Bechet 1

Raffaella Bernardi 1

Nimet Beyza Bozdag 1

Zoraida Callejas 1

Yun-Nung Chen 1

Ananlada Chotimongkol 1

Shammur Absar Chowdhury 1

Géraldine Damnati 1

Giuseppe "Pino" Di Fabbrizio 1

Guiseppe Di Fabbrizio 1

Vardhan Dongre 1

Luis Fernando D’Haro 1

Oussama Elachqar 1

Zhaleh Feizollahi 1

Jack Fitzgerald 1

Revanth Gangi Reddy 1

Jeremiah Greer 1

Ralph Grishman 1

Saurabh Gupta 1

Joakim Gustafson 1

Dilek Z. Hakkani-Tür 1

Christopher Hench 1

Michael Johnston 1

Vishesh Kakarala 1

Tatsuya Kawahara 1

Emmanouil Koukoumidis 1

Wouter Leeuwis 1

Kathleen McKeown 1

John Mendonça 1

Seyed Mahed Mousavi 1

Mahdi Namazifar 1

Pradeep Natarajan 1

Prem Natarajan 1

Aishwarya Padmakumar 1

Kristen Parton 1

Charith Peris 1

Pranav Pillai 1

Robinson Piramuthu 1

Swetha Ranganath 1

Giuseppe Riccardi 1

Ahmmad O. M. Saleh 1

Ruhi Sarikaya 1

Elizabeth Shriberg 1

Ayush Shrivastava 1

Sara Stolbach 1

Andreas Stolcke 1

M. Inés Torres 1

Nishi Uppuluri 1

Dimitra Vergyri 1

Wen Wang (王雯) 1

Xiaoning Wang 1

Xiaocheng Yang 1

Koichiro Yoshino 1

Pardis Sadat Zahraei 1

Huaixiu Zheng 1

Venues