Ian Lane


2019

Learning Question-Guided Video Representation for Multi-Turn Video Question Answering
Guan-Lin Chao | Abhinav Rastogi | Semih Yavuz | Dilek Hakkani-Tur | Jindong Chen | Ian Lane
Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue

Understanding and conversing about dynamic scenes is one of the key capabilities of AI agents that navigate the environment and convey useful information to humans. Video question answering is a specific scenario of such AI-human interaction where an agent generates a natural language response to a question regarding the video of a dynamic scene. Incorporating features from multiple modalities, which often provide supplementary information, is one of the challenging aspects of video question answering. Furthermore, a question often concerns only a small segment of the video, hence encoding the entire video sequence using a recurrent neural network is not computationally efficient. Our proposed question-guided video representation module efficiently generates the token-level video summary guided by each word in the question. The learned representations are then fused with the question to generate the answer. Through empirical evaluation on the Audio Visual Scene-aware Dialog (AVSD) dataset, our proposed models in single-turn and multi-turn question answering achieve state-of-the-art performance on several automatic natural language generation evaluation metrics.

2018

End-to-End Learning of Task-Oriented Dialogs
Bing Liu | Ian Lane
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop

In this thesis proposal, we address the limitations of the conventional pipeline design of task-oriented dialog systems and propose end-to-end learning solutions. We design a neural network based dialog system that is able to robustly track dialog state, interface with knowledge bases, and incorporate structured query results into system responses to successfully complete task-oriented dialogs. For learning such neural network based dialog systems, we propose hybrid offline training and online interactive learning methods. We introduce a multi-task learning method for pre-training the dialog agent in a supervised manner using task-oriented dialog corpora. The agent trained with supervision can be further improved by interacting with users and learning online from user demonstration and feedback with imitation and reinforcement learning. To address the sample efficiency issue of online policy learning, we further propose a method that combines the learning-from-user and learning-from-simulation approaches to improve the efficiency of online interactive learning.

Adversarial Learning of Task-Oriented Neural Dialog Models
Bing Liu | Ian Lane
Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue

In this work, we propose an adversarial learning method for reward estimation in reinforcement learning (RL) based task-oriented dialog models. Most current RL based task-oriented dialog systems require access to a reward signal from either user feedback or user ratings. Such user ratings, however, may not always be consistent or available in practice. Furthermore, online dialog policy learning with RL typically requires a large number of queries to users and therefore suffers from poor sample efficiency. To address these challenges, we propose an adversarial learning method to learn dialog rewards directly from dialog samples. Such rewards are further used to optimize the dialog policy with policy gradient based RL. In an evaluation in a restaurant search domain, we show that the proposed adversarial dialog learning method achieves a higher dialog success rate than strong baseline methods. We further discuss the covariate shift problem in online adversarial dialog learning and show how we can address it with partial access to user feedback.

2016

Joint Online Spoken Language Understanding and Language Modeling With Recurrent Neural Networks
Bing Liu | Ian Lane
Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue

2014

Situated Language Understanding at 25 Miles per Hour
Teruhisa Misu | Antoine Raux | Rakesh Gupta | Ian Lane
Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL)

2012

A Simulation-based Framework for Spoken Language Understanding and Action Selection in Situated Interaction
David Cohen | Ian Lane
NAACL-HLT Workshop on Future directions and needs in the Spoken Dialog Community: Tools and Data (SDCTD 2012)

HRItk: The Human-Robot Interaction ToolKit Rapid Development of Speech-Centric Interactive Systems in ROS
Ian Lane | Vinay Prasad | Gaurav Sinha | Arlette Umuhoza | Shangyu Luo | Akshay Chandrashekaran | Antoine Raux
NAACL-HLT Workshop on Future directions and needs in the Spoken Dialog Community: Tools and Data (SDCTD 2012)

Machine Translation with Binary Feedback: a Large-Margin Approach
Avneesh Saluja | Ian Lane | Ying Zhang
Proceedings of the 10th Conference of the Association for Machine Translation in the Americas: Research Papers

Viewing machine translation as a structured classification problem has provided a gateway for a host of structured prediction techniques to enter the field. In particular, large-margin structured prediction methods for discriminative training of feature weights, such as the structured perceptron or MIRA, have started to match or exceed the performance of existing methods such as MERT. One issue with structured problems in general is the difficulty in obtaining fully structured labels, e.g., in machine translation, obtaining reference translations or parallel sentence corpora for arbitrary language pairs. Another issue, more specific to the translation domain, is the difficulty in online training of machine translation systems, since existing methods often require bilingual knowledge to correct translation output online. We propose a solution to these two problems, by demonstrating a way to incorporate binary-labeled feedback (i.e., feedback on whether a translation hypothesis is a “good” or understandable one or not), a form of supervision that can be easily integrated in an online manner, into a machine translation framework. Experimental results show marked improvement by incorporating binary feedback on unseen test data, with gains exceeding 5.5 BLEU points.

2011

Unsupervised vocabulary selection for simultaneous lecture translation
Paul Maergner | Kevin Kilgour | Ian Lane | Alex Waibel
Proceedings of the 8th International Workshop on Spoken Language Translation: Papers

In this work, we propose a novel method for vocabulary selection which enables simultaneous speech recognition systems for lectures to automatically adapt to the diverse topics that occur in educational and scientific lectures. Utilizing materials that are available before the lecture begins, such as lecture slides, our proposed framework iteratively searches for related documents on the World Wide Web and generates a lecture-specific vocabulary and language model based on the resulting documents. In this paper, we introduce a novel method for vocabulary selection where we rank vocabulary that occurs in the collected documents based on a relevance score which is calculated using a combination of word features. Vocabulary selection is a critical component for topic adaptation that has typically been overlooked in prior works. On the interACT German-English simultaneous lecture translation system our proposed approach significantly improved vocabulary coverage, reducing the out-of-vocabulary rate on average by 57.0% and up to 84.9%, compared to a lecture-independent baseline. Furthermore, our approach reduced the word error rate by up to 25.3% (on average 13.2% across all lectures), compared to a lecture-independent baseline.

Unsupervised Vocabulary Selection for Domain-Independent Simultaneous Lecture Translation
Paul Maergner | Ian Lane | Alex Waibel
Proceedings of Machine Translation Summit XIII: Papers

Context-aware Language Modeling for Conversational Speech Translation
Avneesh Saluja | Ian Lane | Ying Zhang
Proceedings of Machine Translation Summit XIII: Papers

2010

Real-time spoken language identification and recognition for speech-to-speech translation
Daniel Chung Yong Lim | Ian Lane | Alex Waibel
Proceedings of the 7th International Workshop on Spoken Language Translation: Papers

Tools for Collecting Speech Corpora via Mechanical-Turk
Ian Lane | Matthias Eck | Kay Rottmann | Alex Waibel
Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon’s Mechanical Turk

2009

Incremental Adaptation of Speech-to-Speech Translation
Nguyen Bach | Roger Hsiao | Matthias Eck | Paisarn Charoenpornsawat | Stephan Vogel | Tanja Schultz | Ian Lane | Alex Waibel | Alan Black
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers

2007

A Log-Linear Block Transliteration Model based on Bi-Stream HMMs
Bing Zhao | Nguyen Bach | Ian Lane | Stephan Vogel
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference

Bilingual-LSA Based LM Adaptation for Spoken Language Translation
Yik-Cheung Tam | Ian Lane | Tanja Schultz
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

Improving spoken language translation by automatic disfluency removal: evidence from conversational speech transcripts
Sharath Rao | Ian Lane | Tanja Schultz
Proceedings of Machine Translation Summit XI: Papers

The CMU-UKA statistical machine translation systems for IWSLT 2007
Ian Lane | Andreas Zollmann | Thuy Linh Nguyen | Nguyen Bach | Ashish Venugopal | Stephan Vogel | Kay Rottmann | Ying Zhang | Alex Waibel
Proceedings of the Fourth International Workshop on Spoken Language Translation

This paper describes the CMU-UKA statistical machine translation systems submitted to the IWSLT 2007 evaluation campaign. Systems were submitted for three language pairs: Japanese→English, Chinese→English and Arabic→English. All systems were based on a common phrase-based SMT (statistical machine translation) framework, but for each language pair a specific research problem was tackled. For Japanese→English we focused on two problems: first, punctuation recovery, and second, how to incorporate topic knowledge into the translation framework. Our Chinese→English submission focused on syntax-augmented SMT, and for the Arabic→English task we focused on incorporating morphological decomposition into the SMT framework. This research strategy enabled us to evaluate a wide variety of approaches which proved effective for the language pairs they were evaluated on.

2006

The UKA/CMU statistical machine translation system for IWSLT 2006
Matthias Eck | Ian Lane | Nguyen Bach | Sanjika Hewavitharana | Muntsin Kolss | Bing Zhao | Almut Silja Hildebrand | Stephan Vogel | Alex Waibel
Proceedings of the Third International Workshop on Spoken Language Translation: Evaluation Campaign