Xuchen Yao


2015

2014

2013

2012

This article describes a new tool for extracting question-answer pairs from text articles, and reports three experiments which investigate how suitable this technique is for supplying knowledge to conversational characters. Experiment 1 demonstrates the feasibility of our method by creating characters for 14 distinct topics and evaluating them using hand-authored questions. Experiment 2 evaluates three of these characters using questions collected from naive participants, showing that the generated characters provide full or partial answers to about half of the questions asked. Experiment 3 adds automatically extracted knowledge to an existing, hand-authored character, demonstrating that augmented characters can answer questions about new topics but with some degradation of the ability to answer questions about topics that the original character was trained to answer. Overall, the results show that question generation is a promising method for creating or augmenting a question answering conversational character using an existing text.
This paper presents a question generation system based on the approach of semantic rewriting. The state-of-the-art deep linguistic parsing and generation tools are employed to convert (back and forth) between the natural language sentences and their meaning representations in the form of Minimal Recursion Semantics (MRS). By carefully operating on the semantic structures, we show a principled way of generating questions without ad-hoc manipulation of the syntactic structures. Based on the (partial) understanding of the sentence meaning, the system generates questions which are semantically grounded and purposeful. And with the support of deep linguistic grammars, the grammaticality of the generation results is warranted. Further, with a specialized ranking model, the linguistic realizations from the general purpose generation model are further refined for our the question generation task. The evaluation results from QGSTEC2010 show promising prospects of the proposed approach.

2011

2010

We perform a large-scale evaluation of multiple off-the-shelf speech recognizers across diverse domains for virtual human dialogue systems. Our evaluation is aimed at speech recognition consumers and potential consumers with limited experience with readily available recognizers. We focus on practical factors to determine what levels of performance can be expected from different available recognizers in various projects featuring different types of conversational utterances. Our results show that there is no single recognizer that outperforms all other recognizers in all domains. The performance of each recognizer may vary significantly depending on the domain, the size and perplexity of the corpus, the out-of-vocabulary rate, and whether acoustic and language model adaptation has been used or not. We expect that our evaluation will prove useful to other speech recognition consumers, especially in the dialogue community, and will shed some light on the key problem in spoken dialogue systems of selecting the most suitable available speech recognition system for a particular application, and what impact training will have.
The current study presents a conversion and unification of the Penn Discourse TreeBank 2.0 (PDTB) and the Penn TreeBank (PTB) under XML format. The main goal of the PDTB XML is to create a tool for efficient and broad querying of the syntax and discourse information simultaneously. The key stages of the project are developing proper cross-references between different data types and their representation in the modified TIGER-XML format, and then writing the required declarative languages (XML Schema). PTB XML is compatible with TIGER-XML format. The PDTB XML is developed as a unified format for the convenience of XQuery users; it integrates discourse relations and XML structures into one unified hierarchy and builds the cross references between the syntactic trees and the discourse relations. The syntactic and discourse elements are assigned with unique IDs in order to build cross-references between them. The converted corpus allows for a simultaneous search for syntactically specified discourse information based on the XQuery standard, which is illustrated with a simple example in the article.