2023
pdf
bib
abs
Unravelling Indirect Answers to Wh-Questions: Corpus Construction, Analysis, and Generation
Zulipiye Yusupujiang
|
Jonathan Ginzburg
Proceedings of the 24th Annual Meeting of the Special Interest Group on Discourse and Dialogue
Indirect answers, crucial in human communication, serve to maintain politeness, avoid conflicts, and align with social customs. Although there has been a substantial number of studies on recognizing and understanding indirect answers to polar questions (often known as yes/no questions), there is a dearth of such work regarding wh-questions. This study takes up the challenge by constructing what is, to our knowledge, the first corpus of indirect answers to wh-questions. We analyze and interpret indirect answers to different wh-questions based on our carefully compiled corpus. In addition, we conducted a pilot study on generating indirect answers to wh-questions by fine-tuning the pre-trained generative language model DialoGPT (Zhang et al., 2020). Our results suggest this is a task that GPT finds difficult.
2022
pdf
bib
abs
UgChDial: A Uyghur Chat-based Dialogue Corpus for Response Space Classification
Zulipiye Yusupujiang
|
Jonathan Ginzburg
Proceedings of the Thirteenth Language Resources and Evaluation Conference
In this paper, we introduce a carefully designed and collected language resource: UgChDial – a Uyghur dialogue corpus based on a chatroom environment. The Uyghur Chat-based Dialogue Corpus (UgChDial) is divided into two parts: (1). Two-party dialogues and (2). Multi-party dialogues. We ran a series of 25, 120-minutes each, two-party chat sessions, totaling 7323 turns and 1581 question-response pairs. We created 16 different scenarios and topics to gather these two-party conversations. The multi-party conversations were compiled from chitchats in general channels as well as free chats in topic-oriented public channels, yielding 5588 unique turns and 838 question-response pairs. The initial purpose of this corpus is to study query-response pairs in Uyghur, building on an existing fine-grained response space taxonomy for English. We provide here initial annotation results on the Uyghur response space classification task using UgChDial.
2020
pdf
bib
abs
Designing a GWAP for Collecting Naturally Produced Dialogues for Low Resourced Languages
Zulipiye Yusupujiang
|
Jonathan Ginzburg
Workshop on Games and Natural Language Processing
In this paper we present a new method for collecting naturally generated dialogue data for a low resourced language, (specifically here—Uyghur). We plan to build a games with a purpose (GWAPs) to encourage native speakers to actively contribute dialogue data to our research project. Since we aim to characterize the response space of queries in Uyghur, we design various scenarios for conversations that yield to questions being posed and responded to. We will implement the GWAP with the RPG Maker MV Game Engine, and will integrate the chatroom system in the game with the Dialogue Experimental Toolkit (DiET). DiET will help us improve the data collection process, and most importantly, make us have some control over the interactions among the participants.
2019
pdf
bib
abs
Characterizing the Response Space of Questions: a Corpus Study for English and Polish
Jonathan Ginzburg
|
Zulipiye Yusupujiang
|
Chuyuan Li
|
Kexin Ren
|
Paweł Łupkowski
Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue
The main aim of this paper is to provide a characterization of the response space for questions using a taxonomy grounded in a dialogical formal semantics. As a starting point we take the typology for responses in the form of questions provided in (Lupkowski and Ginzburg, 2016). This work develops a wide coverage taxonomy for question/question sequences observable in corpora including the BNC, CHILDES, and BEE, as well as formal modelling of all the postulated classes. Our aim is to extend this work to cover all responses to questions. We present the extended typology of responses to questions based on a corpus studies of BNC, BEE and Maptask with include 506, 262, and 467 question/response pairs respectively. We compare the data for English with data from Polish using the Spokes corpus (205 question/response pairs). We discuss annotation reliability and disagreement analysis. We sketch how each class can be formalized using a dialogical semantics appropriate for dialogue management.