In this paper we present a new method for collecting naturally generated dialogue data for a low resourced language, (specifically here—Uyghur). We plan to build a games with a purpose (GWAPs) to encourage native speakers to actively contribute dialogue data to our research project. Since we aim to characterize the response space of queries in Uyghur, we design various scenarios for conversations that yield to questions being posed and responded to. We will implement the GWAP with the RPG Maker MV Game Engine, and will integrate the chatroom system in the game with the Dialogue Experimental Toolkit (DiET). DiET will help us improve the data collection process, and most importantly, make us have some control over the interactions among the participants.
Characterizing the Response Space of Questions: a Corpus Study for English and Polish
Jonathan Ginzburg | Zulipiye Yusupujiang | Chuyuan Li | Kexin Ren | Paweł Łupkowski
Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue
The main aim of this paper is to provide a characterization of the response space for questions using a taxonomy grounded in a dialogical formal semantics. As a starting point we take the typology for responses in the form of questions provided in (Lupkowski and Ginzburg, 2016). This work develops a wide coverage taxonomy for question/question sequences observable in corpora including the BNC, CHILDES, and BEE, as well as formal modelling of all the postulated classes. Our aim is to extend this work to cover all responses to questions. We present the extended typology of responses to questions based on a corpus studies of BNC, BEE and Maptask with include 506, 262, and 467 question/response pairs respectively. We compare the data for English with data from Polish using the Spokes corpus (205 question/response pairs). We discuss annotation reliability and disagreement analysis. We sketch how each class can be formalized using a dialogical semantics appropriate for dialogue management.