A Corpus of Word-Aligned Asked and Anticipated Questions in a Virtual Patient Dialogue System

Ajda Gokcen; Evan Jaffe; Johnsey Erdmann; Michael White; Douglas Danforth

Abstract

We present a corpus of virtual patient dialogues to which we have added manually annotated gold standard word alignments. Since each question asked by a medical student in the dialogues is mapped to a canonical, anticipated version of the question, the corpus implicitly defines a large set of paraphrase (and non-paraphrase) pairs. We also present a novel process for selecting the most useful data to annotate with word alignments and for ensuring consistent paraphrase status decisions. In support of this process, we have enhanced the earlier Edinburgh alignment tool (Cohn et al., 2008) and revised and extended the Edinburgh guidelines, in particular adding guidance intended to ensure that the word alignments are consistent with the overall paraphrase status decision. The finished corpus and the enhanced alignment tool are made freely available.

Anthology ID: L16-1506
Volume: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month: May
Year: 2016
Address: Portorož, Slovenia
Editors: Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue: LREC
Publisher: European Language Resources Association (ELRA)
Pages: 3174–3179
URL: https://aclanthology.org/L16-1506/
Cite (ACL): Ajda Gokcen, Evan Jaffe, Johnsey Erdmann, Michael White, and Douglas Danforth. 2016. A Corpus of Word-Aligned Asked and Anticipated Questions in a Virtual Patient Dialogue System. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 3174–3179, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal): A Corpus of Word-Aligned Asked and Anticipated Questions in a Virtual Patient Dialogue System (Gokcen et al., LREC 2016)
PDF: https://aclanthology.org/L16-1506.pdf
