Building and Exploiting a Corpus of Dialog Interactions between French Speaking Virtual and Human Agents

Lina M. Rojas-Barahona, Alejandra Lorenzo, Claire Gardent


Abstract
We describe the acquisition of a dialog corpus for French based on multi-task human-machine interactions in a serious game setting. We present a data collection tool that is configurable for multiple games, describe the data collected with this tool and the annotation schema applied to it, and report the results obtained when training a classifier on the annotated data to associate each player turn with a dialog move usable by a rule-based dialog manager. The collected data consists of approximately 1,250 dialogs, 10,454 utterances, and 168,509 words, and will be made freely available for academic and nonprofit research.
Anthology ID:
L12-1275
Volume:
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month:
May
Year:
2012
Address:
Istanbul, Turkey
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
1428–1435
URL:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/505_Paper.pdf
Cite (ACL):
Lina M. Rojas-Barahona, Alejandra Lorenzo, and Claire Gardent. 2012. Building and Exploiting a Corpus of Dialog Interactions between French Speaking Virtual and Human Agents. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 1428–1435, Istanbul, Turkey. European Language Resources Association (ELRA).
Cite (Informal):
Building and Exploiting a Corpus of Dialog Interactions between French Speaking Virtual and Human Agents (Rojas-Barahona et al., LREC 2012)
PDF:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/505_Paper.pdf