José Lopes


CRWIZ: A Framework for Crowdsourcing Real-Time Wizard-of-Oz Dialogues
Francisco Javier Chiyah Garcia | José Lopes | Xingkun Liu | Helen Hastie
Proceedings of the Twelfth Language Resources and Evaluation Conference

Large corpora of task-based and open-domain conversational dialogues are hugely valuable in the field of data-driven dialogue systems. Crowdsourcing platforms, such as Amazon Mechanical Turk, have been an effective method for collecting such large amounts of data. However, difficulties arise when task-based dialogues require expert domain knowledge or rapid access to domain-relevant information, such as databases for tourism. This will become even more prevalent as dialogue systems become increasingly ambitious, expanding into tasks with high levels of complexity that require collaboration and forward planning, such as in our domain of emergency response. In this paper, we propose CRWIZ: a framework for collecting real-time Wizard of Oz dialogues through crowdsourcing for collaborative, complex tasks. This framework uses semi-guided dialogue to avoid interactions that breach procedures and processes only known to experts, while enabling the capture of a wide variety of interactions.


The Spot the Difference corpus: a multi-modal corpus of spontaneous task oriented spoken interactions
José Lopes | Nils Hemmingsson | Oliver Åstrand
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

FARMI: A FrAmework for Recording Multi-Modal Interactions
Patrik Jonell | Mattias Bystedt | Per Fallgren | Dimosthenis Kontogiorgos | José Lopes | Zofia Malisz | Samuel Mascarenhas | Catharine Oertel | Eran Raveh | Todd Shore
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)


The SpeDial datasets: datasets for Spoken Dialogue Systems analytics
José Lopes | Arodami Chorianopoulou | Elisavet Palogiannidi | Helena Moniz | Alberto Abad | Katerina Louka | Elias Iosif | Alexandros Potamianos
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

The SpeDial consortium is sharing two datasets that were used during the SpeDial project. By sharing them with the community, we provide a resource to shorten the development cycle of new Spoken Dialogue Systems (SDSs). The datasets include audio recordings and several manual annotations, i.e., miscommunication, anger, satisfaction, repetition, gender and task success. The datasets were created with data from real users and cover two different languages: English and Greek. Detectors for miscommunication, anger and gender were trained for both systems. The detectors were particularly accurate in tasks where human annotators show high agreement, such as miscommunication and gender. As expected given the subjectivity of the task, the anger detector performed less satisfactorily. Nevertheless, we proved that the automatic detection of situations that can lead to problems in SDSs is possible and can be a promising direction for shortening the SDS development cycle.


Bilingually motivated segmentation and generation of word translations using relatively small translation data sets
Kavitha Karimbi Mahesh | Luís Gomes | José Lopes
Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation: Posters

Automatic Detection of Miscommunication in Spoken Dialogue Systems
Raveesh Meena | José Lopes | Gabriel Skantze | Joakim Gustafson
Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue