Creating a Data Collection for Evaluating Rich Speech Retrieval

Maria Eskevich; Gareth J. F. Jones; Martha Larson; Roeland Ordelman

Abstract

We describe the development of a test collection for the investigation of speech retrieval beyond identification of relevant content. This collection focuses on satisfying user information needs for queries associated with specific types of speech acts. The collection is based on an archive of Internet video from the video sharing platform blip.tv, and was provided by the MediaEval benchmarking initiative. A crowdsourcing approach was used to identify segments in the video data that contain speech acts, to create a description of the video containing each act, and to generate search queries designed to re-find that speech act. We describe and reflect on our experiences with crowdsourcing this test collection using the Amazon Mechanical Turk platform. We highlight the challenges of constructing this dataset, including the selection of the data source, the design of the crowdsourcing task, and the specification of queries and relevant items.

Anthology ID: L12-1544
Volume: Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month: May
Year: 2012
Address: Istanbul, Turkey
Editors: Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue: LREC
Publisher: European Language Resources Association (ELRA)
Pages: 1736–1743
URL: http://www.lrec-conf.org/proceedings/lrec2012/pdf/910_Paper.pdf
Cite (ACL): Maria Eskevich, Gareth J.F. Jones, Martha Larson, and Roeland Ordelman. 2012. Creating a Data Collection for Evaluating Rich Speech Retrieval. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 1736–1743, Istanbul, Turkey. European Language Resources Association (ELRA).
Cite (Informal): Creating a Data Collection for Evaluating Rich Speech Retrieval (Eskevich et al., LREC 2012)
PDF: http://www.lrec-conf.org/proceedings/lrec2012/pdf/910_Paper.pdf
