Evaluating Query Languages for a Corpus Processing System

Elena Frick, Carsten Schnober, Piotr Bański


Abstract
This paper documents a pilot study conducted as part of the development of a new corpus processing system at the Institut für Deutsche Sprache in Mannheim and in the context of the ISO TC37 SC4/WG6 activity on the suggested work item proposal “Corpus Query Lingua Franca”. We describe the first phase of our research: the initial formulation of functionality criteria for query language evaluation and the results of applying these criteria to three representative corpus query languages, namely COSMAS II, Poliqarp, and ANNIS QL. In contrast to previous work on query language evaluation, which compares a range of existing query languages against a small number of queries, our approach analyses only three query languages against criteria derived from a suite of 300 use cases that cover diverse aspects of linguistic research.
Anthology ID:
L12-1471
Volume:
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month:
May
Year:
2012
Address:
Istanbul, Turkey
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
2286–2294
URL:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/800_Paper.pdf
Cite (ACL):
Elena Frick, Carsten Schnober, and Piotr Bański. 2012. Evaluating Query Languages for a Corpus Processing System. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 2286–2294, Istanbul, Turkey. European Language Resources Association (ELRA).
Cite (Informal):
Evaluating Query Languages for a Corpus Processing System (Frick et al., LREC 2012)