Retrieving Annotated Corpora for Corpus Annotation

Kyôsuke Yoshida, Taiichi Hashimoto, Takenobu Tokunaga, Hozumi Tanaka


Abstract
This paper introduces Bonsai, a tool that supports humans in annotating corpora with morphosyntactic information and in retrieving syntactic structures stored in a database. Integrating annotation and retrieval lets users annotate a new instance while looking back at already annotated sentences that share a similar morphosyntactic structure. We focus on the retrieval part of the system and describe a method that decomposes a large input query into smaller ones in order to improve retrieval efficiency. The proposed method is evaluated on the Penn Treebank corpus and shows significant improvements.
Anthology ID:
L04-1233
Volume:
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)
Month:
May
Year:
2004
Address:
Lisbon, Portugal
Editors:
Maria Teresa Lino, Maria Francisca Xavier, Fátima Ferreira, Rute Costa, Raquel Silva
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
URL:
http://www.lrec-conf.org/proceedings/lrec2004/pdf/403.pdf
Cite (ACL):
Kyôsuke Yoshida, Taiichi Hashimoto, Takenobu Tokunaga, and Hozumi Tanaka. 2004. Retrieving Annotated Corpora for Corpus Annotation. In Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04), Lisbon, Portugal. European Language Resources Association (ELRA).
Cite (Informal):
Retrieving Annotated Corpora for Corpus Annotation (Yoshida et al., LREC 2004)