Evaluating automatic cross-domain Dutch semantic role annotation

Orphée De Clercq, Veronique Hoste, Paola Monachesi


Abstract
In this paper we present the first corpus in which one million Dutch words from a variety of text genres have been annotated with semantic roles. Half of the corpus (500K words) has been manually verified in full and used as training material to automatically label the remaining 500K words. All data has been annotated following an adapted version of the PropBank guidelines. The corpus's rich text type diversity and the availability of manually verified syntactic dependency structures allowed us to experiment with an existing semantic role labeler for Dutch. To test the system's portability across domains, we trained it on individual domains and compared this with training on multiple domains, which adds more data. Our results show that training on large data sets is necessary, but that including genre-specific training material is also crucial to optimize classification. We observed that even a small amount of in-domain training data is sufficient to improve our semantic role labeler.
Anthology ID:
L12-1396
Volume:
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month:
May
Year:
2012
Address:
Istanbul, Turkey
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
88–93
URL:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/680_Paper.pdf
Cite (ACL):
Orphée De Clercq, Veronique Hoste, and Paola Monachesi. 2012. Evaluating automatic cross-domain Dutch semantic role annotation. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 88–93, Istanbul, Turkey. European Language Resources Association (ELRA).
Cite (Informal):
Evaluating automatic cross-domain Dutch semantic role annotation (De Clercq et al., LREC 2012)