Effort of Genre Variation and Prediction of System Performance

Dong Wang, Fei Xia


Abstract
Domain adaptation is important if NLP systems are to work well in real applications, and there has been extensive research on this topic. In this paper, we address two issues related to domain adaptation. The first is how much genre variation affects the performance of NLP systems. We investigate the effect of genre variation on the performance of three NLP tools, namely a word segmenter, a POS tagger, and a parser, using the Chinese Penn Treebank (CTB) as our corpus. The second is how to estimate an NLP system's performance when no gold standard exists for the test data. To answer this question, we extend the prediction model in (Ravi et al., 2008) so that it also predicts performance for word segmentation and POS tagging. Our experiments on the CTB data show that the predicted scores are close to the real scores.
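The "real scores" that a prediction model is compared against can only be computed when a gold standard exists. For word segmentation, such scores are commonly word-level precision, recall, and F1, obtained by matching the character spans of predicted words against gold words. The sketch below assumes that standard metric (the abstract does not spell out the exact scoring used), and the function names `to_spans` and `seg_prf` are hypothetical.

```python
from typing import List, Set, Tuple


def to_spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Map a segmented sentence to the set of (start, end) character offsets of its words."""
    spans: Set[Tuple[int, int]] = set()
    start = 0
    for word in words:
        spans.add((start, start + len(word)))
        start += len(word)
    return spans


def seg_prf(gold_sents: List[List[str]], pred_sents: List[List[str]]) -> Tuple[float, float, float]:
    """Corpus-level word segmentation precision, recall, and F1 (assumed standard metric).

    Match counts are summed over all sentences before the ratios are taken.
    """
    correct = n_gold = n_pred = 0
    for gold, pred in zip(gold_sents, pred_sents):
        # Both segmentations must cover the same unsegmented character string.
        assert "".join(gold) == "".join(pred), "sentence mismatch"
        g, p = to_spans(gold), to_spans(pred)
        correct += len(g & p)  # a word is correct only if both boundaries match
        n_gold += len(g)
        n_pred += len(p)
    precision = correct / n_pred if n_pred else 0.0
    recall = correct / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f1
```

For example, scoring the gold segmentation 中国 / 人民 / 银行 against the prediction 中国人 / 民 / 银行 matches only 银行, so precision, recall, and F1 are each 1/3. A performance predictor is useful precisely because this computation is impossible when the gold segmentation is unavailable.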
Anthology ID: L12-1625
Volume: Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month: May
Year: 2012
Address: Istanbul, Turkey
Editors: Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue: LREC
Publisher: European Language Resources Association (ELRA)
Pages: 1993–2000
URL: http://www.lrec-conf.org/proceedings/lrec2012/pdf/1049_Paper.pdf
Cite (ACL): Dong Wang and Fei Xia. 2012. Effort of Genre Variation and Prediction of System Performance. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 1993–2000, Istanbul, Turkey. European Language Resources Association (ELRA).
Cite (Informal): Effort of Genre Variation and Prediction of System Performance (Wang & Xia, LREC 2012)