A good space: Lexical predictors in word space evaluation

Christian Smith; Henrik Danielsson; Arne Jönsson

Abstract

Vector space models benefit from using an outside corpus to train the model. It is, however, unclear what constitutes a good training corpus. We have investigated the effect on summary quality of using various language resources to train a vector-space-based extraction summarizer. This is done by evaluating the performance of the summarizer with vector spaces built from corpora of different genres, partitioned from the Swedish SUC corpus. The corpora are also characterized using a variety of lexical measures commonly used in readability studies. Summarizer performance is measured by comparing automatically produced summaries to human-created gold standard summaries using the ROUGE F-score. Our results show that the genre of the training corpus does not have a significant effect on summary quality. However, when the variance in F-score between the genres is modelled by linear regression with the lexical measures as independent variables, the results show that vector spaces created from texts with high syntactic complexity, high word variation, short sentences and few long words produce better summaries.
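The abstract characterizes each training corpus with lexical measures taken from readability research. As a hedged illustration, the sketch below computes four such measures for a text: average sentence length, share of long words, LIX and OVIX (word variation). It assumes the standard Swedish definitions of LIX and OVIX; the abstract does not list the paper's exact measures, thresholds or tokenization, and it omits the syntactic-complexity measure entirely. `tokenize` and `lexical_profile` are hypothetical helpers, not the authors' code.

```python
import math
import re

# Assumption: standard Swedish readability definitions; the paper's exact
# measures, tokenization and thresholds may differ.
LONG_WORD_MIN_CHARS = 7  # LIX counts words with more than six characters as long


def tokenize(text):
    """Hypothetical naive tokenizer: words are runs of letters (incl. å, ä, ö),
    sentences are non-empty chunks between '.', '!' or '?'."""
    words = re.findall(r"[^\W\d_]+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return words, len(sentences)


def lix(words, n_sentences):
    """LIX = words / sentences + 100 * long_words / words."""
    long_words = sum(1 for w in words if len(w) >= LONG_WORD_MIN_CHARS)
    return len(words) / n_sentences + 100 * long_words / len(words)


def ovix(words):
    """OVIX word-variation index: log(n) / log(2 - log(u) / log(n)),
    where n is the number of tokens and u the number of lower-cased types."""
    n = len(words)
    if n < 2:
        raise ValueError("OVIX needs at least two tokens")
    u = len({w.lower() for w in words})
    if u == n:
        return math.inf  # every token is unique, so the denominator is log(1) = 0
    return math.log(n) / math.log(2 - math.log(u) / math.log(n))


def lexical_profile(text):
    """Return a few readability-style measures for one text or corpus."""
    words, n_sentences = tokenize(text)
    return {
        "avg_sentence_len": len(words) / n_sentences,
        "share_long_words": sum(len(w) >= LONG_WORD_MIN_CHARS for w in words) / len(words),
        "lix": lix(words, n_sentences),
        "ovix": ovix(words),
    }


profile = lexical_profile("Det var en gång en katt. Katten hette Måns och bodde i Stockholm.")
print(profile)
```

In a setup like the one described, one such profile per genre corpus would form a row of independent variables, with that corpus's ROUGE F-score as the dependent variable in the linear regression.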

Anthology ID: L12-1159
Volume: Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month: May
Year: 2012
Address: Istanbul, Turkey
Editors: Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue: LREC
Publisher: European Language Resources Association (ELRA)
Pages: 2530–2535
URL: http://www.lrec-conf.org/proceedings/lrec2012/pdf/335_Paper.pdf
Cite (ACL): Christian Smith, Henrik Danielsson, and Arne Jönsson. 2012. A good space: Lexical predictors in word space evaluation. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 2530–2535, Istanbul, Turkey. European Language Resources Association (ELRA).
Cite (Informal): A good space: Lexical predictors in word space evaluation (Smith et al., LREC 2012)
