The More the Better? Assessing the Influence of Wikipedia’s Growth on Semantic Relatedness Measures

Torsten Zesch, Iryna Gurevych


Abstract
Wikipedia has been used as a knowledge source in many areas of natural language processing. As most studies only use a certain Wikipedia snapshot, the influence of Wikipedia’s massive growth on the results is largely unknown. For the first time, we perform an in-depth analysis of this influence using semantic relatedness as an example application that tests a wide range of Wikipedia’s properties. We find that the growth of Wikipedia has almost no effect on the correlation of semantic relatedness measures with human judgments, while the coverage steadily increases.
Anthology ID:
L10-1055
Volume:
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
Month:
May
Year:
2010
Address:
Valletta, Malta
Editors:
Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
URL:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/93_Paper.pdf
Cite (ACL):
Torsten Zesch and Iryna Gurevych. 2010. The More the Better? Assessing the Influence of Wikipedia’s Growth on Semantic Relatedness Measures. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), Valletta, Malta. European Language Resources Association (ELRA).
Cite (Informal):
The More the Better? Assessing the Influence of Wikipedia’s Growth on Semantic Relatedness Measures (Zesch & Gurevych, LREC 2010)