Assessing the Comparability of News Texts

Emma Barker, Robert Gaizauskas


Abstract
Comparable news texts are frequently proposed as a potential source of alignable subsentential fragments for use in statistical machine translation systems. But can we assess just how potentially useful they will be? In this paper we first discuss a scheme for classifying news text pairs according to the degree of relatedness of the events they report, and investigate how robust this classification scheme is via a multilingual annotation exercise. We then propose an annotation methodology, similar to that used in summarization evaluation, that allows us to identify and quantify shared content at the subsentential level in news text pairs, and report a preliminary exercise to assess this method. We conclude by discussing how this work fits into a broader programme of assessing the potential utility of comparable news texts for extracting paraphrases/translational equivalents for use in language processing applications.
Anthology ID:
L12-1637
Volume:
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month:
May
Year:
2012
Address:
Istanbul, Turkey
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
3996–4003
URL:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/1069_Paper.pdf
Cite (ACL):
Emma Barker and Robert Gaizauskas. 2012. Assessing the Comparability of News Texts. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 3996–4003, Istanbul, Turkey. European Language Resources Association (ELRA).
Cite (Informal):
Assessing the Comparability of News Texts (Barker & Gaizauskas, LREC 2012)