Corpora for the Conceptualisation and Zoning of Scientific Papers

Maria Liakata; Simone Teufel; Advaith Siddharthan; Colin Batchelor

Corpora for the Conceptualisation and Zoning of Scientific Papers

Maria Liakata, Simone Teufel, Advaith Siddharthan, Colin Batchelor

Abstract

We present two complementary annotation schemes for sentence based annotation of full scientific papers, CoreSC and AZ-II, applied to primary research articles in chemistry. AZ-II is the extension of AZ for chemistry papers. AZ has been shown to have been reliably annotated by independent human coders and useful for various information access tasks. Like AZ, AZ-II follows the rhetorical structure of a scientific paper and the knowledge claims made by the authors. The CoreSC scheme takes a different view of scientific papers, treating them as the humanly readable representations of scientific investigations. It seeks to retrieve the structure of the investigation from the paper as generic high-level Core Scientific Concepts (CoreSC). CoreSCs have been annotated by 16 chemistry experts over a total of 265 full papers in physical chemistry and biochemistry. We describe the differences and similarities between the two schemes in detail and present the two corpora produced using each scheme. There are 36 shared papers in the corpora, which allows us to quantitatively compare aspects of the annotation schemes. We show the correlation between the two schemes, their strengths and weeknesses and discuss the benefits of combining a rhetorical based analysis of the papers with a content-based one.

Anthology ID:: L10-1440
Volume:: Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
Month:: May
Year:: 2010
Address:: Valletta, Malta
Editors:: Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias
Venue:: LREC
SIG:
Publisher:: European Language Resources Association (ELRA)
Note:
Pages:
Language:
URL:: http://www.lrec-conf.org/proceedings/lrec2010/pdf/644_Paper.pdf
DOI:
Bibkey:
Cite (ACL):: Maria Liakata, Simone Teufel, Advaith Siddharthan, and Colin Batchelor. 2010. Corpora for the Conceptualisation and Zoning of Scientific Papers. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), Valletta, Malta. European Language Resources Association (ELRA).
Cite (Informal):: Corpora for the Conceptualisation and Zoning of Scientific Papers (Liakata et al., LREC 2010)
Copy Citation:
PDF:: http://www.lrec-conf.org/proceedings/lrec2010/pdf/644_Paper.pdf

PDF Cite Search Fix data