CDˆ2CR: Co-reference resolution across documents and domains
James Ravenscroft | Amanda Clare | Arie Cattan | Ido Dagan | Maria Liakata
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume
Cross-document co-reference resolution (CDCR) is the task of identifying and linking mentions to entities and concepts across many text documents. Current state-of-the-art models for this task assume that all documents are of the same type (e.g. news articles) or fall under the same theme. However, it is also desirable to perform CDCR across different domains (type or theme). A particular use case we focus on in this paper is the resolution of entities mentioned across scientific work and newspaper articles that discuss them. Identifying the same entities and corresponding concepts in both scientific articles and news can help scientists understand how their work is represented in mainstream media. We propose a new task and English language dataset for cross-document cross-domain co-reference resolution (CDˆ2CR). The task aims to identify links between entities across heterogeneous document types. We show that in this cross-domain, cross-document setting, existing CDCR models do not perform well and we provide a baseline model that outperforms current state-of-the-art CDCR models on CDˆ2CR. Our data set, annotation tool and guidelines as well as our model for cross-document cross-domain co-reference are all supplied as open access open source resources.
HarriGT: A Tool for Linking News to Science
James Ravenscroft | Amanda Clare | Maria Liakata
Proceedings of ACL 2018, System Demonstrations
Being able to reliably link scientific works to the newspaper articles that discuss them could provide a breakthrough in the way we rationalise and measure the impact of science on our society. Linking these articles is challenging because the language used in the two domains is very different, and the gathering of online resources to align the two is a substantial information retrieval endeavour. We present HarriGT, a semi-automated tool for building corpora of news articles linked to the scientific papers that they discuss. Our aim is to facilitate future development of information-retrieval tools for newspaper/scientific work citation linking. HarriGT retrieves newspaper articles from an archive containing 17 years of UK web content. It also integrates with 3 large external citation networks, leveraging named entity extraction, and document classification to surface relevant examples of scientific literature to the user. We also provide a tuned candidate ranking algorithm to highlight potential links between scientific papers and newspaper articles to the user, in order of likelihood. HarriGT is provided as an open source tool (http://harrigt.xyz).
Applying Core Scientific Concepts to Context-Based Citation Recommendation
Daniel Duma | Maria Liakata | Amanda Clare | James Ravenscroft | Ewan Klein
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
The task of recommending relevant scientific literature for a draft academic paper has recently received significant interest. In our effort to ease the discovery of scientific literature and augment scientific writing, we aim to improve the relevance of results based on a shallow semantic analysis of the source document and the potential documents to recommend. We investigate the utility of automatic argumentative and rhetorical annotation of documents for this purpose. Specifically, we integrate automatic Core Scientific Concepts (CoreSC) classification into a prototype context-based citation recommendation system and investigate its usefulness to the task. We frame citation recommendation as an information retrieval task and we use the categories of the annotation schemes to apply different weights to the similarity formula. Our results show interesting and consistent correlations between the type of citation and the type of sentence containing the relevant information.
- James Ravenscroft 3
- Maria Liakata 3
- Daniel Duma 1
- Ewan Klein 1
- Arie Cattan 1
- show all...