John McNaught

Also published as: J. McNaught

2018

2015

Joint Arabic Segmentation and Part-Of-Speech Tagging
Shabib AlGahtani | John McNaught
Proceedings of the Second Workshop on Arabic Natural Language Processing

2014

This article provides an overview of the dissemination work carried out in META-NET from 2010 until early 2014; we describe its impact at the regional, national and international levels, mainly with regard to politics and the funding situation for LT topics. This paper documents the initiative's work throughout Europe to boost progress and innovation in our field.

2011

2010

Evaluating a Text Mining Based Educational Search Portal
Sophia Ananiadou | John McNaught | James Thomas | Mark Rickinson | Sandy Oliver
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

In this paper, we present the main features of a text mining based search engine for the UK Educational Evidence Portal available at the UK National Centre for Text Mining (NaCTeM), together with a user-centred framework for the evaluation of the search engine. The framework is adapted from an existing proposal by the ISLE (EAGLES) Evaluation Working group. We introduce the metrics employed for the evaluation, and explain how these relate to the text mining based search engine. Following this, we describe how we applied the framework to the evaluation of a number of key text mining features of the search engine, namely the automatic clustering of search results, classification of search results according to a taxonomy, and identification of topics and other documents that are related to a chosen document. Finally, we present the results of the evaluation in terms of the strengths, weaknesses and improvements identified for each of these features.

Meta-Knowledge Annotation of Bio-Events
Raheel Nawaz | Paul Thompson | John McNaught | Sophia Ananiadou
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Biomedical corpora annotated with event-level information provide an important resource for the training of domain-specific information extraction (IE) systems. These corpora concentrate primarily on creating classified, structured representations of important facts and findings contained within the text. However, bio-event annotations often do not take into account additional information (meta-knowledge) that is expressed within the textual context of the bio-event, e.g., the pragmatic/rhetorical intent and the level of certainty ascribed to a particular bio-event by the authors. Such additional information is indispensable for the correct interpretation of bio-events. Therefore, an IE system that simply presents a list of bare bio-events, without information concerning their interpretation, is of little practical use. We have addressed this sparseness of meta-knowledge in existing bio-event corpora by developing a multi-dimensional annotation scheme tailored to bio-events. The scheme is intended to be general enough to allow integration with different types of bio-event annotation, whilst being detailed enough to capture important subtleties in the nature of the meta-knowledge expressed about different bio-events. To our knowledge, our scheme is unique within the field with regard to the diversity of meta-knowledge aspects annotated for each event.

2009

Three BioNLP Tools Powered by a Biological Lexicon
Yutaka Sasaki | Paul Thompson | John McNaught | Sophia Ananiadou
Proceedings of the Demonstrations Session at EACL 2009

2008

Event Frame Extraction Based on a Gene Regulation Corpus
Yutaka Sasaki | Paul Thompson | Philip Cotter | John McNaught | Sophia Ananiadou
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

Matching similar or related terms and expressions is a challenging task in NLP and Text Mining applications. Two typical areas in need of such work are terminology and ontology construction, where terms and concepts are extracted and organized into structures with various semantic relations. In the EU BOOTSTrep Project we test various techniques for matching terms that can assist human domain experts in building and enriching ontologies. This paper reports on work in which we evaluated a text comparison and clustering tool for this task. In particular, we explore the feasibility of matching related terms by means of their definitions. Ontology terms, such as Gene Ontology terms, are often assigned detailed definitions, which provide a fundamental source of information for detecting relations between terms. Here we focus on the exploitation of term definitions for the term matching task. Our experiment shows that the tool is capable of grouping many related terms using their definitions.

Building a Bio-Event Annotated Corpus for the Acquisition of Semantic Frames from Biomedical Corpora
Paul Thompson | Philip Cotter | John McNaught | Sophia Ananiadou | Simonetta Montemagni | Andrea Trabucco | Giulia Venturi
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper reports on the design and construction of a bio-event annotated corpus which was developed with a specific view to the acquisition of semantic frames from biomedical corpora. We describe the adopted annotation scheme and the annotation process, which is supported by a dedicated annotation tool. The annotated corpus contains 677 abstracts of biomedical research articles.

How to Make the Most of NE Dictionaries in Statistical NER
Yutaka Sasaki | Yoshimasa Tsuruoka | John McNaught | Sophia Ananiadou
Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing

2007

2004

Enhancing automatic term recognition through recognition of variation
Goran Nenadic | Sophia Ananiadou | John McNaught
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

A Domain-Independent Approach to IE Rule Development
Kalliopi Zervanou | John McNaught
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

A key element for the extraction of information from a natural language document is a set of shallow text analysis rules, which are typically based on pre-defined linguistic patterns. Current Information Extraction research aims at the automatic or semi-automatic acquisition of these rules. Within this research framework, we consider in this paper the potential for acquiring generic extraction patterns. Our research is based on the hypothesis that terms (the linguistic representation of concepts in a specialised domain) and Named Entities (the names of persons, organisations and dates of importance in the text) can together be considered as the basic semantic entities of textual information, and can therefore be used as a basis for the conceptual representation of domain-specific texts and for the definition, in linguistic terms, of what constitutes an information extraction template. The extraction patterns discovered by this approach involve significant associations of these semantic entities with verbs, and they can subsequently be translated into the grammar formalism of choice.