Sophia Ananiadou


2021

SpanEmo: Casting Multi-label Emotion Classification as Span-prediction
Hassan Alhuzali | Sophia Ananiadou
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Emotion recognition (ER) is an important task in Natural Language Processing (NLP), due to its high impact in real-world applications from health and well-being to author profiling, consumer analysis and security. Current approaches to ER mainly classify emotions independently without considering that emotions can co-exist. Such approaches overlook potential ambiguities, in which multiple emotions overlap. We propose a new model, “SpanEmo”, that casts multi-label emotion classification as span-prediction, which can help ER models learn associations between labels and words in a sentence. Furthermore, we introduce a loss function focused on modelling multiple co-existing emotions in the input sentence. Experiments performed on the SemEval2018 multi-label emotion data over three language sets (i.e., English, Arabic and Spanish) demonstrate our method’s effectiveness. Finally, we present different analyses that illustrate the benefits of our method in terms of improving the model performance and learning meaningful associations between emotion classes and words in the sentence.

Paladin: an annotation tool based on active and proactive learning
Minh-Quoc Nghiem | Paul Baylis | Sophia Ananiadou
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations

In this paper, we present Paladin, an open-source web-based annotation tool for creating high-quality multi-label document-level datasets. By integrating active learning and proactive learning into the annotation task, Paladin makes the task less time-consuming and reduces the human effort required. Although Paladin is designed for multi-label settings, the system is flexible and can be adapted to other tasks in single-label settings.

Distantly Supervised Relation Extraction with Sentence Reconstruction and Knowledge Base Priors
Fenia Christopoulou | Makoto Miwa | Sophia Ananiadou
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

We propose a multi-task, probabilistic approach to facilitate distantly supervised relation extraction by bringing closer the representations of sentences that contain the same Knowledge Base pairs. To achieve this, we bias the latent space of sentences via a Variational Autoencoder (VAE) that is trained jointly with a relation classifier. The latent code guides the pair representations and influences sentence reconstruction. Experimental results on two datasets created via distant supervision indicate that multi-task learning results in performance benefits. Additional exploration of incorporating Knowledge Base priors into the VAE reveals that the sentence space can be shifted towards that of the Knowledge Base, offering interpretability and further improving results.

Proceedings of the 20th Workshop on Biomedical Language Processing
Dina Demner-Fushman | Kevin Bretonnel Cohen | Sophia Ananiadou | Junichi Tsujii
Proceedings of the 20th Workshop on Biomedical Language Processing

Investigating Text Simplification Evaluation
Laura Vásquez-Rodríguez | Matthew Shardlow | Piotr Przybyła | Sophia Ananiadou
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

2020

A Neural Model for Aggregating Coreference Annotation in Crowdsourcing
Maolin Li | Hiroya Takamura | Sophia Ananiadou
Proceedings of the 28th International Conference on Computational Linguistics

Coreference resolution is the task of identifying all mentions in a text that refer to the same real-world entity. Collecting sufficient labelled data from expert annotators to train a high-performance coreference resolution system is time-consuming and expensive. Crowdsourcing makes it possible to obtain the required amounts of data rapidly and cost-effectively. However, crowd-sourced labels can be noisy. To ensure high-quality data, it is crucial to infer the correct labels by aggregating the noisy labels. In this paper, we split the aggregation into two subtasks, i.e., mention classification and coreference chain inference. Firstly, we predict the general class of each mention using an autoencoder, which incorporates contextual information about each mention, while at the same time taking into account the mention’s annotation complexity and annotators’ reliability at different levels. Secondly, to determine the coreference chain of each mention, we use weighted voting which takes into account the reliability learned in the first subtask. Experimental results demonstrate the effectiveness of our method in predicting the correct labels. We also illustrate our model’s interpretability through a comprehensive analysis of experimental results.
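
The weighted voting step mentioned in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `weighted_vote`, the annotator ids, the chain labels and the reliability values are all hypothetical, and the sketch simply assumes that each annotator has one scalar reliability weight.

```python
from collections import defaultdict


def weighted_vote(labels, reliability):
    """Return the label with the highest reliability-weighted support.

    labels: dict mapping annotator id -> label proposed by that annotator
    reliability: dict mapping annotator id -> non-negative weight
                 (assumed here to be a single scalar per annotator)
    """
    scores = defaultdict(float)
    for annotator, label in labels.items():
        # Annotators with no known reliability contribute nothing.
        scores[label] += reliability.get(annotator, 0.0)
    # Sorting the labels first makes tie-breaking deterministic.
    return max(sorted(scores), key=lambda label: scores[label])


# Hypothetical example: one reliable annotator against two less reliable ones.
votes = {"a1": "chain_1", "a2": "chain_2", "a3": "chain_2"}
weights = {"a1": 0.9, "a2": 0.3, "a3": 0.4}
print(weighted_vote(votes, weights))
```

With uniform weights the function reduces to plain majority voting, so the learned reliabilities only matter when annotators disagree.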

Semantic Annotation for Improved Safety in Construction Work
Paul Thompson | Tim Yates | Emrah Inan | Sophia Ananiadou
Proceedings of the 12th Language Resources and Evaluation Conference

Risk management is a vital activity to ensure employee safety in construction projects. Various documents provide important supporting evidence, including details of previous incidents, consequences and mitigation strategies. Potential hazards may depend on a complex set of project-specific attributes, including activities undertaken, location, equipment used, etc. However, finding evidence about previous projects with similar attributes can be problematic, since information about risks and mitigations is usually hidden within and may be dispersed across a range of different free text documents. Automatic named entity recognition (NER), which identifies mentions of concepts in free text documents, is the first stage in structuring knowledge contained within them. While developing NER methods generally relies on annotated corpora, we are not aware of any such corpus targeted at concepts relevant to construction safety. In response, we have designed a novel named entity annotation scheme and associated guidelines for this domain, which covers hazards, consequences, mitigation strategies and project attributes. Four health and safety experts used the guidelines to annotate a total of 600 sentences from accident reports; an average inter-annotator agreement rate of 0.79 F-Score shows that our work constitutes an important first step towards developing tools for detailed semantic analysis of construction safety documents.

Proceedings of the 19th SIGBioMed Workshop on Biomedical Language Processing
Dina Demner-Fushman | Kevin Bretonnel Cohen | Sophia Ananiadou | Junichi Tsujii
Proceedings of the 19th SIGBioMed Workshop on Biomedical Language Processing

Revisiting Unsupervised Relation Extraction
Thy Thy Tran | Phong Le | Sophia Ananiadou
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Unsupervised relation extraction (URE) extracts relations between named entities from raw text without manually-labelled data and existing knowledge bases (KBs). URE methods can be categorised into generative and discriminative approaches, which rely either on hand-crafted features or surface form. However, we demonstrate that by using only named entities to induce relation types, we can outperform existing methods on two popular datasets. We conduct a comparison and evaluation of our findings with other URE techniques, to ascertain the important features in URE. We conclude that entity types provide a strong inductive bias for URE.

2019

A Search-based Neural Model for Biomedical Nested and Overlapping Event Detection
Kurt Junshean Espinosa | Makoto Miwa | Sophia Ananiadou
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

We tackle the nested and overlapping event detection task and propose a novel search-based neural network (SBNN) structured prediction model that treats the task as a search problem on a relation graph of trigger-argument structures. Unlike existing structured prediction tasks such as dependency parsing, the task aims to detect DAG structures, which constitute events, from the relation graph. We define actions to construct events and use all the beams in a beam search to detect all event structures that may be overlapping and nested. The search process constructs events in a bottom-up manner while modelling the global properties for nested and overlapping structures simultaneously using neural networks. We show that the model achieves performance comparable to the state-of-the-art model Turku Event Extraction System (TEES) on the BioNLP Cancer Genetics (CG) Shared Task 2013 without the use of any syntactic or hand-engineered features. Further analyses on the development set show that our model is more computationally efficient while yielding higher F1-score performance.

Connecting the Dots: Document-level Neural Relation Extraction with Edge-oriented Graphs
Fenia Christopoulou | Makoto Miwa | Sophia Ananiadou
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Document-level relation extraction is a complex human process that requires logical inference to extract relationships between named entities in text. Existing approaches use graph-based neural models with words as nodes and edges as relations between them, to encode relations across sentences. These models are node-based, i.e., they form pair representations based solely on the two target node representations. However, entity relations can be better expressed through unique edge representations formed as paths between nodes. We thus propose an edge-oriented graph neural model for document-level relation extraction. The model utilises different types of nodes and edges to create a document-level graph. An inference mechanism on the graph edges enables the model to learn intra- and inter-sentence relations using multi-instance learning internally. Experiments on two document-level biomedical datasets for chemical-disease and gene-disease associations show the usefulness of the proposed edge-oriented approach.

Coreference Resolution in Full Text Articles with BERT and Syntax-based Mention Filtering
Hai-Long Trieu | Anh-Khoa Duong Nguyen | Nhung Nguyen | Makoto Miwa | Hiroya Takamura | Sophia Ananiadou
Proceedings of The 5th Workshop on BioNLP Open Shared Tasks

This paper describes our system developed for the coreference resolution task of the CRAFT Shared Tasks 2019. The CRAFT corpus is more challenging than other existing corpora because it contains full text articles. We have employed an existing span-based state-of-the-art neural coreference resolution system as a baseline system. We enhance the system with two different techniques to capture long-distance coreferent pairs. Firstly, we filter noisy mentions based on parse trees while increasing the number of antecedent candidates. Secondly, instead of relying on LSTMs, we integrate the highly expressive language model BERT into our model. Experimental results show that our proposed systems significantly outperform the baseline. The best performing system obtained F-scores of 44%, 48%, 39%, 49%, 40%, and 57% on the test set with B3, BLANC, CEAFE, CEAFM, LEA, and MUC metrics, respectively. Additionally, the proposed model is able to detect coreferent pairs over long distances, even at a distance of more than 200 sentences.

Modelling Instance-Level Annotator Reliability for Natural Language Labelling Tasks
Maolin Li | Arvid Fahlström Myrman | Tingting Mu | Sophia Ananiadou
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

When constructing models that learn from noisy labels produced by multiple annotators, it is important to accurately estimate the reliability of annotators. Annotators may provide labels of inconsistent quality due to their varying expertise and reliability in a domain. Previous studies have mostly focused on estimating each annotator’s overall reliability on the entire annotation task. However, in practice, the reliability of an annotator may depend on each specific instance. Only a limited number of studies have investigated modelling per-instance reliability and these only considered binary labels. In this paper, we propose an unsupervised model which can handle both binary and multi-class labels. It can automatically estimate the per-instance reliability of each annotator and the correct label for each instance. We specify our model as a probabilistic model which incorporates neural networks to model the dependency between latent variables and instances. For evaluation, the proposed method is applied to both synthetic and real data, including two labelling tasks: text classification and textual entailment. Experimental results demonstrate our novel method can not only accurately estimate the reliability of annotators across different instances, but also achieve superior performance in predicting the correct labels and detecting the least reliable annotators compared to state-of-the-art baselines.

Proceedings of the 18th BioNLP Workshop and Shared Task
Dina Demner-Fushman | Kevin Bretonnel Cohen | Sophia Ananiadou | Junichi Tsujii
Proceedings of the 18th BioNLP Workshop and Shared Task

Improving classification of Adverse Drug Reactions through Using Sentiment Analysis and Transfer Learning
Hassan Alhuzali | Sophia Ananiadou
Proceedings of the 18th BioNLP Workshop and Shared Task

The availability of large-scale and real-time data on social media has motivated research into adverse drug reactions (ADRs). ADR classification helps to identify negative effects of drugs, which can guide health professionals and pharmaceutical companies in making medications safer and advocating patients’ safety. Based on the observation that in social media, negative sentiment is frequently expressed towards ADRs, this study presents a neural model that combines sentiment analysis with transfer learning techniques to improve ADR detection in social media postings. Our system is first trained to classify sentiment in tweets concerning current affairs, using the SemEval17-task4A corpus. We then apply transfer learning to adapt the model to the task of detecting ADRs in social media postings. We show that, in combination with rich representations of words and their contexts, transfer learning is beneficial, especially given the large degree of vocabulary overlap between the current affairs posts in the SemEval17-task4A corpus and posts about ADRs. We compare our results with previous approaches, and show that our model can outperform them by up to 3% F-score.

Inter-sentence Relation Extraction with Document-level Graph Convolutional Neural Network
Sunil Kumar Sahu | Fenia Christopoulou | Makoto Miwa | Sophia Ananiadou
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Inter-sentence relation extraction deals with a number of complex semantic relationships in documents, which require local, non-local, syntactic and semantic dependencies. Existing methods do not fully exploit such dependencies. We present a novel inter-sentence relation extraction model that builds a labelled edge graph convolutional neural network model on a document-level graph. The graph is constructed using various inter- and intra-sentence dependencies to capture local and non-local dependency information. In order to predict the relation of an entity pair, we utilise multi-instance learning with bi-affine pairwise scoring. Experimental results show that our model achieves comparable performance to the state-of-the-art neural models on two biochemistry datasets. Our analysis shows that all the types in the graph are effective for inter-sentence relation extraction.

2018

A Neural Layered Model for Nested Named Entity Recognition
Meizhi Ju | Makoto Miwa | Sophia Ananiadou
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Entity mentions embedded in longer entity mentions are referred to as nested entities. Most named entity recognition (NER) systems deal only with flat entities and ignore the inner nested ones, and thus fail to capture finer-grained semantic information in underlying texts. To address this issue, we propose a novel neural model to identify nested entities by dynamically stacking flat NER layers. Each flat NER layer is based on the state-of-the-art flat NER model that captures sequential context representation with a bidirectional Long Short-Term Memory (LSTM) layer and feeds it to the cascaded CRF layer. Our model merges the output of the LSTM layer in the current flat NER layer to build new representations for detected entities and subsequently feeds them into the next flat NER layer. This allows our model to extract outer entities by taking full advantage of information encoded in their corresponding inner entities, in an inside-to-outside way. Our model dynamically stacks the flat NER layers until no outer entities are extracted. Extensive evaluation shows that our dynamic model outperforms state-of-the-art feature-based systems on nested NER, achieving 74.7% and 72.2% on GENIA and ACE2005 datasets, respectively, in terms of F-score.

A New Corpus to Support Text Mining for the Curation of Metabolites in the ChEBI Database
Matthew Shardlow | Nhung Nguyen | Gareth Owen | Claire O’Donovan | Andrew Leach | John McNaught | Steve Turner | Sophia Ananiadou
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

A Walk-based Model on Entity Graphs for Relation Extraction
Fenia Christopoulou | Makoto Miwa | Sophia Ananiadou
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

We present a novel graph-based neural network model for relation extraction. Our model treats multiple pairs in a sentence simultaneously and considers interactions among them. All the entities in a sentence are placed as nodes in a fully-connected graph structure. The edges are represented with position-aware contexts around the entity pairs. In order to consider different relation paths between two entities, we construct up to l-length walks between each pair. The resulting walks are merged and iteratively used to update the edge representations into longer walk representations. We show that the model achieves performance comparable to the state-of-the-art systems on the ACE 2005 dataset without using any external tools.

Paths for uncertainty: Exploring the intricacies of uncertainty identification for news
Chrysoula Zerva | Sophia Ananiadou
Proceedings of the Workshop on Computational Semantics beyond Events and Roles

Currently, news articles are produced, shared and consumed at an extremely rapid rate. Although their quantity is increasing, at the same time, their quality and trustworthiness are becoming fuzzier. Hence, it is important not only to automate information extraction but also to quantify the certainty of this information. Automated identification of certainty has been studied both in the scientific and newswire domains, but performance is considerably higher in tasks focusing on scientific text. We compare the differences in the definition and expression of uncertainty between a scientific domain, i.e., biomedicine, and newswire. We delve into the different aspects that affect the certainty of an extracted event in a news article and examine whether they can be easily identified by techniques already validated in the biomedical domain. Finally, we present a comparison of the syntactic and lexical differences between the expression of certainty in the biomedical and newswire domains, using two annotated corpora.

Proceedings of the BioNLP 2018 workshop
Dina Demner-Fushman | Kevin Bretonnel Cohen | Sophia Ananiadou | Junichi Tsujii
Proceedings of the BioNLP 2018 workshop

Investigating Domain-Specific Information for Neural Coreference Resolution on Biomedical Texts
Hai-Long Trieu | Nhung T. H. Nguyen | Makoto Miwa | Sophia Ananiadou
Proceedings of the BioNLP 2018 workshop

Existing biomedical coreference resolution systems depend on features and/or rules based on syntactic parsers. In this paper, we investigate the utility of the state-of-the-art general domain neural coreference resolution system on biomedical texts. The system is an end-to-end system that does not depend on any syntactic parsers. We also investigate domain-specific features to enhance the system for biomedical texts. Experimental results on the BioNLP Protein Coreference dataset and the CRAFT corpus show that, with no parser information, the adapted system compared favorably with the systems that depend on parser information on these datasets, achieving 51.23% on the BioNLP dataset and 36.33% on the CRAFT corpus in F1 score. In-domain embeddings and domain-specific features helped improve the performance on the BioNLP dataset, but they did not on the CRAFT corpus.

APLenty: annotation tool for creating high-quality datasets using active and proactive learning
Minh-Quoc Nghiem | Sophia Ananiadou
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

In this paper, we present APLenty, an annotation tool for creating high-quality sequence labeling datasets using active and proactive learning. A major innovation of our tool is the integration of automatic annotation with active learning and proactive learning. This makes the task of creating labeled datasets easier and less time-consuming, and reduces the human effort required. APLenty is highly flexible and can be adapted to various other tasks.

2017

BioNLP 2017
Kevin Bretonnel Cohen | Dina Demner-Fushman | Sophia Ananiadou | Junichi Tsujii
BioNLP 2017

Proactive Learning for Named Entity Recognition
Maolin Li | Nhung Nguyen | Sophia Ananiadou
BioNLP 2017

The goal of active learning is to minimise the cost of producing an annotated dataset, in which annotators are assumed to be perfect, i.e., they always choose the correct labels. However, in practice, annotators are not infallible, and they are likely to assign incorrect labels to some instances. Proactive learning is a generalisation of active learning that can model different kinds of annotators. Although proactive learning has been applied to certain labelling tasks, such as text classification, there is little work on its application to named entity (NE) tagging. In this paper, we propose a proactive learning method for producing NE annotated corpora, using two annotators with different levels of expertise, and who charge different amounts based on their levels of experience. To optimise both cost and annotation quality, we also propose a mechanism to present multiple sentences to annotators at each iteration. Experimental results for several corpora show that our method facilitates the construction of high-quality NE labelled datasets at minimal cost.

Distributed Document and Phrase Co-embeddings for Descriptive Clustering
Motoki Sato | Austin J. Brockmeier | Georgios Kontonatsios | Tingting Mu | John Y. Goulermas | Jun’ichi Tsujii | Sophia Ananiadou
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

Descriptive document clustering aims to automatically discover groups of semantically related documents and to assign a meaningful label to characterise the content of each cluster. In this paper, we present a descriptive clustering approach that employs a distributed representation model, namely the paragraph vector model, to capture semantic similarities between documents and phrases. The proposed method uses a joint representation of phrases and documents (i.e., a co-embedding) to automatically select a descriptive phrase that best represents each document cluster. We evaluate our method by comparing its performance to an existing state-of-the-art descriptive clustering method that also uses co-embedding but relies on a bag-of-words representation. Results obtained on benchmark datasets demonstrate that the paragraph vector-based method obtains superior performance over the existing approach in both identifying clusters and assigning appropriate descriptive labels to them.

2016

Ensemble Classification of Grants using LDA-based Features
Yannis Korkontzelos | Beverley Thomas | Makoto Miwa | Sophia Ananiadou
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Classifying research grants into useful categories is a vital task for a funding body to give structure to the portfolio for analysis, informing strategic planning and decision-making. Automating this classification process would save time and effort, providing the accuracy of the classifications is maintained. We employ five classification models to classify a set of BBSRC-funded research grants into 21 research topics based on unigrams, technical terms and Latent Dirichlet Allocation models. To boost precision, we investigate methods for combining their predictions into five aggregate classifiers. Evaluation confirmed that ensemble classification models lead to higher precision. It was observed that there is not a single best-performing aggregate method for all research topics. Instead, the best-performing method for a research topic depends on the number of positive training instances available for this topic. Subject matter experts considered the predictions of aggregate models to correct erroneous or incomplete manual assignments.

Identifying Content Types of Messages Related to Open Source Software Projects
Yannis Korkontzelos | Paul Thompson | Sophia Ananiadou
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Assessing the suitability of an Open Source Software project for adoption requires not only an analysis of aspects related to the code, such as code quality, frequency of updates and new version releases, but also an evaluation of the quality of support offered in related online forums and issue trackers. Understanding the content types of forum messages and issue trackers can provide information about the extent to which requests are being addressed and issues are being resolved, the percentage of issues that are not being fixed, the cases where the user acknowledged that the issue was successfully resolved, etc. These indicators can provide potential adopters of the OSS with estimates about the level of available support. We present a detailed hierarchy of content types of online forum messages and issue tracker comments and a corpus of messages annotated accordingly. We discuss our experiments to classify forum messages and issue tracker comments into content-related classes, i.e., to assign them to nodes of the hierarchy. The results are very encouraging.

Proceedings of the 15th Workshop on Biomedical Natural Language Processing
Kevin Bretonnel Cohen | Dina Demner-Fushman | Sophia Ananiadou | Jun-ichi Tsujii
Proceedings of the 15th Workshop on Biomedical Natural Language Processing

Learning to recognise named entities in tweets by exploiting weakly labelled data
Kurt Junshean Espinosa | Riza Theresa Batista-Navarro | Sophia Ananiadou
Proceedings of the 2nd Workshop on Noisy User-generated Text (WNUT)

Named entity recognition (NER) in social media (e.g., Twitter) is a challenging task due to the noisy nature of text. As part of our participation in the W-NUT 2016 Named Entity Recognition Shared Task, we proposed an unsupervised learning approach using deep neural networks and leveraged a knowledge base (i.e., DBpedia) to bootstrap sparse entity types with weakly labelled data. To further boost the performance, we employed a more sophisticated tagging scheme and applied dropout as a regularisation technique in order to reduce overfitting. Even without hand-crafting linguistic features or leveraging any of the W-NUT-provided gazetteers, we obtained robust performance with our approach, which ranked third amongst all shared task participants according to the official evaluation on a gold standard named entity-annotated corpus of 3,856 tweets.

Proceedings of the Fifth Workshop on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM2016)
Sophia Ananiadou | Riza Batista-Navarro | Kevin Bretonnel Cohen | Dina Demner-Fushman | Paul Thompson
Proceedings of the Fifth Workshop on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM2016)

NaCTeM at SemEval-2016 Task 1: Inferring sentence-level semantic similarity from an ensemble of complementary lexical and sentence-level features
Piotr Przybyła | Nhung T. H. Nguyen | Matthew Shardlow | Georgios Kontonatsios | Sophia Ananiadou
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

2015

Proceedings of BioNLP 15
Kevin Bretonnel Cohen | Dina Demner-Fushman | Sophia Ananiadou | Jun-ichi Tsujii
Proceedings of BioNLP 15

Event Extraction in pieces: Tackling the partial event identification problem on unseen corpora
Chrysoula Zerva | Sophia Ananiadou
Proceedings of BioNLP 15

2014

Using a Random Forest Classifier to Compile Bilingual Dictionaries of Technical Terms from Comparable Corpora
Georgios Kontonatsios | Ioannis Korkontzelos | Jun’ichi Tsujii | Sophia Ananiadou
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, volume 2: Short Papers

Keynote: Supporting evidence-based medicine using text mining
Sophia Ananiadou
Proceedings of the 5th International Workshop on Health Text Mining and Information Analysis (Louhi)

Building a semantically annotated corpus for congestive heart and renal failure from clinical records and the literature
Noha Alnazzawi | Paul Thompson | Sophia Ananiadou
Proceedings of the 5th International Workshop on Health Text Mining and Information Analysis (Louhi)

Proceedings of BioNLP 2014
Kevin Cohen | Dina Demner-Fushman | Sophia Ananiadou | Jun-ichi Tsujii
Proceedings of BioNLP 2014

Comparable Study of Event Extraction in Newswire and Biomedical Domains
Makoto Miwa | Paul Thompson | Ioannis Korkontzelos | Sophia Ananiadou
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

Interoperability and Customisation of Annotation Schemata in Argo
Rafal Rak | Jacob Carter | Andrew Rowley | Riza Theresa Batista-Navarro | Sophia Ananiadou
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The process of annotating text corpora involves establishing annotation schemata which define the scope and depth of the annotation task at hand. We demonstrate this activity in Argo, a Web-based workbench for the analysis of textual resources, which facilitates both automatic and manual annotation. Annotation tasks in the workbench are defined by building workflows consisting of a selection of available elementary analytics developed in compliance with the Unstructured Information Management Architecture specification. The architecture accommodates complex annotation types that may define primitive as well as referential attributes. Argo aids the development of custom annotation schemata and supports their interoperability by featuring a schema editor and specialised analytics for schemata alignment. The schema editor is a self-contained graphical user interface for defining annotation types. Multiple heterogeneous schemata can be aligned by including one of two type mapping analytics currently offered in Argo. One is based on a simple mapping syntax and, although limited in functionality, covers most common use cases. The other utilises a well-established graph query language, SPARQL, and is superior to other state-of-the-art solutions in terms of expressiveness. We argue that the customisation of annotation schemata does not need to compromise their interoperability.

pdf bib
The Meta-knowledge of Causality in Biomedical Scientific Discourse
Claudiu Mihăilă | Sophia Ananiadou
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Causality lies at the heart of biomedical knowledge, being involved in diagnosis, pathology or systems biology. Thus, automatic causality recognition can greatly reduce the human workload by suggesting possible causal connections and aiding in the curation of pathway models. For this, we rely on corpora that are annotated with classified, structured representations of important facts and findings contained within text. However, it is impossible to correctly interpret these annotations without additional information, e.g., classification of an event as fact, hypothesis, experimental result or analysis of results, confidence of authors about the validity of their analyses etc. In this study, we analyse and automatically detect this type of information, collectively termed meta-knowledge (MK), in the context of existing discourse causality annotations. Our effort proves the feasibility of identifying such pieces of information, without which the understanding of causal relations is limited.

pdf bib
The Strategic Impact of META-NET on the Regional, National and International Level
Georg Rehm | Hans Uszkoreit | Sophia Ananiadou | Núria Bel | Audronė Bielevičienė | Lars Borin | António Branco | Gerhard Budin | Nicoletta Calzolari | Walter Daelemans | Radovan Garabík | Marko Grobelnik | Carmen García-Mateo | Josef van Genabith | Jan Hajič | Inma Hernáez | John Judge | Svetla Koeva | Simon Krek | Cvetana Krstev | Krister Lindén | Bernardo Magnini | Joseph Mariani | John McNaught | Maite Melero | Monica Monachini | Asunción Moreno | Jan Odijk | Maciej Ogrodniczuk | Piotr Pęzik | Stelios Piperidis | Adam Przepiórkowski | Eiríkur Rögnvaldsson | Michael Rosner | Bolette Pedersen | Inguna Skadiņa | Koenraad De Smedt | Marko Tadić | Paul Thompson | Dan Tufiş | Tamás Váradi | Andrejs Vasiļjevs | Kadri Vider | Jolanta Zabarskaite
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This article provides an overview of the dissemination work carried out in META-NET from 2010 until early 2014; we describe its impact on the regional, national and international level, mainly with regard to politics and the situation of funding for LT topics. This paper documents the initiative’s work throughout Europe in order to boost progress and innovation in our field.

pdf bib
Locating Requests among Open Source Software Communication Messages
Ioannis Korkontzelos | Sophia Ananiadou
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

As a first step towards assessing the quality of support offered online for Open Source Software (OSS), we address the task of locating requests, i.e., messages that raise an issue to be addressed by the OSS community, as opposed to any other message. We present a corpus of online communication messages randomly sampled from newsgroups and bug trackers, manually annotated as requests or non-requests. We identify several linguistically shallow, content-based heuristics that correlate with the classification and investigate the extent to which they can serve as independent classification criteria. Then, we train machine-learning classifiers on these heuristics. We experiment with a wide range of settings, such as different learners, excluding some heuristics and adding unigram features of various parts-of-speech and frequency. We conclude that some heuristics can perform well, while their accuracy can be improved further using machine learning, at the cost of obtaining manual annotations.

pdf bib
Combining String and Context Similarity for Bilingual Term Alignment from Comparable Corpora
Georgios Kontonatsios | Ioannis Korkontzelos | Jun’ichi Tsujii | Sophia Ananiadou
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

2013

pdf bib
Proceedings of the 2013 Workshop on Biomedical Natural Language Processing
Kevin Bretonnel Cohen | Dina Demner-Fushman | Sophia Ananiadou | John Pestian | Jun’ichi Tsujii
Proceedings of the 2013 Workshop on Biomedical Natural Language Processing

pdf bib
Overview of the Cancer Genetics (CG) task of BioNLP Shared Task 2013
Sampo Pyysalo | Tomoko Ohta | Sophia Ananiadou
Proceedings of the BioNLP Shared Task 2013 Workshop

pdf bib
Overview of the Pathway Curation (PC) task of BioNLP Shared Task 2013
Tomoko Ohta | Sampo Pyysalo | Rafal Rak | Andrew Rowley | Hong-Woo Chun | Sung-Jae Jung | Sung-Pil Choi | Sophia Ananiadou | Jun’ichi Tsujii
Proceedings of the BioNLP Shared Task 2013 Workshop

pdf bib
NaCTeM EventMine for BioNLP 2013 CG and PC tasks
Makoto Miwa | Sophia Ananiadou
Proceedings of the BioNLP Shared Task 2013 Workshop

pdf bib
Towards a Better Understanding of Discourse: Integrating Multiple Discourse Annotation Perspectives Using UIMA
Claudiu Mihăilă | Georgios Kontonatsios | Riza Theresa Batista-Navarro | Paul Thompson | Ioannis Korkontzelos | Sophia Ananiadou
Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse

pdf bib
Making UIMA Truly Interoperable with SPARQL
Rafal Rak | Sophia Ananiadou
Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse

pdf bib
Using a Random Forest Classifier to recognise translations of biomedical terms across languages
Georgios Kontonatsios | Ioannis Korkontzelos | Sophia Ananiadou | Jun’ichi Tsujii
Proceedings of the Sixth Workshop on Building and Using Comparable Corpora

pdf bib
What causes a causal relation? Detecting Causal Triggers in Biomedical Scientific Discourse
Claudiu Mihăilă | Sophia Ananiadou
51st Annual Meeting of the Association for Computational Linguistics Proceedings of the Student Research Workshop

pdf bib
Extending an interoperable platform to facilitate the creation of multilingual and multimodal NLP applications
Georgios Kontonatsios | Paul Thompson | Riza Theresa Batista-Navarro | Claudiu Mihăilă | Ioannis Korkontzelos | Sophia Ananiadou
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: System Demonstrations

pdf bib
Development and Analysis of NLP Pipelines in Argo
Rafal Rak | Andrew Rowley | Jacob Carter | Sophia Ananiadou
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: System Demonstrations

2012

pdf bib
BioNLP: Proceedings of the 2012 Workshop on Biomedical Natural Language Processing
Kevin B. Cohen | Dina Demner-Fushman | Sophia Ananiadou | Bonnie Webber | Jun’ichi Tsujii | John Pestian
BioNLP: Proceedings of the 2012 Workshop on Biomedical Natural Language Processing

pdf bib
PubMed-Scale Event Extraction for Post-Translational Modifications, Epigenetics and Protein Structural Relations
Jari Björne | Sofie Van Landeghem | Sampo Pyysalo | Tomoko Ohta | Filip Ginter | Yves Van de Peer | Sophia Ananiadou | Tapio Salakoski
BioNLP: Proceedings of the 2012 Workshop on Biomedical Natural Language Processing

pdf bib
New Resources and Perspectives for Biomedical Event Extraction
Sampo Pyysalo | Pontus Stenetorp | Tomoko Ohta | Jin-Dong Kim | Sophia Ananiadou
BioNLP: Proceedings of the 2012 Workshop on Biomedical Natural Language Processing

pdf bib
Bridging the Gap Between Scope-based and Event-based Negation/Speculation Annotations: A Bridge Not Too Far
Pontus Stenetorp | Sampo Pyysalo | Tomoko Ohta | Sophia Ananiadou | Jun’ichi Tsujii
Proceedings of the Workshop on Extra-Propositional Aspects of Meaning in Computational Linguistics

pdf bib
Open-domain Anatomical Entity Mention Detection
Tomoko Ohta | Sampo Pyysalo | Jun’ichi Tsujii | Sophia Ananiadou
Proceedings of the Workshop on Detecting Structure in Scholarly Discourse

pdf bib
A three-way perspective on scientific discourse annotation for knowledge extraction
Maria Liakata | Paul Thompson | Anita de Waard | Raheel Nawaz | Henk Pander Maat | Sophia Ananiadou
Proceedings of the Workshop on Detecting Structure in Scholarly Discourse

pdf bib
Biomedical Chinese-English CLIR Using an Extended CMeSH Resource to Expand Queries
Xinkai Wang | Paul Thompson | Jun’ichi Tsujii | Sophia Ananiadou
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Cross-lingual information retrieval (CLIR) involving the Chinese language has been thoroughly studied in the general language domain, but rarely in the biomedical domain, due to the lack of suitable linguistic resources and parsing tools. In this paper, we describe a Chinese-English CLIR system for biomedical literature, which exploits a bilingual ontology, the “eCMeSH Tree”. This is an extension of the Chinese Medical Subject Headings (CMeSH) Tree, based on Medical Subject Headings (MeSH). Using the 2006 and 2007 TREC Genomics track data, we have evaluated the performance of the eCMeSH Tree in expanding queries. We have compared our results to those obtained using two other approaches, i.e. pseudo-relevance feedback (PRF) and document translation (DT). Subsequently, we evaluate the performance of different combinations of these three retrieval methods. Our results show that our method of expanding queries using the eCMeSH Tree can outperform the PRF method. Furthermore, combining this method with PRF and DT helps to smooth the differences in query expansion, and consequently results in the best performance amongst all experiments reported. All experiments compare the use of two different retrieval models, i.e. Okapi BM25 and a query likelihood language model. In general, the former performs slightly better.

pdf bib
Identification of Manner in Bio-Events
Raheel Nawaz | Paul Thompson | Sophia Ananiadou
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Due to the rapid growth in the volume of biomedical literature, there is an increasing requirement for high-performance semantic search systems, which allow biologists to perform precise searches for events of interest. Such systems are usually trained on corpora of documents that contain manually annotated events. Until recently, these corpora, and hence the event extraction systems trained on them, focussed almost exclusively on the identification and classification of event arguments, without taking into account how the textual context of the events could affect their interpretation. Previously, we designed an annotation scheme to enrich events with several aspects (or dimensions) of interpretation, which we term meta-knowledge, and applied this scheme to the entire GENIA corpus. In this paper, we report on our experiments to automate the assignment of one of these meta-knowledge dimensions, i.e. Manner, to recognised events. Manner is concerned with the rate, strength, intensity or level of the event. We distinguish three different values of manner, i.e., High, Low and Neutral. To our knowledge, our work represents the first attempt to classify the manner of events. Using a combination of lexical, syntactic and semantic features, our system achieves an overall accuracy of 99.4%.

pdf bib
Collaborative Development and Evaluation of Text-processing Workflows in a UIMA-supported Web-based Workbench
Rafal Rak | Andrew Rowley | Sophia Ananiadou
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Challenges in creating comprehensive text-processing workflows include a lack of interoperability of individual components coming from different providers and/or a requirement imposed on the end users to know programming techniques to compose such workflows. In this paper we demonstrate Argo, a web-based system that addresses these issues in several ways. It supports the widely adopted Unstructured Information Management Architecture (UIMA), which handles the problem of interoperability; it provides a web browser-based interface for developing workflows by drawing diagrams composed of a selection of available processing components; and it provides novel user-interactive analytics such as the annotation editor which constitutes a bridge between automatic processing and manual correction. These features extend the target audience of Argo to users with a limited or no technical background. Here, we focus specifically on the construction of advanced workflows, involving multiple branching and merging points, to facilitate various comparative evaluations. Together with the use of user-collaboration capabilities supported in Argo, we demonstrate several use cases including visual inspections, comparisons of multiple processing segments or complete solutions against a reference standard, inter-annotator agreement, and shared task mass evaluations. Ultimately, Argo emerges as a one-stop workbench for defining, processing, editing and evaluating text processing tasks.

pdf bib
A data and analysis resource for an experiment in text mining a collection of micro-blogs on a political topic.
William Black | Rob Procter | Steven Gray | Sophia Ananiadou
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The analysis of a corpus of micro-blogs on the topic of the 2011 UK referendum about the Alternative Vote has been undertaken as a joint activity by text miners and social scientists. To facilitate the collaboration, the corpus and its analysis are managed in a Web-accessible framework that allows users to upload their own textual data for analysis and to manage their own text annotation resources used for analysis. The framework also allows annotations to be searched, and the analysis to be re-run after amending the analysis resources. The corpus is also doubly human-annotated, stating both whether each tweet is overall positive or negative in sentiment and whether it is for or against the proposition of the referendum.

pdf bib
Building Trainable Taggers in a Web-based, UIMA-Supported NLP Workbench
Rafal Rak | BalaKrishna Kolluru | Sophia Ananiadou
Proceedings of the ACL 2012 System Demonstrations

pdf bib
brat: a Web-based Tool for NLP-Assisted Text Annotation
Pontus Stenetorp | Sampo Pyysalo | Goran Topić | Tomoko Ohta | Sophia Ananiadou | Jun’ichi Tsujii
Proceedings of the Demonstrations at the 13th Conference of the European Chapter of the Association for Computational Linguistics

2011

pdf bib
Proceedings of BioNLP 2011 Workshop
Kevin Bretonnel Cohen | Dina Demner-Fushman | Sophia Ananiadou | John Pestian | Jun’ichi Tsujii | Bonnie Webber
Proceedings of BioNLP 2011 Workshop

pdf bib
Building a Coreference-Annotated Corpus from the Domain of Biochemistry
Riza Theresa Batista-Navarro | Sophia Ananiadou
Proceedings of BioNLP 2011 Workshop

pdf bib
Enrichment and Structuring of Archival Description Metadata
Kalliopi Zervanou | Ioannis Korkontzelos | Antal van den Bosch | Sophia Ananiadou
Proceedings of the 5th ACL-HLT Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities

pdf bib
Overview of the Infectious Diseases (ID) task of BioNLP Shared Task 2011
Sampo Pyysalo | Tomoko Ohta | Rafal Rak | Dan Sullivan | Chunhong Mao | Chunxia Wang | Bruno Sobral | Jun’ichi Tsujii | Sophia Ananiadou
Proceedings of BioNLP Shared Task 2011 Workshop

pdf bib
Promoting Interoperability of Resources in META-SHARE
Paul Thompson | Yoshinobu Kano | John McNaught | Steve Pettifer | Teresa Attwood | John Keane | Sophia Ananiadou
Proceedings of the Workshop on Language Resources, Technology and Services in the Sharing Paradigm

2010

pdf bib
Proceedings of the 2010 Workshop on Biomedical Natural Language Processing
K. Bretonnel Cohen | Dina Demner-Fushman | Sophia Ananiadou | John Pestian | Jun’ichi Tsujii | Bonnie Webber
Proceedings of the 2010 Workshop on Biomedical Natural Language Processing

pdf bib
Towards Event Extraction from Full Texts on Infectious Diseases
Sampo Pyysalo | Tomoko Ohta | Han-Cheol Cho | Dan Sullivan | Chunhong Mao | Bruno Sobral | Jun’ichi Tsujii | Sophia Ananiadou
Proceedings of the 2010 Workshop on Biomedical Natural Language Processing

pdf bib
Evaluating a meta-knowledge annotation scheme for bio-events
Raheel Nawaz | Paul Thompson | Sophia Ananiadou
Proceedings of the Workshop on Negation and Speculation in Natural Language Processing

pdf bib
Evaluating a Text Mining Based Educational Search Portal
Sophia Ananiadou | John McNaught | James Thomas | Mark Rickinson | Sandy Oliver
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

In this paper, we present the main features of a text mining based search engine for the UK Educational Evidence Portal available at the UK National Centre for Text Mining (NaCTeM), together with a user-centred framework for the evaluation of the search engine. The framework is adapted from an existing proposal by the ISLE (EAGLES) Evaluation Working group. We introduce the metrics employed for the evaluation, and explain how these relate to the text mining based search engine. Following this, we describe how we applied the framework to the evaluation of a number of key text mining features of the search engine, namely the automatic clustering of search results, classification of search results according to a taxonomy, and identification of topics and other documents that are related to a chosen document. Finally, we present the results of the evaluation in terms of the strengths, weaknesses and improvements identified for each of these features.

pdf bib
Meta-Knowledge Annotation of Bio-Events
Raheel Nawaz | Paul Thompson | John McNaught | Sophia Ananiadou
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Biomedical corpora annotated with event-level information provide an important resource for the training of domain-specific information extraction (IE) systems. These corpora concentrate primarily on creating classified, structured representations of important facts and findings contained within the text. However, bio-event annotations often do not take into account additional information (meta-knowledge) that is expressed within the textual context of the bio-event, e.g., the pragmatic/rhetorical intent and the level of certainty ascribed to a particular bio-event by the authors. Such additional information is indispensable for correct interpretation of bio-events. Therefore, an IE system that simply presents a list of “bare” bio-events, without information concerning their interpretation, is of little practical use. We have addressed this sparseness of meta-knowledge available in existing bio-event corpora by developing a multi-dimensional annotation scheme tailored to bio-events. The scheme is intended to be general enough to allow integration with different types of bio-event annotation, whilst being detailed enough to capture important subtleties in the nature of the meta-knowledge expressed about different bio-events. To our knowledge, our scheme is unique within the field with regard to the diversity of meta-knowledge aspects annotated for each event.

pdf bib
U-Compare: An Integrated Language Resource Evaluation Platform Including a Comprehensive UIMA Resource Library
Yoshinobu Kano | Ruben Dorado | Luke McCrohon | Sophia Ananiadou | Jun’ichi Tsujii
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Language resources, including corpora and tools, normally need to be combined in order to achieve a user’s specific task. However, resources tend to be developed independently in different, incompatible formats. In this paper we describe U-Compare, which consists of the U-Compare component repository and the U-Compare platform. We have been building a highly interoperable resource library, providing the world’s largest ready-to-use UIMA component repository, including a wide variety of corpus readers and state-of-the-art language tools. These resources can be deployed as local services or web services, and can even be hosted on clustered machines to increase performance, while users do not need to be aware of such differences. In addition to the resource library, an integrated language processing platform is provided, allowing workflow creation, comparison, evaluation and visualization, using the resources in the library or any UIMA component, without any programming, via graphical user interfaces; a command line launcher is also available for use without GUIs. The evaluation itself is processed in a UIMA component, so users can create and plug in their own evaluation metrics in addition to the predefined metrics. U-Compare has been successfully used in many projects including BioCreative, CoNLL and the BioNLP shared task.

pdf bib
Imbalanced Classification Using Dictionary-based Prototypes and Hierarchical Decision Rules for Entity Sense Disambiguation
Tingting Mu | Xinglong Wang | Jun’ichi Tsujii | Sophia Ananiadou
Coling 2010: Posters

2009

pdf bib
Stochastic Gradient Descent Training for L1-regularized Log-linear Models with Cumulative Penalty
Yoshimasa Tsuruoka | Jun’ichi Tsujii | Sophia Ananiadou
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

pdf bib
Fast Full Parsing by Linear-Chain Conditional Random Fields
Yoshimasa Tsuruoka | Jun’ichi Tsujii | Sophia Ananiadou
Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009)

pdf bib
Three BioNLP Tools Powered by a Biological Lexicon
Yutaka Sasaki | Paul Thompson | John McNaught | Sophia Ananiadou
Proceedings of the Demonstrations Session at EACL 2009

pdf bib
Classifying Relations for Biomedical Named Entity Disambiguation
Xinglong Wang | Jun’ichi Tsujii | Sophia Ananiadou
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

pdf bib
ASSIST : un moteur de recherche spécialisé pour l’analyse des cadres d’expériences
Davy Weissenbacher | Elisa Pieri | Sophia Ananiadou | Brian Rea | Farida Vis | Yuwei Lin | Rob Procter | Peter Halfpenny
Actes de la 16ème conférence sur le Traitement Automatique des Langues Naturelles. Démonstrations

Qualitative data analysis requires sociologists to do substantial work selecting and interpreting documents. To ease this work, the community has adopted software tools, but their functionality is still limited. The ASSIST project is an exploratory study to identify the natural language processing (NLP) modules that can assist sociologists in their analysis work. We present the search engine that was built and justify the choice of NLP components integrated into the prototype.

pdf bib
Proceedings of the BioNLP 2009 Workshop
K. Bretonnel Cohen | Dina Demner-Fushman | Sophia Ananiadou | John Pestian | Jun’ichi Tsujii | Bonnie Webber
Proceedings of the BioNLP 2009 Workshop

pdf bib
Integrated NLP Evaluation System for Pluggable Evaluation Metrics with Extensive Interoperable Toolkit
Yoshinobu Kano | Luke McCrohon | Sophia Ananiadou | Jun’ichi Tsujii
Proceedings of the Workshop on Software Engineering, Testing, and Quality Assurance for Natural Language Processing (SETQA-NLP 2009)

2008

pdf bib
Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing
Dina Demner-Fushman | Sophia Ananiadou | Kevin Bretonnel Cohen | John Pestian | Jun’ichi Tsujii | Bonnie Webber
Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing

pdf bib
Accelerating the Annotation of Sparse Named Entities by Dynamic Sentence Selection
Yoshimasa Tsuruoka | Jun’ichi Tsujii | Sophia Ananiadou
Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing

pdf bib
How to Make the Most of NE Dictionaries in Statistical NER
Yutaka Sasaki | Yoshimasa Tsuruoka | John McNaught | Sophia Ananiadou
Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing

pdf bib
A Discriminative Alignment Model for Abbreviation Recognition
Naoaki Okazaki | Sophia Ananiadou | Jun’ichi Tsujii
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

pdf bib
Event Frame Extraction Based on a Gene Regulation Corpus
Yutaka Sasaki | Paul Thompson | Philip Cotter | John McNaught | Sophia Ananiadou
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

pdf bib
A Discriminative Candidate Generator for String Transformations
Naoaki Okazaki | Yoshimasa Tsuruoka | Sophia Ananiadou | Jun’ichi Tsujii
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

pdf bib
Identifying Sections in Scientific Abstracts using Conditional Random Fields
Kenji Hirohata | Naoaki Okazaki | Sophia Ananiadou | Mitsuru Ishizuka
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-I

pdf bib
Towards Data and Goal Oriented Analysis: Tool Inter-operability and Combinatorial Comparison
Yoshinobu Kano | Ngan Nguyen | Rune Sætre | Kazuhiro Yoshida | Keiichiro Fukamachi | Yusuke Miyao | Yoshimasa Tsuruoka | Sophia Ananiadou | Jun’ichi Tsujii
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-II

pdf bib
Clustering Related Terms with Definitions
Scott Piao | John McNaught | Sophia Ananiadou
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

It is a challenging task to match similar or related terms/expressions in NLP and Text Mining applications. Two typical areas in need of such work are terminology and ontology construction, where terms and concepts are extracted and organized into certain structures with various semantic relations. In the EU BOOTSTrep Project we test various techniques for matching terms that can assist human domain experts in building and enriching ontologies. This paper reports on work in which we evaluated a text comparison and clustering tool for this task. In particular, we explore the feasibility of matching related terms with their definitions. Ontology terms, such as Gene Ontology terms, are often assigned detailed definitions, which provide a fundamental information source for detecting relations between terms. Here we focus on the exploitation of term definitions for the term matching task. Our experiment shows that the tool is capable of grouping many related terms using their definitions.

pdf bib
Building a Bio-Event Annotated Corpus for the Acquisition of Semantic Frames from Biomedical Corpora
Paul Thompson | Philip Cotter | John McNaught | Sophia Ananiadou | Simonetta Montemagni | Andrea Trabucco | Giulia Venturi
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper reports on the design and construction of a bio-event annotated corpus which was developed with a specific view to the acquisition of semantic frames from biomedical corpora. We describe the adopted annotation scheme and the annotation process, which is supported by a dedicated annotation tool. The annotated corpus contains 677 abstracts of biomedical research articles.

pdf bib
Connecting Text Mining and Pathways using the PathText Resource
Rune Sætre | Brian Kemper | Kanae Oda | Naoaki Okazaki | Yukiko Matsuoka | Norihiro Kikuchi | Hiroaki Kitano | Yoshimasa Tsuruoka | Sophia Ananiadou | Jun’ichi Tsujii
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Many systems have been developed in the past few years to assist researchers in the discovery of knowledge published as English text, for example in the PubMed database. At the same time, higher level collective knowledge is often published using a graphical notation representing all the entities in a pathway and their interactions. We believe that these pathway visualizations could serve as an effective user interface for knowledge discovery if they can be linked to the text in publications. Since the graphical elements in a Pathway are of a very different nature than their corresponding descriptions in English text, we developed a prototype system called PathText. The goal of PathText is to serve as a bridge between these two different representations. In this paper, we first describe the overall architecture and the interfaces of the PathText system, and then provide some details about the core Text Mining components.

2007

pdf bib
An Annotation Type System for a Data-Driven NLP Pipeline
Udo Hahn | Ekaterina Buyko | Katrin Tomanek | Scott Piao | John McNaught | Yoshimasa Tsuruoka | Sophia Ananiadou
Proceedings of the Linguistic Annotation Workshop

pdf bib
Text Mining Techniques for Building a Biolexicon
Sophia Ananiadou
Proceedings of the Australasian Language Technology Workshop 2007

pdf bib
Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions
Sophia Ananiadou
Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions

2006

pdf bib
A Term Recognition Approach to Acronym Recognition
Naoaki Okazaki | Sophia Ananiadou
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions

pdf bib
Clustering acronyms in biomedical text for disambiguation
Naoaki Okazaki | Sophia Ananiadou
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

Given the increasing number of neologisms in biomedicine (names of genes, diseases, molecules, etc.), the rate of acronyms used in literature also increases. Existing acronym dictionaries cannot keep up with the rate of new creations. Thus, discovering and disambiguating acronyms and their expanded forms are essential aspects of text mining and terminology management. We present a method for clustering long forms identified by an acronym recognition method. Applying the acronym recognition method to MEDLINE abstracts, we obtained a list of short/long forms. The recognized short/long forms were classified by a biologist to construct an evaluation set for clustering sets of similar long forms. We observed five types of term variation in the evaluation set (i.e., orthographic, morphological, syntactic, lexico-semantic variants, nested abbreviations) and defined four similarity measures to gather the similar long forms. The complete-link clustering with the four similarity measures achieved 87.5% precision and 84.9% recall on the evaluation set.

pdf bib
Towards a terminological resource for biomedical text mining
Goran Nenadic | Naoki Okazaki | Sophia Ananiadou
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

One of the main challenges in biomedical text mining is the identification of terminology, which is a key factor for accessing and integrating the information stored in literature. Manual creation of biomedical terminologies cannot keep pace with the data that becomes available. Still, many of them have been used in attempts to recognise terms in literature, but their suitability for text mining has been questioned as substantial re-engineering is needed to tailor the resources for automatic processing. Several approaches have been suggested to automatically integrate and map between resources, but the problems of extensive variability of lexical representations and ambiguity have been revealed. In this paper we present a methodology to automatically maintain a biomedical terminological database, which contains automatically extracted terms, their mutual relationships, features and possible annotations that can be useful in text processing. In addition to TermDB, a database used for terminology management and storage, we present the following modules that are used to populate the database: TerMine (recognition, extraction and normalisation of terms from literature), AcroTerMine (extraction and clustering of acronyms and their long forms), AnnoTerm (annotation and classification of terms), and ClusTerm (extraction of term associations and clustering of terms).

2005

pdf bib
A Machine Learning Approach to Acronym Generation
Yoshimasa Tsuruoka | Sophia Ananiadou | Jun’ichi Tsujii
Proceedings of the ACL-ISMB Workshop on Linking Biological Literature, Ontologies and Databases: Mining Biological Semantics

2004

pdf bib
Design and Implementation of a Terminology-based Literature Mining and Knowledge Structuring System
Hideki Mima | Sophia Ananiadou | Katsumori Matsushima
Proceedings of CompuTerm 2004: 3rd International Workshop on Computational Terminology

pdf bib
Enhancing automatic term recognition through recognition of variation
Goran Nenadic | Sophia Ananiadou | John McNaught
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

2003

pdf bib
Using Domain-Specific Verbs for Term Classification
Irena Spasic | Goran Nenadic | Sophia Ananiadou
Proceedings of the ACL 2003 Workshop on Natural Language Processing in Biomedicine

pdf bib
Selecting Text Features for Gene Name Classification: from Documents to Terms
Goran Nenadic | Simon Rice | Irena Spasic | Sophia Ananiadou | Benjamin Stapley
Proceedings of the ACL 2003 Workshop on Natural Language Processing in Biomedicine

pdf bib
Morpho-syntactic Clues for Terminological Processing in Serbian
Goran Nenadić | Irena Spasić | Sophia Ananiadou
Proceedings of the 2003 EACL Workshop on Morphological Processing of Slavic Languages

pdf bib
An Integrated Term-Based Corpus Query System
Irena Spasic | Goran Nenadic | Kostas Manios | Sophia Ananiadou
10th Conference of the European Chapter of the Association for Computational Linguistics

2002

pdf bib
Tuning Context Features with Genetic Algorithms
Irena Spasić | Goran Nenadić | Sophia Ananiadou
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

pdf bib
Automatic Acronym Acquisition and Term Variation Management within Domain-Specific Texts
Goran Nenadić | Irena Spasić | Sophia Ananiadou
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

pdf bib
Automatic Discovery of Term Similarities Using Pattern Mining
Goran Nenadić | Irena Spasić | Sophia Ananiadou
COLING-02: COMPUTERM 2002: Second International Workshop on Computational Terminology

pdf bib
A Methodology for Terminology-based Knowledge Acquisition and Integration
Hideki Mima | Sophia Ananiadou | Goran Nenadic | Jun-Ichi Tsujii
COLING 2002: The 19th International Conference on Computational Linguistics

2000

pdf bib
Creating and Using Domain-specific Ontologies for Terminological Applications
Diana Maynard | Sophia Ananiadou
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

pdf bib
Identifying Terms by their Family and Friends
Diana Maynard | Sophia Ananiadou
COLING 2000 Volume 1: The 18th International Conference on Computational Linguistics

1998

pdf bib
Machine Translation Trends in Europe and Japan
Sophia Ananiadou
Proceedings of Translating and the Computer 20

1996

pdf bib
Extracting Nested Collocations
Katerina T. Frantzi | Sophia Ananiadou
COLING 1996 Volume 1: The 16th International Conference on Computational Linguistics

1994

pdf bib
Terms are not alone: term choice and choice terms
Sophia Ananiadou
Proceedings of Translating and the Computer 16

pdf bib
A Methodology for Automatic Term Recognition
Sophia Ananiadou
COLING 1994 Volume 2: The 15th International Conference on Computational Linguistics
