Eduardo Mendoza
2023
A Hybrid of Rule-based and Transformer-based Approaches for Relation Extraction in Biodiversity Literature
Roselyn Gabud | Portia Lapitan | Vladimir Mariano | Eduardo Mendoza | Nelson Pampolina | Maria Art Antonette Clariño | Riza Batista-Navarro
Proceedings of the 2nd Workshop on Pattern-based Approaches to NLP in the Age of Deep Learning
Relation extraction (RE) underpins many natural language processing (NLP) applications. Exploiting the information hidden in millions of scholarly articles with NLP systems, specifically RE systems, could benefit studies in specialized domains such as biomedicine and biodiversity. Although deep learning (DL)-based methods have shown state-of-the-art performance in many NLP tasks including RE, the use of DL in domain-specific RE systems has been hindered by the lack of expert-labeled datasets, which are typically required to train such methods. In this paper, we take advantage of the zero-shot (i.e., not requiring any labeled data) capability of pattern-based methods for RE, using a rule-based approach combined with templates for natural language inference (NLI) transformer models. We present a hybrid method for RE that exploits the advantages of both, i.e., the interpretability of rules and the transferability of transformers. Evaluated on a corpus of biodiversity literature with annotated relations, our hybrid method improved recall by up to 15 percentage points and outperformed both the solely rule-based and the solely transformer-based methods, with F1-scores ranging from 89.61% to 96.75% for reproductive condition - temporal expression relations and from 85.39% to 89.90% for habitat - geographic location relations.
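To make the idea of combining rules with NLI templates concrete, here is a minimal sketch in Python. It is not the authors' implementation: the abstract does not give the rules, the templates, or how the two signals are merged. The trigger words, the hypothesis template, the 0.5 threshold, and the "rule first, NLI as fallback" combination are all assumptions made for illustration, and `toy_entail_prob` is a placeholder for a real NLI model's entailment probability.

```python
import re
from typing import Callable

# Hypothetical trigger words that may link a reproductive condition
# to a temporal expression (not taken from the paper).
TRIGGERS = re.compile(r"\b(in|during|from|between)\b", re.IGNORECASE)

# Hypothetical NLI hypothesis template (not taken from the paper).
TEMPLATE = "{head} happens in {tail}."


def rule_based(sentence: str, head: str, tail: str) -> bool:
    """Return True if a trigger word occurs between head and tail."""
    h = sentence.find(head)
    t = sentence.find(tail)
    if h == -1 or t == -1 or h >= t:
        return False
    between = sentence[h + len(head):t]
    return bool(TRIGGERS.search(between))


def hybrid(
    sentence: str,
    head: str,
    tail: str,
    entail_prob: Callable[[str, str], float],
    threshold: float = 0.5,
) -> bool:
    """Accept the pair if the rule fires; otherwise fall back to NLI.

    entail_prob(premise, hypothesis) is assumed to return the probability
    that the premise entails the hypothesis.
    """
    if rule_based(sentence, head, tail):
        return True
    hypothesis = TEMPLATE.format(head=head, tail=tail)
    return entail_prob(sentence, hypothesis) >= threshold


def toy_entail_prob(premise: str, hypothesis: str) -> float:
    """Placeholder scorer standing in for a real NLI transformer."""
    return 0.9 if "peaks" in premise else 0.1


if __name__ == "__main__":
    # The rule fires because "during" links the two mentions.
    print(hybrid("Flowering occurs during May.", "Flowering", "May", toy_entail_prob))
    # No trigger word here, so the decision is left to the NLI scorer.
    print(hybrid("Flowering peaks, May.", "Flowering", "May", toy_entail_prob))
```

Accepting a pair when either component fires is one plausible way to raise recall over a rule-only system, which is the kind of gain the abstract reports, but the actual combination strategy used in the paper may differ.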