Elliot Ford
2022
Biomedical NER for the Enterprise with Distillated BERN2 and the Kazu Framework
Wonjin Yoon, Richard Jackson, Elliot Ford, Vladimir Poroshin, Jaewoo Kang
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track
To assist the drug discovery and development process, pharmaceutical companies often apply biomedical NER and linking techniques over internal and public corpora. Decades of research in BioNLP have produced a plethora of algorithms, systems and datasets. However, our experience has been that no single open source system meets all the requirements of a modern pharmaceutical company. In this work, we describe these requirements based on our experience of the industry, and present Kazu, a highly extensible, scalable open source framework designed to support BioNLP for the pharmaceutical sector. Kazu is built around a computationally efficient version of the BERN2 NER model (TinyBERN2), and wraps several other BioNLP technologies into one coherent system.
2017
A Text Normalisation System for Non-Standard English Words
Emma Flint, Elliot Ford, Olivia Thomas, Andrew Caines, Paula Buttery
Proceedings of the 3rd Workshop on Noisy User-generated Text
This paper investigates the problem of text normalisation; specifically, the normalisation of non-standard words (NSWs) in English. Non-standard words can be defined as word tokens that have no dictionary entry and cannot be pronounced using the usual letter-to-phoneme conversion rules, e.g. lbs, 99.3%, #EMNLP2017. NSWs pose a challenge to the proper functioning of text-to-speech technology, and the solution is to spell them out so that they can be pronounced appropriately. We describe our four-stage normalisation system, made up of components for the detection, classification, division and expansion of NSWs. Performance compares favourably with previous work in the field (Sproat et al. 2001, Normalization of non-standard words), as well as with state-of-the-art text-to-speech software. Further, we update Sproat et al.'s NSW taxonomy, and create a more customisable system in which users can input their own abbreviations and specify which variety of English (currently British or American) they wish to normalise into.
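The four stages named in the abstract (detection, classification, division, expansion) can be illustrated with a toy pipeline. This is a minimal sketch, not the authors' system: the dictionary, the abbreviation table, the class labels (`WORD`, `HASHTAG`, `PERCENT`, `NUMBER`, `ABBREVIATION`, `LETTERS`) and all function names are hypothetical. Detection is reduced to a dictionary lookup, and only one (American-style, no "and") way of reading numbers is implemented.

```python
import re

# Hypothetical toy lexicon standing in for a real dictionary.
DICTIONARY = {"the", "box", "weighs", "and", "is", "full"}

# Hypothetical user-editable abbreviation table (the paper lets users add their own).
ABBREVIATIONS = {"lbs": "pounds", "kg": "kilograms"}

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def cardinal(n: int) -> str:
    """Spell out 0 <= n < 1_000_000 in American style (no 'and')."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, rest = divmod(n, 10)
        return TENS[tens] + ("" if rest == 0 else "-" + ONES[rest])
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        return ONES[hundreds] + " hundred" + ("" if rest == 0 else " " + cardinal(rest))
    thousands, rest = divmod(n, 1000)
    return cardinal(thousands) + " thousand" + ("" if rest == 0 else " " + cardinal(rest))


def expand_number(text: str) -> str:
    """Read '99.3' as 'ninety-nine point three' (fraction digits one by one)."""
    whole, _, frac = text.partition(".")
    words = cardinal(int(whole))
    if frac:
        words += " point " + " ".join(ONES[int(d)] for d in frac)
    return words


def detect(token: str) -> bool:
    """Stage 1: flag tokens with no dictionary entry as NSWs (simplified test)."""
    return token.lower() not in DICTIONARY


def classify(token: str) -> str:
    """Stage 2: assign a coarse, hypothetical NSW class."""
    if token.lower() in DICTIONARY:
        return "WORD"
    if token.startswith("#"):
        return "HASHTAG"
    if re.fullmatch(r"\d+(\.\d+)?%", token):
        return "PERCENT"
    if re.fullmatch(r"\d+(\.\d+)?", token):
        return "NUMBER"
    if token.lower() in ABBREVIATIONS:
        return "ABBREVIATION"
    return "LETTERS"  # fallback: read the token letter by letter


def divide(token: str, label: str) -> list[str]:
    """Stage 3: split compound NSWs such as hashtags into letter and digit runs."""
    if label == "HASHTAG":
        return re.findall(r"[A-Za-z]+|\d+", token)
    return [token]


def expand(part: str, label: str) -> str:
    """Stage 4: spell out one part so a TTS engine can pronounce it."""
    if label == "PERCENT":
        return expand_number(part[:-1]) + " percent"
    if label == "NUMBER":
        return expand_number(part)
    if label == "ABBREVIATION":
        return ABBREVIATIONS[part.lower()]
    if label == "LETTERS":
        return " ".join(part.upper())
    return part  # WORD: already pronounceable


def normalise(sentence: str) -> str:
    out = []
    for token in sentence.split():
        if not detect(token):  # ordinary word: keep as-is
            out.append(token)
            continue
        label = classify(token)
        for part in divide(token, label):
            # Each divided part is re-classified before expansion.
            out.append(expand(part, classify(part)))
    return " ".join(out)


print(normalise("The box weighs 12 lbs and is 99.3% full #EMNLP2017"))
# → The box weighs twelve pounds and is ninety-nine point three percent full E M N L P two thousand seventeen
```

Re-classifying each divided part lets a hashtag such as #EMNLP2017 mix a spelled-out acronym with a spoken number. A British setting would differ at least in inserting "and" (e.g. "two thousand and seventeen"), which this sketch does not attempt.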