William A. Baumgartner, Jr.

Also published as: William A. Baumgartner, William A. Baumgartner Jr., William Baumgartner, William Baumgartner Jr.


CRAFT Shared Tasks 2019 Overview — Integrated Structure, Semantics, and Coreference
William Baumgartner | Michael Bada | Sampo Pyysalo | Manuel R. Ciosici | Negacy Hailu | Harrison Pielke-Lombardo | Michael Regan | Lawrence Hunter
Proceedings of the 5th Workshop on BioNLP Open Shared Tasks

As part of the BioNLP Open Shared Tasks 2019, the CRAFT Shared Tasks 2019 provides a platform to gauge the state of the art for three fundamental language processing tasks — dependency parse construction, coreference resolution, and ontology concept identification — over full-text biomedical articles. The structural annotation task requires the automatic generation of dependency parses for each sentence of an article given only the article text. The coreference resolution task focuses on linking coreferring base noun phrase mentions into chains using the symmetrical and transitive identity relation. The ontology concept annotation task involves the identification of concept mentions within text using the classes of ten distinct ontologies in the biomedical domain, both unmodified and augmented with extension classes. This paper provides an overview of each task, including descriptions of the data provided to participants and the evaluation metrics used, and discusses participant results relative to baseline performances for each of the three tasks.


SuperCAT: The (New and Improved) Corpus Analysis Toolkit
K. Bretonnel Cohen | William A. Baumgartner Jr. | Irina Temnikova
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper presents SuperCAT, a corpus analysis toolkit. It is a radical extension of SubCAT, the Sublanguage Corpus Analysis Toolkit, from sublanguage analysis to corpus analysis in general. The idea behind SuperCAT is that representative corpora have no tendency towards closure; that is, they tend towards infinity. In contrast, non-representative corpora have a tendency towards closure (roughly, finiteness). SuperCAT focuses on general techniques for the quantitative description of the characteristics of any corpus (or other language sample), particularly the characteristics of its lexical distributions. Additionally, SuperCAT features a complete re-engineering of the previous SubCAT architecture.


Sublanguage Corpus Analysis Toolkit: A tool for assessing the representativeness and sublanguage characteristics of corpora
Irina Temnikova | William A. Baumgartner Jr. | Negacy D. Hailu | Ivelina Nikolova | Tony McEnery | Adam Kilgarriff | Galia Angelova | K. Bretonnel Cohen
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Sublanguages are varieties of language that form "subsets" of the general language, typically exhibiting particular types of lexical, semantic, and other restrictions and deviance. SubCAT, the Sublanguage Corpus Analysis Toolkit, assesses the representativeness and closure properties of corpora to analyze the extent to which they are either sublanguages or representative samples of the general language. The current version of SubCAT contains scripts and applications for assessing lexical closure, morphological closure, sentence type closure, over-represented words, and syntactic deviance. Its operation is illustrated with three case studies concerning scientific journal articles, patents, and clinical records. Materials from two language families are analyzed: English (Germanic) and Bulgarian (Slavic). The software is available at sublanguage.sourceforge.net under a liberal Open Source license.


Closure Properties of Bulgarian Clinical Text
Irina Temnikova | Ivelina Nikolova | William A. Baumgartner | Galia Angelova | K. Bretonnel Cohen
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013


Fast and simple semantic class assignment for biomedical text
K. Bretonnel Cohen | Thomas Christiansen | William Baumgartner Jr. | Karin Verspoor | Lawrence Hunter
Proceedings of BioNLP 2011 Workshop


Test Suite Design for Biomedical Ontology Concept Recognition Systems
K. Bretonnel Cohen | Christophe Roeder | William A. Baumgartner Jr. | Lawrence E. Hunter | Karin Verspoor
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Systems that locate mentions of concepts from ontologies in free text are known as ontology concept recognition systems. This paper describes an approach to evaluating the workings of ontology concept recognition systems through the use of a structured test suite, and presents a publicly available test suite for this purpose. The test suite is built using the principles of descriptive linguistic fieldwork and of software testing. More broadly, we also seek to investigate what general principles might inform the construction of such test suites. The test suite was found to be effective in identifying performance errors in an ontology concept recognition system: the system failed to recognize 2.1% of all canonical forms and recognized no non-canonical forms at all. Regarding the question of general principles of test suite construction, we compared this test suite to a named entity recognition test suite constructor. We found that the two had twenty features in total and that seven were shared between them, suggesting that there is a core of feature types that may be applicable to test suite construction for any similar type of application.


High-precision biological event extraction with a concept recognizer
K. Bretonnel Cohen | Karin Verspoor | Helen Johnson | Chris Roeder | Philip Ogren | William Baumgartner | Elizabeth White | Lawrence Hunter
Proceedings of the BioNLP 2009 Workshop Companion Volume for Shared Task


Software Testing and the Naturally Occurring Data Assumption in Natural Language Processing
K. Bretonnel Cohen | William A. Baumgartner Jr. | Lawrence Hunter
Software Engineering, Testing, and Quality Assurance for Natural Language Processing


Refactoring Corpora
Helen L. Johnson | William A. Baumgartner Jr. | Martin Krallinger | K. Bretonnel Cohen | Lawrence Hunter
Proceedings of the HLT-NAACL BioNLP Workshop on Linking Natural Language and Biology