2022
SynKB: Semantic Search for Synthetic Procedures
Fan Bai, Alan Ritter, Peter Madrid, Dayne Freitag, John Niekrasz
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
In this paper we present SynKB, an open-source, automatically extracted knowledge base of chemical synthesis protocols. Similar to proprietary chemistry databases such as Reaxys, SynKB allows chemists to retrieve structured knowledge about synthetic procedures. By taking advantage of recent advances in natural language processing for procedural texts, SynKB supports more flexible queries about reaction conditions, and thus has the potential to help chemists search the literature for conditions used in relevant reactions as they design new synthetic routes. Using customized Transformer models to automatically extract information from 6 million synthesis procedures described in U.S. and EU patents, we show that for many queries, SynKB has higher recall than Reaxys, while maintaining high precision. We plan to make SynKB available as an open-source tool; in contrast, proprietary chemistry databases require costly subscriptions.
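To make the idea of a structured query over reaction conditions concrete, here is a minimal Python sketch. It assumes a hypothetical record type (`Procedure`, with `reagents` and `temperature_c` fields) and a hypothetical `query` helper; SynKB's actual schema and query interface are not described here, so this only illustrates the general pattern of filtering extracted procedures by reagent and temperature range.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Procedure:
    """Hypothetical record for one automatically extracted synthesis procedure."""
    doc_id: str
    reagents: List[str] = field(default_factory=list)
    temperature_c: Optional[float] = None  # reaction temperature, if extracted


def query(procedures, reagent, min_temp=None, max_temp=None):
    """Return procedures that mention `reagent` (case-insensitive) and, when a
    bound is given, whose extracted temperature lies in [min_temp, max_temp]."""
    wanted = reagent.lower()
    hits = []
    for p in procedures:
        if wanted not in (r.lower() for r in p.reagents):
            continue
        if min_temp is not None or max_temp is not None:
            if p.temperature_c is None:
                continue  # no extracted temperature to test against the bounds
            if min_temp is not None and p.temperature_c < min_temp:
                continue
            if max_temp is not None and p.temperature_c > max_temp:
                continue
        hits.append(p)
    return hits


if __name__ == "__main__":
    # Made-up illustrative records, not SynKB data.
    records = [
        Procedure("doc-1", ["NaH", "benzyl bromide"], 0.0),
        Procedure("doc-2", ["NaH"], 80.0),
        Procedure("doc-3", ["LiAlH4"], None),
    ]
    print([p.doc_id for p in query(records, "nah", max_temp=25)])  # ['doc-1']
```

Records with no extracted temperature are excluded whenever a temperature bound is given; whether a real system should keep or drop such records is a design choice.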
Accelerating Human Authorship of Information Extraction Rules
Dayne Freitag, John Cadigan, John Niekrasz, Robert Sasseen
Proceedings of the First Workshop on Pattern-based Approaches to NLP in the Age of Deep Learning
We consider whether machine models can facilitate the human development of rule sets for information extraction. Arguing that rule-based methods possess a speed advantage in the early development of new extraction capabilities, we ask whether this advantage can be increased further through the machine facilitation of common recurring manual operations in the creation of an extraction rule set from scratch. Using a historical rule set, we reconstruct and describe the putative manual operations required to create it. In experiments targeting one key operation—the enumeration of words occurring in particular contexts—we simulate the process of corpus review and word list creation, showing that several simple interventions greatly improve recall as a function of simulated labor.
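The idea of measuring recall as a function of simulated labor can be illustrated with a small Python sketch, under loudly stated assumptions: the "context" is simply the word immediately following a trigger word, candidates are reviewed in descending frequency order, and each reviewed candidate costs one unit of labor. The function names (`context_candidates`, `recall_curve`) are hypothetical, and none of this reproduces the paper's interventions.

```python
import re
from collections import Counter


def context_candidates(sentences, trigger):
    """Count words that immediately follow `trigger` (case-insensitive).
    'Immediately follows a trigger word' is an assumed stand-in for a context."""
    counts = Counter()
    for s in sentences:
        tokens = re.findall(r"[a-z]+", s.lower())
        for prev, word in zip(tokens, tokens[1:]):
            if prev == trigger.lower():
                counts[word] += 1
    return counts


def recall_curve(ranked_words, gold):
    """Recall of `gold` after reviewing the first 1, 2, ... ranked candidates.
    Each reviewed candidate is treated as one unit of simulated labor."""
    gold = set(gold)
    if not gold:
        raise ValueError("gold word list must be non-empty")
    seen, found, curve = set(), 0, []
    for w in ranked_words:
        if w in gold and w not in seen:
            found += 1  # count each gold word only once
        seen.add(w)
        curve.append(found / len(gold))
    return curve


if __name__ == "__main__":
    sentences = [
        "The patient was given aspirin daily.",
        "She was given ibuprofen.",
        "He was given aspirin again.",
        "They were given instructions.",
    ]
    counts = context_candidates(sentences, "given")
    ranked = [w for w, _ in counts.most_common()]  # most frequent first
    print(ranked)  # ['aspirin', 'ibuprofen', 'instructions']
    print(recall_curve(ranked, {"aspirin", "ibuprofen"}))  # [0.5, 1.0, 1.0]
```

Comparing the curves produced by different candidate orderings (for example, frequency order versus a naive alphabetical order) is one simple way to compare how quickly each ordering reaches a target recall.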
2016
Feature Derivation for Exploitation of Distant Annotation via Pattern Induction against Dependency Parses
Dayne Freitag, John Niekrasz
Proceedings of the 15th Workshop on Biomedical Natural Language Processing
An Annotated Corpus and Method for Analysis of Ad-Hoc Structures Embedded in Text
Eric Yeh, John Niekrasz, Dayne Freitag, Richard Rohwer
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
We describe a method for identifying and performing functional analysis of structured regions that are embedded in natural language documents, such as tables or key-value lists. Such regions often encode information according to ad hoc schemas and avail themselves of visual cues in place of natural language grammar, presenting problems for standard information extraction algorithms. Unlike previous work in table extraction, which assumes a relatively noiseless two-dimensional layout, our aim is to accommodate a wide variety of naturally occurring structure types. Our approach has three main parts. First, we collect and annotate a diverse sample of “naturally” occurring structures from several sources. Second, we use probabilistic text segmentation techniques, featurized by skip bigrams over spatial and token category cues, to automatically identify contiguous regions of structured text that share a common schema. Finally, we identify the records and fields within each structured region using a combination of distributional similarity and sequence alignment methods, guided by minimal supervision in the form of a single annotated record. We evaluate the last two components individually, and conclude with a discussion of further work.
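Skip bigrams over token-category cues can be sketched as follows. The category inventory (`GAP` for runs of two or more whitespace characters, `KEY` for colon-terminated tokens, `NUM`, `CAP`, `LOW`) and the helper names are illustrative assumptions, not the paper's feature set, and the probabilistic segmentation model that would consume these features is not shown.

```python
import re
from collections import Counter


def token_categories(line):
    """Map a line to coarse category symbols (an illustrative scheme).
    Single spaces are ignored; runs of 2+ whitespace characters become GAP."""
    cats = []
    for tok in re.findall(r"\s{2,}|\S+", line):
        if tok.isspace():
            cats.append("GAP")  # wide horizontal whitespace: a spatial cue
        elif tok.endswith(":"):
            cats.append("KEY")  # e.g. "Name:" in a key-value list
        elif re.fullmatch(r"[$-]?\d[\d.,]*%?", tok):
            cats.append("NUM")
        elif tok[0].isupper():
            cats.append("CAP")
        else:
            cats.append("LOW")
    return cats


def skip_bigrams(symbols, max_skip=2):
    """Count (a, b, gap) features: b follows a with `gap` symbols in between.
    gap=0 is an ordinary bigram."""
    feats = Counter()
    for i, a in enumerate(symbols):
        for gap in range(max_skip + 1):
            j = i + 1 + gap
            if j >= len(symbols):
                break
            feats[(a, symbols[j], gap)] += 1
    return feats


if __name__ == "__main__":
    cats = token_categories("Name:  John Smith")
    print(cats)  # ['KEY', 'GAP', 'CAP', 'CAP']
    print(skip_bigrams(cats, max_skip=1)[("KEY", "CAP", 1)])  # 1
```

Larger gaps let a feature such as (`KEY`, `CAP`, 1) fire across an intervening whitespace run, which is the kind of layout regularity a key-value list exhibits.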
2010
Annotating Participant Reference in English Spoken Conversation
John Niekrasz, Johanna D. Moore
Proceedings of the Fourth Linguistic Annotation Workshop
2009
Participant Subjectivity and Involvement as a Basis for Discourse Segmentation
John Niekrasz, Johanna Moore
Proceedings of the SIGDIAL 2009 Conference
2007
Detecting and Summarizing Action Items in Multi-Party Dialogue
Matthew Purver, John Dowding, John Niekrasz, Patrick Ehlen, Sharareh Noorbaloochi, Stanley Peters
Proceedings of the 8th SIGdial Workshop on Discourse and Dialogue
Resolving “You” in Multi-Party Dialog
Surabhi Gupta, John Niekrasz, Matthew Purver, Dan Jurafsky
Proceedings of the 8th SIGdial Workshop on Discourse and Dialogue
2006
NOMOS: A Semantic Web Software Framework for Annotation of Multimodal Corpora
John Niekrasz, Alexander Gruenstein
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
We present NOMOS, an open-source software framework for annotation, processing, and analysis of multimodal corpora. NOMOS is designed for use by annotators, corpus developers, and corpus consumers, emphasizing configurability for a variety of specific annotation tasks. Its features include synchronized multi-channel audio and video playback, compatibility with several corpora, platform independence, and mixed display of capabilities and a well-defined method for layering datasets. Second, we describe how the system is used. For corpus development and annotation we present a typical use scenario involving the creation of a schema and specialization of the user interface. For processing and analysis we describe the GUI- and Java-based methods available, including a GUI for query construction and execution, and an automatically generated schema-conforming Java API for processing of annotations. Additionally, we present some specific annotation and research tasks for which NOMOS has been specialized and used, including topic segmentation and decision-point annotation of meetings.
Shallow Discourse Structure for Action Item Detection
Matthew Purver, Patrick Ehlen, John Niekrasz
Proceedings of the Analyzing Conversations in Text and Speech
2005
Meeting Structure Annotation: Data and Tools
Alexander Gruenstein, John Niekrasz, Matthew Purver
Proceedings of the 6th SIGdial Workshop on Discourse and Dialogue
2004
Multi-Human Dialogue Understanding for Assisting Artifact-Producing Meetings
John Niekrasz, Alexander Gruenstein, Lawrence Cavedon
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics