PropBank goes Public: Incorporation into Wikidata
Elizabeth Spaulding | Kathryn Conger | Anatole Gershman | Mahir Morshed | Susan Windisch Brown | James Pustejovsky | Rosario Uceda-Sosa | Sijia Ge | Martha Palmer
Proceedings of The 18th Linguistic Annotation Workshop (LAW-XVIII)

This paper presents the first integration of PropBank role information into Wikidata, in order to provide a novel resource for information extraction, one combining Wikidata’s ontological metadata with PropBank’s rich argument structure encoding for event classes. We discuss a technique for PropBank augmentation to existing eventive Wikidata items, as well as identification of gaps in Wikidata’s coverage based on manual examination of over 11,300 PropBank rolesets. We propose five new Wikidata properties to integrate PropBank structure into Wikidata so that the annotated mappings can be added en masse. We then outline the methodology and challenges of this integration, including annotation with the combined resources.


Joint End-to-end Semantic Proto-role Labeling
Elizabeth Spaulding | Gary Kazantsev | Mark Dredze
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Semantic proto-role labeling (SPRL) assigns properties to arguments based on a series of binary labels. While multiple studies have evaluated various approaches to SPRL, it has only been studied in depth as a standalone task using gold predicate/argument pairs. How do SPRL systems perform as part of an information extraction pipeline? We model SPRL jointly with predicate-argument extraction using a deep transformer model. We find that proto-role labeling is surprisingly robust in this setting, with only a small decrease when using predicted arguments. We include a detailed analysis of each component of the joint system, and an error analysis to understand correlations in errors between system stages. Finally, we study the effects of annotation errors on SPRL.

The DARPA Wikidata Overlay: Wikidata as an ontology for natural language processing
Elizabeth Spaulding | Kathryn Conger | Anatole Gershman | Rosario Uceda-Sosa | Susan Windisch Brown | James Pustejovsky | Peter Anick | Martha Palmer
Proceedings of the 19th Joint ACL-ISO Workshop on Interoperable Semantics (ISA-19)

With 102,530,067 items currently in its crowd-sourced knowledge base, Wikidata provides NLP practitioners a unique and powerful resource for inference and reasoning over real-world entities. However, because Wikidata is very entity focused, events and actions are often labeled with eventive nouns (e.g., the process of diagnosing a person’s illness is labeled “diagnosis”), and the typical participants in an event are not described or linked to that event concept (e.g., the medical professional or patient). Motivated by a need for an adaptable, comprehensive, domain-flexible ontology for information extraction, including identifying the roles entities are playing in an event, we present a curated subset of Wikidata in which events have been enriched with PropBank roles. To enable richer narrative understanding between events from Wikidata concepts, we have also provided a comprehensive mapping from temporal Qnodes and Pnodes to the Allen Interval Temporal Logic relations.
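
For readers unfamiliar with the Allen interval relations mentioned above, the following is a minimal sketch of the thirteen relations over numeric intervals. It illustrates only the target side of the mapping; it does not reproduce the paper's Qnode/Pnode mapping, and the function name `allen_relation` and the relation label strings are assumptions made for this example.

```python
from typing import Tuple

# An interval is a (start, end) pair with start < end.
Interval = Tuple[float, float]


def allen_relation(a: Interval, b: Interval) -> str:
    """Return the Allen relation that holds from interval a to interval b.

    Hypothetical helper for illustration; label strings are our own choice.
    """
    a_start, a_end = a
    b_start, b_end = b
    if a_start >= a_end or b_start >= b_end:
        raise ValueError("intervals must satisfy start < end")

    # Disjoint or touching intervals.
    if a_end < b_start:
        return "before"
    if b_end < a_start:
        return "after"
    if a_end == b_start:
        return "meets"
    if b_end == a_start:
        return "met-by"

    # Intervals sharing one or both endpoints.
    if a_start == b_start and a_end == b_end:
        return "equals"
    if a_start == b_start:
        return "starts" if a_end < b_end else "started-by"
    if a_end == b_end:
        return "finishes" if a_start > b_start else "finished-by"

    # Strict containment in either direction.
    if b_start < a_start and a_end < b_end:
        return "during"
    if a_start < b_start and b_end < a_end:
        return "contains"

    # Remaining case: partial overlap with no shared endpoints.
    return "overlaps" if a_start < b_start else "overlapped-by"
```

Exactly one of the thirteen labels holds for any pair of valid intervals, which is what makes the relations usable as a closed target vocabulary for temporal properties.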


Linguist vs. Machine: Rapid Development of Finite-State Morphological Grammars
Sarah Beemer | Zak Boston | April Bukoski | Daniel Chen | Princess Dickens | Andrew Gerlach | Torin Hopkins | Parth Anand Jawale | Chris Koski | Akanksha Malhotra | Piyush Mishra | Saliha Muradoglu | Lan Sang | Tyler Short | Sagarika Shreevastava | Elizabeth Spaulding | Testumichi Umada | Beilei Xiang | Changbing Yang | Mans Hulden
Proceedings of the 17th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology

Sequence-to-sequence models have proven to be highly successful in learning morphological inflection from examples, as the series of SIGMORPHON/CoNLL shared tasks has shown. It is usually assumed, however, that a linguist working with inflectional examples could in principle develop a gold standard-level morphological analyzer and generator that would surpass a trained neural network model in accuracy of predictions, but that it may require significant amounts of human labor. In this paper, we discuss an experiment where a group of people with some linguistic training develop 25+ grammars as part of the shared task and weigh the cost/benefit ratio of developing grammars by hand. We also present tools that can help linguists triage difficult, complex morphophonological phenomena within a language and hypothesize inflectional class membership. We conclude that a significant development effort by trained linguists to analyze and model morphophonological patterns is required in order to surpass the accuracy of neural models.