Craig Harman


2021

pdf bib
LOME: Large Ontology Multilingual Extraction
Patrick Xia | Guanghui Qin | Siddharth Vashishtha | Yunmo Chen | Tongfei Chen | Chandler May | Craig Harman | Kyle Rawlins | Aaron Steven White | Benjamin Van Durme
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations

We present LOME, a system for performing multilingual information extraction. Given a text document as input, our core system identifies spans of textual entity and event mentions with a FrameNet (Baker et al., 1998) parser. It subsequently performs coreference resolution, fine-grained entity typing, and temporal relation prediction between events. By doing so, the system constructs an event and entity focused knowledge graph. We can further apply third-party modules for other types of annotation, like relation extraction. Our (multilingual) first-party modules either outperform or are competitive with the (monolingual) state-of-the-art. We achieve this through the use of multilingual encoders like XLM-R (Conneau et al., 2020) and leveraging multilingual training data. LOME is available as a Docker container on Docker Hub. In addition, a lightweight version of the system is accessible as a web demo.

pdf bib
Everything Is All It Takes: A Multipronged Strategy for Zero-Shot Cross-Lingual Information Extraction
Mahsa Yarmohammadi | Shijie Wu | Marc Marone | Haoran Xu | Seth Ebner | Guanghui Qin | Yunmo Chen | Jialiang Guo | Craig Harman | Kenton Murray | Aaron Steven White | Mark Dredze | Benjamin Van Durme
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Zero-shot cross-lingual information extraction (IE) describes the construction of an IE model for some target language, given existing annotations exclusively in some other language, typically English. While the advance of pretrained multilingual encoders suggests an easy optimism of “train on English, run on any language”, we find through a thorough exploration and extension of techniques that a combination of approaches, both new and old, leads to better performance than any one cross-lingual strategy in particular. We explore techniques including data projection and self-training, and how different pretrained encoders impact them. We use English-to-Arabic IE as our initial example, demonstrating strong performance in this setting for event extraction, named entity recognition, part-of-speech tagging, and dependency parsing. We then apply data projection and self-training to three tasks across eight target languages. Because no single set of techniques performs the best across all tasks, we encourage practitioners to explore various configurations of the techniques described in this work when seeking to improve on zero-shot training.

2017

pdf bib
CADET: Computer Assisted Discovery Extraction and Translation
Benjamin Van Durme | Tom Lippincott | Kevin Duh | Deana Burchfield | Adam Poliak | Cash Costello | Tim Finin | Scott Miller | James Mayfield | Philipp Koehn | Craig Harman | Dawn Lawrie | Chandler May | Max Thomas | Annabelle Carrell | Julianne Chaloux | Tongfei Chen | Alex Comerford | Mark Dredze | Benjamin Glass | Shudong Hao | Patrick Martin | Pushpendre Rastogi | Rashmi Sankepally | Travis Wolfe | Ying-Ying Tran | Ted Zhang
Proceedings of the IJCNLP 2017, System Demonstrations

Computer Assisted Discovery Extraction and Translation (CADET) is a workbench for helping knowledge workers find, label, and translate documents of interest. It combines a multitude of analytics together with a flexible environment for customizing the workflow for different users. This open-source framework allows for easy development of new research prototypes using a micro-service architecture based atop Docker and Apache Thrift.

2015

pdf bib
A Concrete Chinese NLP Pipeline
Nanyun Peng | Francis Ferraro | Mo Yu | Nicholas Andrews | Jay DeYoung | Max Thomas | Matthew R. Gormley | Travis Wolfe | Craig Harman | Benjamin Van Durme | Mark Dredze
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations

pdf bib
Semantic Proto-Roles
Drew Reisinger | Rachel Rudinger | Francis Ferraro | Craig Harman | Kyle Rawlins | Benjamin Van Durme
Transactions of the Association for Computational Linguistics, Volume 3

We present the first large-scale, corpus based verification of Dowty’s seminal theory of proto-roles. Our results demonstrate both the need for and the feasibility of a property-based annotation scheme of semantic relationships, as opposed to the currently dominant notion of categorical roles.

pdf bib
From ADHD to SAD: Analyzing the Language of Mental Health on Twitter through Self-Reported Diagnoses
Glen Coppersmith | Mark Dredze | Craig Harman | Kristy Hollingshead
Proceedings of the 2nd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality

pdf bib
CLPsych 2015 Shared Task: Depression and PTSD on Twitter
Glen Coppersmith | Mark Dredze | Craig Harman | Kristy Hollingshead | Margaret Mitchell
Proceedings of the 2nd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality

2014

pdf bib
I’m a Belieber: Social Roles via Self-identification and Conceptual Attributes
Charley Beller | Rebecca Knowles | Craig Harman | Shane Bergsma | Margaret Mitchell | Benjamin Van Durme
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
Predicting Fine-grained Social Roles with Selectional Preferences
Charley Beller | Craig Harman | Benjamin Van Durme
Proceedings of the ACL 2014 Workshop on Language Technologies and Computational Social Science

pdf bib
Quantifying Mental Health Signals in Twitter
Glen Coppersmith | Mark Dredze | Craig Harman
Proceedings of the Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality