Sam Skjonsberg


2021

pdf bib
PAWLS: PDF Annotation With Labels and Structure
Mark Neumann | Zejiang Shen | Sam Skjonsberg
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations

Adobe’s Portable Document Format (PDF) is a popular way of distributing view-only documents with a rich visual markup. This presents a challenge to NLP practitioners who wish to use the information contained within PDF documents for training models or data analysis, because annotating these documents is difficult. In this paper, we present PDF Annotation with Labels and Structure (PAWLS), a new annotation tool designed specifically for the PDF document format. PAWLS is particularly suited for mixed-mode annotation and scenarios in which annotators require extended context to annotate accurately. PAWLS supports span-based textual annotation, N-ary relations and freeform, non-textual bounding boxes, all of which can be exported in convenient formats for training multi-modal machine learning models. A read-only PAWLS server is available at https://pawls.apps.allenai.org/, and the source code is available at https://github.com/allenai/pawls.

pdf bib
Iconary: A Pictionary-Based Game for Testing Multimodal Communication with Drawings and Text
Christopher Clark | Jordi Salvador | Dustin Schwenk | Derrick Bonafilia | Mark Yatskar | Eric Kolve | Alvaro Herrasti | Jonghyun Choi | Sachin Mehta | Sam Skjonsberg | Carissa Schoenick | Aaron Sarnat | Hannaneh Hajishirzi | Aniruddha Kembhavi | Oren Etzioni | Ali Farhadi
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Communicating with humans is challenging for AIs because it requires a shared understanding of the world, complex semantics (e.g., metaphors or analogies), and at times multi-modal gestures (e.g., pointing with a finger, or an arrow in a diagram). We investigate these challenges in the context of Iconary, a collaborative game of drawing and guessing based on Pictionary, that poses a novel challenge for the research community. In Iconary, a Guesser tries to identify a phrase that a Drawer is drawing by composing icons, and the Drawer iteratively revises the drawing to help the Guesser in response. This back-and-forth often uses canonical scenes, visual metaphor, or icon compositions to express challenging words, making it an ideal test for mixing language and visual/symbolic communication in AI. We propose models to play Iconary and train them on over 55,000 games between human players. Our models are skillful players and are able to employ world knowledge in language models to play with words unseen during training.

2020

pdf bib
SUPP.AI: finding evidence for supplement-drug interactions
Lucy Lu Wang | Oyvind Tafjord | Arman Cohan | Sarthak Jain | Sam Skjonsberg | Carissa Schoenick | Nick Botner | Waleed Ammar
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

Dietary supplements are used by a large portion of the population, but information on their pharmacologic interactions is incomplete. To address this challenge, we present SUPP.AI, an application for browsing evidence of supplement-drug interactions (SDIs) extracted from the biomedical literature. We train a model to automatically extract supplement information and identify such interactions from the scientific literature. To address the lack of labeled data for SDI identification, we use labels of the closely related task of identifying drug-drug interactions (DDIs) for supervision. We fine-tune the contextualized word representations of the RoBERTa language model using labeled DDI data, and apply the fine-tuned model to identify supplement interactions. We extract 195k evidence sentences from 22M articles (P=0.82, R=0.58, F1=0.68) for 60k interactions. We create the SUPP.AI application for users to search evidence sentences extracted by our model. SUPP.AI is an attempt to close the information gap on dietary supplements by making up-to-date evidence on SDIs more discoverable for researchers, clinicians, and consumers. An informational video on how to use SUPP.AI is available at: https://youtu.be/dR0ucKdORwc

2018

pdf bib
Construction of the Literature Graph in Semantic Scholar
Waleed Ammar | Dirk Groeneveld | Chandra Bhagavatula | Iz Beltagy | Miles Crawford | Doug Downey | Jason Dunkelberger | Ahmed Elgohary | Sergey Feldman | Vu Ha | Rodney Kinney | Sebastian Kohlmeier | Kyle Lo | Tyler Murray | Hsu-Han Ooi | Matthew Peters | Joanna Power | Sam Skjonsberg | Lucy Lu Wang | Chris Wilhelm | Zheng Yuan | Madeleine van Zuylen | Oren Etzioni
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 3 (Industry Papers)

We describe a deployed scalable system for organizing published scientific literature into a heterogeneous graph to facilitate algorithmic manipulation and discovery. The resulting literature graph consists of more than 280M nodes, representing papers, authors, entities and various interactions between them (e.g., authorships, citations, entity mentions). We reduce literature graph construction into familiar NLP tasks (e.g., entity extraction and linking), point out research challenges due to differences from standard formulations of these tasks, and report empirical results for each task. The methods described in this paper are used to enable semantic features in www.semanticscholar.org.