Elizabeth Soper


2022

When Polysemy Matters: Modeling Semantic Categorization with Word Embeddings
Elizabeth Soper | Jean-Pierre Koenig
Proceedings of the 11th Joint Conference on Lexical and Computational Semantics

Recent work using word embeddings to model semantic categorization has indicated that static models outperform the more recent contextual class of models (Majewska et al., 2021). In this paper, we consider polysemy as a possible confounding factor, comparing sense-level embeddings with previously studied static embeddings on both coarse- and fine-grained categorization tasks. We find that the effect of polysemy depends on how one defines semantic categorization; while sense-level embeddings dramatically outperform static embeddings in predicting coarse-grained categories derived from a word sorting task, they perform approximately equally in predicting fine-grained categories derived from context-free similarity judgments. Our findings highlight the different processes underlying human behavior on different types of semantic tasks.
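As a minimal illustration of the kind of embedding-based categorization compared in this paper, the sketch below predicts a category for a polysemous word by nearest-centroid cosine similarity, once from a single static vector and once from separate sense vectors. All vectors, words, and category names here are toy assumptions, not the paper's data or models.

```python
# Toy nearest-centroid categorization: static vs. sense-level embeddings.
# Vectors are hand-made stand-ins for real static (e.g., word2vec) or
# sense-level embeddings; only the comparison logic is illustrated.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical category centroids (e.g., mean embedding of seed words).
centroids = {
    "animal":   np.array([0.9, 0.1, 0.0]),
    "artifact": np.array([0.1, 0.9, 0.1]),
}

# A polysemous word: one static vector vs. one vector per sense.
static_bat = np.array([0.5, 0.5, 0.05])           # senses collapsed into one point
sense_bats = {
    "bat (animal)":   np.array([0.85, 0.15, 0.0]),
    "bat (baseball)": np.array([0.15, 0.85, 0.1]),
}

def predict(vec):
    return max(centroids, key=lambda c: cosine(vec, centroids[c]))

print("static:", predict(static_bat))             # a single label for all uses
for sense, vec in sense_bats.items():
    print(sense, "->", predict(vec))              # one label per sense
```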

Let’s Chat: Understanding User Expectations in Socialbot Interactions
Elizabeth Soper | Erin Pacquetet | Sougata Saha | Souvik Das | Rohini Srihari
Proceedings of the Second Workshop on Bridging Human-Computer Interaction and Natural Language Processing

This paper analyzes data from the 2021 Amazon Alexa Prize Socialbot Grand Challenge 4 to better understand how human-computer interaction (HCI) in a socialbot setting differs from conventional human-to-human interaction. We find that because socialbots are a new genre of HCI, we are still negotiating norms to guide interactions in this setting. We present several notable patterns in user behavior toward socialbots, with important implications for future work on the development of conversational agents.

2021

BART for Post-Correction of OCR Newspaper Text
Elizabeth Soper | Stanley Fujimoto | Yen-Yun Yu
Proceedings of the Seventh Workshop on Noisy User-generated Text (W-NUT 2021)

Optical character recognition (OCR) from newspaper page images is susceptible to noise due to degradation of old documents and variation in typesetting. In this report, we present a novel approach to OCR post-correction. We cast error correction as a translation task, and fine-tune BART, a transformer-based sequence-to-sequence language model pretrained to denoise corrupted text. We are the first to use sentence-level transformer models for OCR post-correction, and our best model achieves a 29.4% improvement in character accuracy over the original noisy OCR text. Our results demonstrate the utility of pretrained language models for dealing with noisy text.
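A minimal sketch of the approach described here, casting OCR post-correction as sequence-to-sequence generation with a Hugging Face BART checkpoint. The base model name, generation settings, and example sentence are illustrative assumptions; the actual fine-tuning data and hyperparameters come from the paper, not this sketch.

```python
# Sketch: OCR post-correction as a seq2seq "translation" task with BART.
# "facebook/bart-base" and all hyperparameters below are illustrative
# assumptions, not the paper's exact configuration.
from transformers import BartTokenizer, BartForConditionalGeneration

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

noisy = "Tlie quick brovvn fox jumps ovev the lazy dog."   # simulated OCR noise
clean = "The quick brown fox jumps over the lazy dog."

# One fine-tuning step on a (noisy OCR, ground truth) sentence pair,
# using the standard seq2seq cross-entropy loss.
inputs = tokenizer(noisy, return_tensors="pt", truncation=True)
labels = tokenizer(clean, return_tensors="pt", truncation=True).input_ids
loss = model(**inputs, labels=labels).loss
loss.backward()  # optimizer step omitted for brevity

# Inference: generate a corrected sentence from the noisy OCR input.
model.eval()
output_ids = model.generate(**inputs, num_beams=4, max_length=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```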