Kristen Howell


2023

The economic trade-offs of large language models: A case study
Kristen Howell | Gwen Christian | Pavel Fomitchov | Gitit Kehat | Julianne Marzulla | Leanne Rolston | Jadin Tredup | Ilana Zimmerman | Ethan Selfridge | Joseph Bradley
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track)

Contacting customer service via chat is a common practice. Because employing customer service agents is expensive, many companies are turning to NLP models that assist human agents by auto-generating responses that can be used directly or with modifications. With their ability to handle large context windows, Large Language Models (LLMs) are a natural fit for this use case. However, their efficacy must be balanced against the cost of training and serving them. This paper assesses the practical cost and impact of LLMs for the enterprise as a function of the usefulness of the responses that they generate. We present a cost framework for evaluating an NLP model’s utility for this use case and apply it to a single brand as a case study in the context of an existing agent assistance product. We compare three strategies for specializing an LLM (prompt engineering, fine-tuning, and knowledge distillation) using feedback from the brand’s customer service agents. We find that the usability of a model’s responses can make up for a large difference in inference cost for our case study brand, and we extrapolate our findings to the broader enterprise space.
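
As a rough illustration of the framework’s central arithmetic, the sketch below compares two hypothetical models on cost per usable response. The function name and all figures are invented for demonstration and do not come from the paper.

    # Illustrative break-even arithmetic: a pricier model can still be cheaper
    # per *usable* response if agents accept its suggestions more often.
    # All figures below are invented for demonstration.

    def cost_per_usable_response(cost_per_inference: float, usable_rate: float) -> float:
        """Effective cost of obtaining one agent-usable response."""
        return cost_per_inference / usable_rate

    # Hypothetical models: a large prompted LLM vs. a small distilled model.
    large_llm = cost_per_usable_response(cost_per_inference=0.010, usable_rate=0.85)
    distilled = cost_per_usable_response(cost_per_inference=0.001, usable_rate=0.08)

    print(f"large LLM: ${large_llm:.4f} per usable response")  # ~$0.0118
    print(f"distilled: ${distilled:.4f} per usable response")  # ~$0.0125
    # A tenfold gap in raw inference cost is offset here by the usability gap.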

2022

Behind the Mask: Demographic bias in name detection for PII masking
Courtney Mansfield | Amandalynne Paullada | Kristen Howell
Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion

Many datasets contain personally identifiable information, or PII, which poses privacy risks to individuals. PII masking is commonly used to redact personal information such as names, addresses, and phone numbers from text data. Most modern PII masking pipelines involve machine learning algorithms. However, these systems may vary in performance, such that individuals from particular demographic groups bear a higher risk of having their personal information exposed. In this paper, we evaluate the performance of three off-the-shelf PII masking systems on name detection and redaction. We generate data using names and templates from the customer service domain. We find that an open-source RoBERTa-based system shows fewer disparities than the commercial models we test. However, all systems demonstrate significant differences in error rate based on demographics. In particular, the highest error rates occurred for names associated with Black and Asian/Pacific Islander individuals.
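
A minimal sketch of the evaluation recipe described above, assuming a placeholder detect_names callable standing in for any off-the-shelf masking system; the templates and grouping logic are simplified illustrations, not the paper’s code.

    from collections import defaultdict

    # Customer-service-style templates filled with names drawn from
    # demographic-labelled name lists (simplified placeholders).
    TEMPLATES = [
        "Hi, my name is {name} and I need help with my order.",
        "Please ship the replacement to {name}.",
    ]

    def miss_rates(names_by_group, detect_names):
        """Share of filled templates in which a name escapes redaction,
        computed per demographic group. `detect_names(text)` should return
        the set of name strings the masking system found."""
        misses, totals = defaultdict(int), defaultdict(int)
        for group, names in names_by_group.items():
            for name in names:
                for template in TEMPLATES:
                    totals[group] += 1
                    if name not in detect_names(template.format(name=name)):
                        misses[group] += 1  # this name would be exposed
        return {g: misses[g] / totals[g] for g in totals}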

Building Analyses from Syntactic Inference in Local Languages: An HPSG Grammar Inference System
Kristen Howell | Emily M. Bender
Northern European Journal of Language Technology, Volume 8

We present a grammar inference system that leverages linguistic knowledge recorded in the form of annotations in interlinear glossed text (IGT) and in a meta-grammar engineering system (the LinGO Grammar Matrix customization system) to automatically produce machine-readable HPSG grammars. Building on prior work to handle the inference of lexical classes, stems, affixes and position classes, and preliminary work on inferring case systems and word order, we introduce an integrated grammar inference system that covers a wide range of fundamental linguistic phenomena. System development was guided by 27 genealogically and geographically diverse languages, and we test the system’s cross-linguistic generalizability on an additional 5 held-out languages, using datasets provided by field linguists. Our system outperforms three baseline systems, increasing coverage while limiting ambiguity, and produces richer semantic representations than previous work in grammar inference.
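
To make the kind of evidence such a system consumes concrete, here is a toy sketch of harvesting stems and affixes from segmented, glossed IGT. The three-word corpus and gloss conventions are invented simplifications, not the system’s actual input format.

    from collections import defaultdict

    # Toy IGT: (morpheme segmentation, glosses, part of speech) per word.
    igt = [
        (["perro", "s"], ["dog", "PL"], "N"),
        (["gato", "s"], ["cat", "PL"], "N"),
        (["corr", "e"], ["run", "3SG"], "V"),
    ]

    stems = defaultdict(set)    # POS -> observed stems
    affixes = defaultdict(set)  # POS -> observed (affix, gloss) pairs

    for morphemes, glosses, pos in igt:
        stems[pos].add(morphemes[0])
        affixes[pos].update(zip(morphemes[1:], glosses[1:]))

    # A real pipeline would turn such sets into lexical classes and position
    # classes for the Grammar Matrix customization system.
    print(dict(stems))    # {'N': {'perro', 'gato'}, 'V': {'corr'}}
    print(dict(affixes))  # {'N': {('s', 'PL')}, 'V': {('e', '3SG')}}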

Domain-specific knowledge distillation yields smaller and better models for conversational commerce
Kristen Howell | Jian Wang | Akshay Hazare | Joseph Bradley | Chris Brew | Xi Chen | Matthew Dunn | Beth Hockey | Andrew Maurer | Dominic Widdows
Proceedings of the Fifth Workshop on e-Commerce and NLP (ECNLP 5)

We demonstrate that knowledge distillation can be used not only to reduce model size, but to simultaneously adapt a contextual language model to a specific domain. We use Multilingual BERT (mBERT; Devlin et al., 2019) as a starting point and follow the knowledge distillation approach of Sanh et al. (2019) to train a smaller multilingual BERT model that is adapted to the domain at hand. We show that for in-domain tasks, the domain-specific model shows on average a 2.3% improvement in F1 score, relative to a model distilled on domain-general data. Whereas much previous work with BERT has fine-tuned the encoder weights during task training, we show that the model improvements from distillation on in-domain data persist even when the encoder weights are frozen during task training, allowing a single encoder to support classifiers for multiple tasks and languages.
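
A minimal sketch of a Sanh et al. (2019)-style objective as it might look on in-domain text: the student matches temperature-softened teacher predictions while retaining the masked-LM loss. The tensor shapes, the alpha weighting, and the omission of the cosine embedding term are simplifying assumptions, not the paper’s training code.

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels,
                          temperature=2.0, alpha=0.5):
        """Logits: (batch, seq, vocab); labels use -100 for unmasked tokens."""
        # KL divergence between temperature-softened distributions,
        # rescaled by T^2 to keep gradient magnitudes comparable.
        soft = F.kl_div(
            F.log_softmax(student_logits / temperature, dim=-1),
            F.softmax(teacher_logits / temperature, dim=-1),
            reduction="batchmean",
        ) * temperature ** 2
        # Ordinary masked-LM cross-entropy on the hard labels.
        hard = F.cross_entropy(
            student_logits.view(-1, student_logits.size(-1)),
            labels.view(-1),
            ignore_index=-100,
        )
        return alpha * soft + (1 - alpha) * hard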

2021

Should Semantic Vector Composition be Explicit? Can it be Linear?
Dominic Widdows | Kristen Howell | Trevor Cohen
Proceedings of the 2021 Workshop on Semantic Spaces at the Intersection of NLP, Physics, and Cognitive Science (SemSpace)

Vector representations have become a central element in semantic language modelling, leading to mathematical overlaps with many fields including quantum theory. Compositionality is a core goal for such representations: given representations for ‘wet’ and ‘fish’, how should the concept ‘wet fish’ be represented? This position paper surveys this question from two points of view. The first considers the question of whether an explicit mathematical representation can be successful using only tools from within linear algebra, or whether other mathematical tools are needed. The second considers whether semantic vector composition should be explicitly described mathematically, or whether it can be a model-internal side-effect of training a neural network. A third and newer question is whether a compositional model can be implemented on a quantum computer. Given the fundamentally linear nature of quantum mechanics, we propose that these questions are related, and that this survey may help to highlight candidate operations for future quantum implementation.
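
The question in the title can be made concrete with a toy example: three candidate composition operations for ‘wet’ and ‘fish’, using random vectors as stand-ins for trained embeddings.

    import numpy as np

    rng = np.random.default_rng(0)
    wet, fish = rng.normal(size=8), rng.normal(size=8)

    added = wet + fish            # stays in the original 8-dim space
    hadamard = wet * fish         # elementwise product, same dimensionality
    tensor = np.outer(wet, fish)  # tensor product: 8 dims -> 8 x 8

    print(added.shape, hadamard.shape, tensor.shape)  # (8,) (8,) (8, 8)
    # The tensor product's rapid growth under repeated composition is one
    # reason quantum hardware, where tensor products are native, is a
    # candidate for implementing such models.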

2019

Modeling Clausal Complementation for a Grammar Engineering Resource
Olga Zamaraeva | Kristen Howell | Emily M. Bender
Proceedings of the Society for Computation in Linguistics (SCiL) 2019

Handling cross-cutting properties in automatic inference of lexical classes: A case study of Chintang
Olga Zamaraeva | Kristen Howell | Emily M. Bender
Proceedings of the 3rd Workshop on the Use of Computational Methods in the Study of Endangered Languages Volume 1 (Papers)

2018

Clausal Modifiers in the Grammar Matrix
Kristen Howell | Olga Zamaraeva
Proceedings of the 27th International Conference on Computational Linguistics

We extend the coverage of an existing grammar customization system to clausal modifiers, also referred to as adverbial clauses. We present an analysis that takes a typologically driven approach to account for this phenomenon across the world’s languages, and we implement it in the Grammar Matrix customization system (Bender et al., 2002, 2010). Testing our analysis on testsuites from five genetically and geographically diverse languages that were not considered in development, we achieve 88.4% coverage and 1.5% overgeneration.
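
For readers unfamiliar with the two figures reported above, this sketch shows how coverage and overgeneration are conventionally computed over a testsuite of grammaticality-labelled items; parses stands in for running the customized grammar’s parser and is not the paper’s code.

    def coverage_and_overgeneration(items, parses):
        """items: (sentence, is_grammatical) pairs; parses(sentence) -> bool.
        Coverage: share of grammatical items the grammar parses.
        Overgeneration: share of ungrammatical items it (wrongly) parses."""
        grammatical = [s for s, ok in items if ok]
        ungrammatical = [s for s, ok in items if not ok]
        coverage = sum(map(parses, grammatical)) / len(grammatical)
        overgeneration = sum(map(parses, ungrammatical)) / len(ungrammatical)
        return coverage, overgeneration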

Improving Feature Extraction for Pathology Reports with Precise Negation Scope Detection
Olga Zamaraeva | Kristen Howell | Adam Rhine
Proceedings of the 27th International Conference on Computational Linguistics

We use the broad-coverage, linguistically precise English Resource Grammar (ERG) to detect negation scope in sentences taken from pathology reports. We show that incorporating this information in feature extraction has a positive effect on classification of the reports with respect to cancer laterality, compared with NegEx, a commonly used tool for negation detection. We analyze the differences between the NegEx and ERG results on our dataset and show how these differences point to directions for future work.
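
As an illustration of why scope matters for feature extraction, the sketch below marks tokens inside detected negation scopes before classification; negation_scopes is a placeholder for scope detection with the ERG (or NegEx as a baseline), and the example sentence is invented.

    def mark_negated(tokens, negation_scopes):
        """negation_scopes: (start, end) token spans under negation."""
        marked = list(tokens)
        for start, end in negation_scopes:
            for i in range(start, end):
                marked[i] = "NEG_" + marked[i]  # classifier sees negated copies
        return marked

    tokens = "no evidence of malignancy in the right lobe".split()
    print(mark_negated(tokens, [(1, 4)]))
    # ['no', 'NEG_evidence', 'NEG_of', 'NEG_malignancy', 'in', 'the', 'right', 'lobe']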

2017

STREAMLInED Challenges: Aligning Research Interests with Shared Tasks
Gina-Anne Levow | Emily M. Bender | Patrick Littell | Kristen Howell | Shobhana Chelliah | Joshua Crowgey | Dan Garrette | Jeff Good | Sharon Hargus | David Inman | Michael Maxwell | Michael Tjalve | Fei Xia
Proceedings of the 2nd Workshop on the Use of Computational Methods in the Study of Endangered Languages

Inferring Case Systems from IGT: Enriching the Enrichment
Kristen Howell | Emily M. Bender | Michel Lockwood | Fei Xia | Olga Zamaraeva
Proceedings of the 2nd Workshop on the Use of Computational Methods in the Study of Endangered Languages

Computational Support for Finding Word Classes: A Case Study of Abui
Olga Zamaraeva | František Kratochvíl | Emily M. Bender | Fei Xia | Kristen Howell
Proceedings of the 2nd Workshop on the Use of Computational Methods in the Study of Endangered Languages