Gene Kim


2024

pdf bib
Investigating Subtler Biases in LLMs: Ageism, Beauty, Institutional, and Nationality Bias in Generative Models
Mahammed Kamruzzaman | Md. Shovon | Gene Kim
Findings of the Association for Computational Linguistics: ACL 2024

LLMs are increasingly powerful and widely used to assist users in a variety of tasks. This use risks introducing LLM biases into consequential decisions such as job hiring, human performance evaluation, and criminal sentencing. Bias in NLP systems along the lines of gender and ethnicity has been widely studied, especially for specific stereotypes (e.g., Asians are good at math). In this paper, we investigate bias along less-studied but still consequential, dimensions, such as age and beauty, measuring subtler correlated decisions that LLMs make between social groups and unrelated positive and negative attributes. Although these subtler biases are understudied they follow people as much as gender and ethnicity do. So, we want to see whether they also follow one with LLMs.We introduce a template-generated dataset of sentence completion tasks that asks the model to select the most appropriate attribute to complete an evaluative statement about a person described as a member of a specific social group. We also reverse the completion task to select the social group based on an attribute. We report the correlations that we find for 4 cutting-edge LLMs. This dataset can be used as a benchmark to evaluate progress in more generalized biases and the templating technique can be used to expand the benchmark with minimal additional human annotation.

2023

pdf bib
Efficient Sentiment Analysis: A Resource-Aware Evaluation of Feature Extraction Techniques, Ensembling, and Deep Learning Models
Mahammed Kamruzzaman | Gene Kim
Proceedings of the 11th International Workshop on Natural Language Processing for Social Media

pdf bib
BanMANI: A Dataset to Identify Manipulated Social Media News in Bangla
Mahammed Kamruzzaman | Md. Minul Islam Shovon | Gene Kim
Proceedings of the Workshop on Computational Terminology in NLP and Translation Studies (ConTeNTS) Incorporating the 16th Workshop on Building and Using Comparable Corpora (BUCC)

Initial work has been done to address fake news detection and misrepresentation of news in the Bengali language. However, no work in Bengali yet addresses the identification of specific claims in social media news that falsely manipulate a related news article. At this point, this problem has been tackled in English and a few other languages, but not in the Bengali language. In this paper, we curate a dataset of social media content labeled with information manipulation relative to reference articles, called BanMANI. The dataset collection method we describe works around the limitations of the available NLP tools in Bangla. We expect these techniques will carry over to building similar datasets in other low-resource languages. BanMANI forms the basis both for evaluating the capabilities of existing NLP systems and for training or fine-tuning new models specifically on this task. In our analysis, we find that this task challenges current LLMs both under zero-shot and fine-tuned set- things

pdf bib
Semantically Informed Data Augmentation for Unscoped Episodic Logical Forms
Mandar Juvekar | Gene Kim | Lenhart Schubert
Proceedings of the 15th International Conference on Computational Semantics

Unscoped Logical Form (ULF) of Episodic Logic is a meaning representation format that captures the overall semantic type structure of natural language while leaving certain finer details, such as word sense and quantifier scope, underspecified for ease of parsing and annotation. While a learned parser exists to convert English to ULF, its performance is severely limited by the lack of a large dataset to train the system. We present a ULF dataset augmentation method that samples type-coherent ULF expressions using the ULF semantic type system and filters out samples corresponding to implausible English sentences using a pretrained language model. Our data augmentation method is configurable with parameters that trade off between plausibility of samples with sample novelty and augmentation size. We find that the best configuration of this augmentation method substantially improves parser performance beyond using the existing unaugmented dataset.

2021

pdf bib
Monotonic Inference for Underspecified Episodic Logic
Gene Kim | Mandar Juvekar | Lenhart Schubert
Proceedings of the 1st and 2nd Workshops on Natural Logic Meets Machine Learning (NALOMA)

We present a method of making natural logic inferences from Unscoped Logical Form of Episodic Logic. We establish a correspondence between inference rules of scope resolved Episodic Logic and the natural logic treatment by Sánchez Valencia (1991a), and hence demonstrate the ability to handle foundational natural logic inferences from prior literature as well as more general nested monotonicity inferences.

pdf bib
A (Mostly) Symbolic System for Monotonic Inference with Unscoped Episodic Logical Forms
Gene Kim | Mandar Juvekar | Junis Ekmekciu | Viet Duong | Lenhart Schubert
Proceedings of the 1st and 2nd Workshops on Natural Logic Meets Machine Learning (NALOMA)

We implement the formalization of natural logic-like monotonic inference using Unscoped Episodic Logical Forms (ULFs) by Kim et al. (2020). We demonstrate this system’s capacity to handle a variety of challenging semantic phenomena using the FraCaS dataset (Cooper et al., 1996). These results give empirical evidence for prior claims that ULF is an appropriate representation to mediate natural logic-like inferences.

pdf bib
A Transition-based Parser for Unscoped Episodic Logical Forms
Gene Kim | Viet Duong | Xin Lu | Lenhart Schubert
Proceedings of the 14th International Conference on Computational Semantics (IWCS)

“Episodic Logic: Unscoped Logical Form” (EL-ULF) is a semantic representation capturing predicate-argument structure as well as more challenging aspects of language within the Episodic Logic formalism. We present the first learned approach for parsing sentences into ULFs, using a growing set of annotated examples. The results provide a strong baseline for future improvement. Our method learns a sequence-to-sequence model for predicting the transition action sequence within a modified cache transition system. We evaluate the efficacy of type grammar-based constraints, a word-to-symbol lexicon, and transition system state features in this task. Our system is available at https://github.com/genelkim/ulf-transition-parser. We also present the first official annotated ULF dataset at https://www.cs.rochester.edu/u/gkim21/ulf/resources/.

2019

pdf bib
Generating Discourse Inferences from Unscoped Episodic Logical Formulas
Gene Kim | Benjamin Kane | Viet Duong | Muskaan Mendiratta | Graeme McGuire | Sophie Sackstein | Georgiy Platonov | Lenhart Schubert
Proceedings of the First International Workshop on Designing Meaning Representations

Unscoped episodic logical form (ULF) is a semantic representation capturing the predicate-argument structure of English within the episodic logic formalism in relation to the syntactic structure, while leaving scope, word sense, and anaphora unresolved. We describe how ULF can be used to generate natural language inferences that are grounded in the semantic and syntactic structure through a small set of rules defined over interpretable predicates and transformations on ULFs. The semantic restrictions placed by ULF semantic types enables us to ensure that the inferred structures are semantically coherent while the nearness to syntax enables accurate mapping to English. We demonstrate these inferences on four classes of conversationally-oriented inferences in a mixed genre dataset with 68.5% precision from human judgments.

2017

pdf bib
Intension, Attitude, and Tense Annotation in a High-Fidelity Semantic Representation
Gene Kim | Lenhart Schubert
Proceedings of the Workshop Computational Semantics Beyond Events and Roles

This paper describes current efforts in developing an annotation schema and guidelines for sentences in Episodic Logic (EL). We focus on important distinctions for representing modality, attitudes, and tense and present an annotation schema that makes these distinctions. EL has proved competitive with other logical formulations in speed and inference-enablement, while expressing a wider array of natural language phenomena including intensional modification of predicates and sentences, propositional attitudes, and tense and aspect.

2016

pdf bib
High-Fidelity Lexical Axiom Construction from Verb Glosses
Gene Kim | Lenhart Schubert
Proceedings of the Fifth Joint Conference on Lexical and Computational Semantics