2025
Using Text-Based Causal Inference to Disentangle Factors Influencing Online Review Ratings
Linsen Li | Aron Culotta | Nicholas Mattei
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Online reviews provide valuable insights into the perceived quality of facets of a product or service. While aspect-based sentiment analysis has focused on extracting these facets from reviews, there is less work on understanding the impact of each aspect on overall perception. This is particularly challenging given correlations among aspects, which make it difficult to isolate the effects of each. This paper introduces a methodology based on recent advances in text-based causal analysis, specifically CausalBERT, to disentangle the effect of each factor on overall review ratings. We enhance CausalBERT with three key improvements: temperature scaling for better-calibrated treatment assignment estimates; hyperparameter optimization to reduce confound overadjustment; and interpretability methods to characterize discovered confounds. In this work, we treat the textual mentions in reviews as proxies for real-world attributes. We validate our approach on real and semi-synthetic data from over 600K reviews of U.S. K-12 schools. We find that the proposed enhancements result in more reliable estimates, and that perception of school administration and performance on benchmarks are significant drivers of overall school ratings.
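To illustrate the temperature-scaling enhancement mentioned in the abstract, here is a minimal sketch (not the authors' code) of calibrating a treatment-assignment classifier's probabilities by fitting a single temperature on held-out data; the function names and the two-class logit shape are assumptions for illustration.

```python
# Hypothetical sketch: temperature scaling for a treatment-assignment
# (propensity) classifier, e.g. the treatment head of a CausalBERT-style model.
import torch
import torch.nn as nn

def fit_temperature(logits, labels, max_iter=100):
    """Find a temperature T > 0 minimizing held-out NLL.

    logits: (N, 2) raw scores from the treatment classifier
    labels: (N,) observed treatment assignments in {0, 1}
    """
    log_t = torch.zeros(1, requires_grad=True)  # optimize log(T) so T stays positive
    optimizer = torch.optim.LBFGS([log_t], lr=0.1, max_iter=max_iter)
    nll = nn.CrossEntropyLoss()

    def closure():
        optimizer.zero_grad()
        loss = nll(logits / log_t.exp(), labels)
        loss.backward()
        return loss

    optimizer.step(closure)
    return log_t.exp().item()

def calibrated_propensity(logits, temperature):
    """Calibrated P(treatment = 1 | text), used downstream in effect estimation."""
    return torch.softmax(logits / temperature, dim=-1)[:, 1]
```

Fitting only a single scalar on held-out data leaves the classifier's accuracy unchanged while making its probabilities better calibrated, which is why it is a common choice for this kind of adjustment.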
2018
An Interface for Annotating Science Questions
Michael Boratko | Harshit Padigela | Divyendra Mikkilineni | Pritish Yuvraj | Rajarshi Das | Andrew McCallum | Maria Chang | Achille Fokoue | Pavan Kapanipathi | Nicholas Mattei | Ryan Musa | Kartik Talamadupula | Michael Witbrock
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
Recent work introduces the AI2 Reasoning Challenge (ARC) and the associated ARC dataset that partitions open domain, complex science questions into an Easy Set and a Challenge Set. That work includes an analysis of 100 questions with respect to the types of knowledge and reasoning required to answer them. However, it does not include clear definitions of these types, nor does it offer information about the quality of the labels or the annotation process used. In this paper, we introduce a novel interface for human annotation of science question-answer pairs with their respective knowledge and reasoning types, so that the classification of new questions can be improved. We build on the classification schema proposed by prior work on the ARC dataset, and evaluate the effectiveness of our interface with a preliminary study involving 10 participants.
A Systematic Classification of Knowledge, Reasoning, and Context within the ARC Dataset
Michael Boratko | Harshit Padigela | Divyendra Mikkilineni | Pritish Yuvraj | Rajarshi Das | Andrew McCallum | Maria Chang | Achille Fokoue-Nkoutche | Pavan Kapanipathi | Nicholas Mattei | Ryan Musa | Kartik Talamadupula | Michael Witbrock
Proceedings of the Workshop on Machine Reading for Question Answering
The recent work of Clark et al. (2018) introduces the AI2 Reasoning Challenge (ARC) and the associated ARC dataset that partitions open domain, complex science questions into easy and challenge sets. That paper includes an analysis of 100 questions with respect to the types of knowledge and reasoning required to answer them; however, it does not include clear definitions of these types, nor does it offer information about the quality of the labels. We propose a comprehensive set of definitions of knowledge and reasoning types necessary for answering the questions in the ARC dataset. Using ten annotators and a sophisticated annotation interface, we analyze the distribution of labels across the challenge set and statistics related to them. Additionally, we demonstrate that although naive information retrieval methods return sentences that are irrelevant to answering the query, sufficient supporting text is often present in the ARC corpus. Evaluating with human-selected relevant sentences improves the performance of a neural machine comprehension model by 42 points.
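As a point of reference for the "naive information retrieval" baseline the abstract contrasts with human-selected supporting sentences, here is a hypothetical sketch (not from the paper) of a simple TF-IDF retriever over a sentence-split corpus; the function name and query construction are assumptions for illustration.

```python
# Hypothetical sketch: naive TF-IDF retrieval of supporting sentences for a
# science question and one of its answer options.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def retrieve_support(question, answer_option, corpus_sentences, k=5):
    """Return the top-k corpus sentences most similar to question + answer option."""
    query = f"{question} {answer_option}"
    vectorizer = TfidfVectorizer(stop_words="english")
    sentence_vecs = vectorizer.fit_transform(corpus_sentences)
    query_vec = vectorizer.transform([query])
    scores = cosine_similarity(query_vec, sentence_vecs).ravel()
    top = scores.argsort()[::-1][:k]
    return [(corpus_sentences[i], float(scores[i])) for i in top]
```

A retriever of roughly this form matches query terms rather than reasoning about relevance, which is consistent with the abstract's observation that such methods often surface sentences irrelevant to answering the question even when supporting text exists in the corpus.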