2018
Tackling the Story Ending Biases in The Story Cloze Test
Rishi Sharma | James Allen | Omid Bakhshandeh | Nasrin Mostafazadeh
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
The Story Cloze Test (SCT) is a recent framework for evaluating story comprehension and script learning. There have been a variety of models tackling the SCT so far. Although the original goal behind the SCT was to require systems to perform deep language understanding and commonsense reasoning for successful narrative understanding, some recent models could perform significantly better than the initial baselines by leveraging human-authorship biases discovered in the SCT dataset. In order to shed some light on this issue, we have performed various data analyses and examined a variety of top-performing models presented for this task. Given the statistics we have aggregated, we have designed a new crowdsourcing scheme that creates a new SCT dataset, which overcomes some of the biases. We benchmark a few models on the new dataset and show that the top-performing model on the original SCT dataset fails to keep up its performance. Our findings further signify the importance of benchmarking NLP systems on various evolving test sets.
2017
Apples to Apples: Learning Semantics of Common Entities Through a Novel Comprehension Task
Omid Bakhshandeh | James Allen
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Understanding common entities and their attributes is a primary requirement for any system that comprehends natural language. In order to enable learning about common entities, we introduce a novel machine comprehension task, GuessTwo: given a short paragraph comparing different aspects of two real-world semantically-similar entities, a system should guess what those entities are. Accomplishing this task requires deep language understanding which enables inference, connecting each comparison paragraph to different levels of knowledge about world entities and their attributes. So far we have crowdsourced a dataset of more than 14K comparison paragraphs comparing entities from a variety of categories such as fruits and animals. We have designed two schemes for evaluation: open-ended and binary-choice prediction. For benchmarking further progress in the task, we have collected a set of paragraphs as the test set, on which humans can accomplish the task with an accuracy of 94.2% on open-ended prediction. We have implemented various models for tackling the task, ranging from semantic-driven to neural models. The semantic-driven approach outperforms the neural models; however, the results indicate that the task is very challenging across the models.
2016
Towards Broad-coverage Meaning Representation: The Case of Comparison Structures
Omid Bakhshandeh | James Allen
Proceedings of the Workshop on Uphill Battles in Language Processing: Scaling Early Achievements to Robust Methods
Learning to Jointly Predict Ellipsis and Comparison Structures
Omid Bakhshandeh | Alexis Cornelia Wellwood | James Allen
Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning
2015
Semantic Framework for Comparison Structures in Natural Language
Omid Bakhshandeh | James Allen
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing
From Adjective Glosses to Attribute Concepts: Learning Different Aspects That an Adjective Can Describe
Omid Bakhshandeh | James Allen
Proceedings of the 11th International Conference on Computational Semantics