Tania Bedrax-Weiss


2021

pdf bib
Towards Question-Answering as an Automatic Metric for Evaluating the Content Quality of a Summary
Daniel Deutsch | Tania Bedrax-Weiss | Dan Roth
Transactions of the Association for Computational Linguistics, Volume 9

Abstract A desirable property of a reference-based evaluation metric that measures the content quality of a summary is that it should estimate how much information that summary has in common with a reference. Traditional text overlap based metrics such as ROUGE fail to achieve this because they are limited to matching tokens, either lexically or via embeddings. In this work, we propose a metric to evaluate the content quality of a summary using question-answering (QA). QA-based methods directly measure a summary’s information overlap with a reference, making them fundamentally different than text overlap metrics. We demonstrate the experimental benefits of QA-based metrics through an analysis of our proposed metric, QAEval. QAEval outperforms current state-of-the-art metrics on most evaluations using benchmark datasets, while being competitive on others due to limitations of state-of-the-art models. Through a careful analysis of each component of QAEval, we identify its performance bottlenecks and estimate that its potential upper-bound performance surpasses all other automatic metrics, approaching that of the gold-standard Pyramid Method.1

2019

pdf bib
PullNet: Open Domain Question Answering with Iterative Retrieval on Knowledge Bases and Text
Haitian Sun | Tania Bedrax-Weiss | William Cohen
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

We consider open-domain question answering (QA) where answers are drawn from either a corpus, a knowledge base (KB), or a combination of both of these. We focus on a setting in which a corpus is supplemented with a large but incomplete KB, and on questions that require non-trivial (e.g., “multi-hop”) reasoning. We describe PullNet, an integrated framework for (1) learning what to retrieve and (2) reasoning with this heterogeneous information to find the best answer. PullNet uses an iterative process to construct a question-specific subgraph that contains information relevant to the question. In each iteration, a graph convolutional network (graph CNN) is used to identify subgraph nodes that should be expanded using retrieval (or “pull”) operations on the corpus and/or KB. After the subgraph is complete, another graph CNN is used to extract the answer from the subgraph. This retrieve-and-reason process allows us to answer multi-hop questions using large KBs and corpora. PullNet is weakly supervised, requiring question-answer pairs but not gold inference paths. Experimentally PullNet improves over the prior state-of-the art, and in the setting where a corpus is used with incomplete KB these improvements are often dramatic. PullNet is also often superior to prior systems in a KB-only setting or a text-only setting.

pdf bib
Proceedings of the First Workshop on NLP for Conversational AI
Yun-Nung Chen | Tania Bedrax-Weiss | Dilek Hakkani-Tur | Anuj Kumar | Mike Lewis | Thang-Minh Luong | Pei-Hao Su | Tsung-Hsien Wen
Proceedings of the First Workshop on NLP for Conversational AI

pdf bib
How Large Are Lions? Inducing Distributions over Quantitative Attributes
Yanai Elazar | Abhijit Mahabal | Deepak Ramachandran | Tania Bedrax-Weiss | Dan Roth
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Most current NLP systems have little knowledge about quantitative attributes of objects and events. We propose an unsupervised method for collecting quantitative information from large amounts of web data, and use it to create a new, very large resource consisting of distributions over physical quantities associated with objects, adjectives, and verbs which we call Distributions over Quantitative (DoQ). This contrasts with recent work in this area which has focused on making only relative comparisons such as “Is a lion bigger than a wolf?”. Our evaluation shows that DoQ compares favorably with state of the art results on existing datasets for relative comparisons of nouns and adjectives, and on a new dataset we introduce.

2018

pdf bib
Points, Paths, and Playscapes: Large-scale Spatial Language Understanding Tasks Set in the Real World
Jason Baldridge | Tania Bedrax-Weiss | Daphne Luong | Srini Narayanan | Bo Pang | Fernando Pereira | Radu Soricut | Michael Tseng | Yuan Zhang
Proceedings of the First International Workshop on Spatial Language Understanding

Spatial language understanding is important for practical applications and as a building block for better abstract language understanding. Much progress has been made through work on understanding spatial relations and values in images and texts as well as on giving and following navigation instructions in restricted domains. We argue that the next big advances in spatial language understanding can be best supported by creating large-scale datasets that focus on points and paths based in the real world, and then extending these to create online, persistent playscapes that mix human and bot players, where the bot players must learn, evolve, and survive according to their depth of understanding of scenes, navigation, and interactions.