Kumar Dubey


2023

Unsupervised Opinion Summarization Using Approximate Geodesics
Somnath Basu Roy Chowdhury | Nicholas Monath | Kumar Dubey | Amr Ahmed | Snigdha Chaturvedi
Findings of the Association for Computational Linguistics: EMNLP 2023

Opinion summarization is the task of creating summaries capturing popular opinions from user reviews. In this paper, we introduce Geodesic Summarizer (GeoSumm), a novel system to perform unsupervised extractive opinion summarization. GeoSumm consists of an encoder-decoder based representation learning model that generates topical representations of texts. These representations capture the underlying semantics of the text as a distribution over learnable latent units. GeoSumm generates these topical representations by performing dictionary learning over pre-trained text representations at multiple layers of the decoder. We then use these topical representations to quantify the importance of review sentences using a novel approximate geodesic distance-based scoring mechanism. We use the importance scores to identify popular opinions in order to compose general and aspect-specific summaries. Our proposed model, GeoSumm, achieves strong performance on three opinion summarization datasets. We perform additional experiments to analyze the functioning of our model and showcase the generalization ability of GeoSumm across different domains.

Unsupervised Opinion Summarization Using Approximate Geodesics
Somnath Basu Roy Chowdhury | Nicholas Monath | Kumar Dubey | Amr Ahmed | Snigdha Chaturvedi
Proceedings of the 4th New Frontiers in Summarization Workshop

Opinion summarization is the task of creating summaries capturing popular opinions from user reviews. In this paper, we introduce Geodesic Summarizer (GeoSumm), a novel system to perform unsupervised extractive opinion summarization. GeoSumm consists of an encoder-decoder based representation learning model that generates topical representations of texts. These representations capture the underlying semantics of the text as a distribution over learnable latent units. GeoSumm generates these topical representations by performing dictionary learning over pre-trained text representations at multiple layers of the decoder. We then use these topical representations to quantify the importance of review sentences using a novel approximate geodesic distance-based scoring mechanism. We use the importance scores to identify popular opinions in order to compose general and aspect-specific summaries. Our proposed model, GeoSumm, achieves strong performance on three opinion summarization datasets. We perform additional experiments to analyze the functioning of our model and showcase the generalization ability of GeoSumm across different domains.

2017

From Textbooks to Knowledge: A Case Study in Harvesting Axiomatic Knowledge from Textbooks to Solve Geometry Problems
Mrinmaya Sachan | Kumar Dubey | Eric Xing
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Textbooks are rich sources of information. Harvesting structured knowledge from textbooks is a key challenge in many educational applications. As a case study, we present an approach for harvesting structured axiomatic knowledge from math textbooks. Our approach uses rich contextual and typographical features extracted from raw textbooks. It leverages the redundancy and shared ordering across multiple textbooks to further refine the harvested axioms. These axioms are then parsed into rules that are used to improve the state of the art in solving geometry problems.

2016

Science Question Answering using Instructional Materials
Mrinmaya Sachan | Kumar Dubey | Eric Xing
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2015

Learning Answer-Entailing Structures for Machine Comprehension
Mrinmaya Sachan | Kumar Dubey | Eric Xing | Matthew Richardson
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)