Michael Glass

Also published as: Michael R. Glass


2021

pdf bib
Dynamic Facet Selection by Maximizing Graded Relevance
Michael Glass | Md Faisal Mahbub Chowdhury | Yu Deng | Ruchi Mahindru | Nicolas Rodolfo Fauceglia | Alfio Gliozzo | Nandana Mihindukulasooriya
Proceedings of the First Workshop on Interactive Learning for Natural Language Processing

Dynamic faceted search (DFS), an interactive query refinement technique, is a form of Human–computer information retrieval (HCIR) approach. It allows users to narrow down search results through facets, where the facets-documents mapping is determined at runtime based on the context of user query instead of pre-indexing the facets statically. In this paper, we propose a new unsupervised approach for dynamic facet generation, namely optimistic facets, which attempts to generate the best possible subset of facets, hence maximizing expected Discounted Cumulative Gain (DCG), a measure of ranking quality that uses a graded relevance scale. We also release code to generate a new evaluation dataset. Through empirical results on two datasets, we show that the proposed DFS approach considerably improves the document ranking in the search results.

pdf bib
Capturing Row and Column Semantics in Transformer Based Question Answering over Tables
Michael Glass | Mustafa Canim | Alfio Gliozzo | Saneem Chemmengath | Vishwajeet Kumar | Rishav Chakravarti | Avi Sil | Feifei Pan | Samarth Bharadwaj | Nicolas Rodolfo Fauceglia
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Transformer based architectures are recently used for the task of answering questions over tables. In order to improve the accuracy on this task, specialized pre-training techniques have been developed and applied on millions of open-domain web tables. In this paper, we propose two novel approaches demonstrating that one can achieve superior performance on table QA task without even using any of these specialized pre-training techniques. The first model, called RCI interaction, leverages a transformer based architecture that independently classifies rows and columns to identify relevant cells. While this model yields extremely high accuracy at finding cell values on recent benchmarks, a second model we propose, called RCI representation, provides a significant efficiency advantage for online QA systems over tables by materializing embeddings for existing tables. Experiments on recent benchmarks prove that the proposed methods can effectively locate cell values on tables (up to ~98% Hit@1 accuracy on WikiSQL lookup questions). Also, the interaction model outperforms the state-of-the-art transformer based approaches, pre-trained on very large table corpora (TAPAS and TaBERT), achieving ~3.4% and ~18.86% additional precision improvement on the standard WikiSQL benchmark.

pdf bib
CLTR: An End-to-End, Transformer-Based System for Cell-Level Table Retrieval and Table Question Answering
Feifei Pan | Mustafa Canim | Michael Glass | Alfio Gliozzo | Peter Fox
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations

We present the first end-to-end, transformer-based table question answering (QA) system that takes natural language questions and massive table corpora as inputs to retrieve the most relevant tables and locate the correct table cells to answer the question. Our system, CLTR, extends the current state-of-the-art QA over tables model to build an end-to-end table QA architecture. This system has successfully tackled many real-world table QA problems with a simple, unified pipeline. Our proposed system can also generate a heatmap of candidate columns and rows over complex tables and allow users to quickly identify the correct cells to answer questions. In addition, we introduce two new open domain benchmarks, E2E_WTQ and E2E_GNQ, consisting of 2,005 natural language questions over 76,242 tables. The benchmarks are designed to validate CLTR as well as accommodate future table retrieval and end-to-end table QA research and experiments. Our experiments demonstrate that our system is the current state-of-the-art model on the table retrieval task and produces promising results for end-to-end table QA.

2020

pdf bib
Span Selection Pre-training for Question Answering
Michael Glass | Alfio Gliozzo | Rishav Chakravarti | Anthony Ferritto | Lin Pan | G P Shrivatsa Bhargav | Dinesh Garg | Avi Sil
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

BERT (Bidirectional Encoder Representations from Transformers) and related pre-trained Transformers have provided large gains across many language understanding tasks, achieving a new state-of-the-art (SOTA). BERT is pretrained on two auxiliary tasks: Masked Language Model and Next Sentence Prediction. In this paper we introduce a new pre-training task inspired by reading comprehension to better align the pre-training from memorization to understanding. Span Selection PreTraining (SSPT) poses cloze-like training instances, but rather than draw the answer from the model’s parameters, it is selected from a relevant passage. We find significant and consistent improvements over both BERT-BASE and BERT-LARGE on multiple Machine Reading Comprehension (MRC) datasets. Specifically, our proposed model has strong empirical evidence as it obtains SOTA results on Natural Questions, a new benchmark MRC dataset, outperforming BERT-LARGE by 3 F1 points on short answer prediction. We also show significant impact in HotpotQA, improving answer prediction F1 by 4 points and supporting fact prediction F1 by 1 point and outperforming the previous best system. Moreover, we show that our pre-training approach is particularly effective when training data is limited, improving the learning curve by a large amount.

2019

pdf bib
CFO: A Framework for Building Production NLP Systems
Rishav Chakravarti | Cezar Pendus | Andrzej Sakrajda | Anthony Ferritto | Lin Pan | Michael Glass | Vittorio Castelli | J. William Murdock | Radu Florian | Salim Roukos | Avi Sil
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations

This paper introduces a novel orchestration framework, called CFO (Computation Flow Orchestrator), for building, experimenting with, and deploying interactive NLP (Natural Language Processing) and IR (Information Retrieval) systems to production environments. We then demonstrate a question answering system built using this framework which incorporates state-of-the-art BERT based MRC (Machine Reading Com- prehension) with IR components to enable end-to-end answer retrieval. Results from the demo system are shown to be high quality in both academic and industry domain specific settings. Finally, we discuss best practices when (pre-)training BERT based MRC models for production systems. Screencast links: - Short video (< 3 min): http: //ibm.biz/gaama_demo - Supplementary long video (< 13 min): http://ibm.biz/gaama_cfo_demo

pdf bib
Learning Relational Representations by Analogy using Hierarchical Siamese Networks
Gaetano Rossiello | Alfio Gliozzo | Robert Farrell | Nicolas Fauceglia | Michael Glass
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

We address relation extraction as an analogy problem by proposing a novel approach to learn representations of relations expressed by their textual mentions. In our assumption, if two pairs of entities belong to the same relation, then those two pairs are analogous. Following this idea, we collect a large set of analogous pairs by matching triples in knowledge bases with web-scale corpora through distant supervision. We leverage this dataset to train a hierarchical siamese network in order to learn entity-entity embeddings which encode relational information through the different linguistic paraphrasing expressing the same relation. We evaluate our model in a one-shot learning task by showing a promising generalization capability in order to classify unseen relation types, which makes this approach suitable to perform automatic knowledge base population with minimal supervision. Moreover, the model can be used to generate pre-trained embeddings which provide a valuable signal when integrated into an existing neural-based model by outperforming the state-of-the-art methods on a downstream relation extraction task.

2018

pdf bib
Discovering Implicit Knowledge with Unary Relations
Michael Glass | Alfio Gliozzo
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

State-of-the-art relation extraction approaches are only able to recognize relationships between mentions of entity arguments stated explicitly in the text and typically localized to the same sentence. However, the vast majority of relations are either implicit or not sententially localized. This is a major problem for Knowledge Base Population, severely limiting recall. In this paper we propose a new methodology to identify relations between two entities, consisting of detecting a very large number of unary relations, and using them to infer missing entities. We describe a deep learning architecture able to learn thousands of such relations very efficiently by using a common deep learning based representation. Our approach largely outperforms state of the art relation extraction technology on a newly introduced web scale knowledge base population benchmark, that we release to the research community.

2014

pdf bib
Lexical Substitution for the Medical Domain
Martin Riedl | Michael Glass | Alfio Gliozzo
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf bib
Word Semantic Representations using Bayesian Probabilistic Tensor Factorization
Jingwei Zhang | Jeremy Salwen | Michael Glass | Alfio Gliozzo
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

2013

pdf bib
JoBimText Visualizer: A Graph-based Approach to Contextualizing Distributional Similarity
Chris Biemann | Bonaventura Coppola | Michael R. Glass | Alfio Gliozzo | Matthew Hatem | Martin Riedl
Proceedings of TextGraphs-8 Graph-based Methods for Natural Language Processing

2012

pdf bib
Structured Term Recognition in Medical Text
Michael Glass | Alfio Gliozzo
Proceedings of COLING 2012

2005

pdf bib
Aggregation Improves Learning: Experiments in Natural Language Generation for Intelligent Tutoring Systems
Barbara Di Eugenio | Davide Fossati | Dan Yu | Susan Haller | Michael Glass
Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05)

2004

pdf bib
Squibs and Discussions: The Kappa Statistic: A Second Look
Barbara Di Eugenio | Michael Glass
Computational Linguistics, Volume 30, Number 1, March 2004

2003

pdf bib
Latent Semantic Analysis for Dialogue Act Classification
Riccardo Serafin | Barbara Di Eugenio | Michael Glass
Companion Volume of the Proceedings of HLT-NAACL 2003 - Short Papers

2002

pdf bib
The binomial cumulative distribution function, or, is my system better than yours?
Barbara Di Eugenio | Michael Glass | Michael J. Scott
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

pdf bib
MUP - The UIC Standoff Markup Tool
Michael Glass | Barbara Di Eugenio
Proceedings of the Third SIGdial Workshop on Discourse and Dialogue

pdf bib
The DIAG experiments: Natural Language Generation for Intelligent Tutoring Systems
Barbara Di Eugenio | Michael Glass | Michael Trolio
Proceedings of the International Natural Language Generation Conference

1998

pdf bib
System Demonstration Content Planning as the Basis for an Intelligent Tutoring System
Reva Freedman | Stefan Brandle | Michael Glass | Jung Hee Kim | Yujian Zhou | Martha W. Evens
Natural Language Generation