Alfio Gliozzo

Also published as: Alfio M. Gliozzo, Alfio Massimiliano Gliozzo, Alfio Massimiliano Gliozzo


2021

pdf bib
Robust Retrieval Augmented Generation for Zero-shot Slot Filling
Michael Glass | Gaetano Rossiello | Md Faisal Mahbub Chowdhury | Alfio Gliozzo
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Automatically inducing high quality knowledge graphs from a given collection of documents still remains a challenging problem in AI. One way to make headway for this problem is through advancements in a related task known as slot filling. In this task, given an entity query in form of [Entity, Slot, ?], a system is asked to ‘fill’ the slot by generating or extracting the missing value exploiting evidence extracted from relevant passage(s) in the given document collection. The recent works in the field try to solve this task in an end-to-end fashion using retrieval-based language models. In this paper, we present a novel approach to zero-shot slot filling that extends dense passage retrieval with hard negatives and robust training procedures for retrieval augmented generation models. Our model reports large improvements on both T-REx and zsRE slot filling datasets, improving both passage retrieval and slot value generation, and ranking at the top-1 position in the KILT leaderboard. Moreover, we demonstrate the robustness of our system showing its domain adaptation capability on a new variant of the TACRED dataset for slot filling, through a combination of zero/few-shot learning. We release the source code and pre-trained models.

pdf bib
Topic Transferable Table Question Answering
Saneem Chemmengath | Vishwajeet Kumar | Samarth Bharadwaj | Jaydeep Sen | Mustafa Canim | Soumen Chakrabarti | Alfio Gliozzo | Karthik Sankaranarayanan
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Weakly-supervised table question-answering (TableQA) models have achieved state-of-art performance by using pre-trained BERT transformer to jointly encoding a question and a table to produce structured query for the question. However, in practical settings TableQA systems are deployed over table corpora having topic and word distributions quite distinct from BERT’s pretraining corpus. In this work we simulate the practical topic shift scenario by designing novel challenge benchmarks WikiSQL-TS and WikiTable-TS, consisting of train-dev-test splits in five distinct topic groups, based on the popular WikiSQL and WikiTable-Questions datasets. We empirically show that, despite pre-training on large open-domain text, performance of models degrades significantly when they are evaluated on unseen topics. In response, we propose T3QA (Topic Transferable Table Question Answering) a pragmatic adaptation framework for TableQA comprising of: (1) topic-specific vocabulary injection into BERT, (2) a novel text-to-text transformer generator (such as T5, GPT2) based natural language question generation pipeline focused on generating topic-specific training data, and (3) a logical form re-ranker. We show that T3QA provides a reasonably good baseline for our topic shift benchmarks. We believe our topic split benchmarks will lead to robust TableQA solutions that are better suited for practical deployment

pdf bib
Open Knowledge Graphs Canonicalization using Variational Autoencoders
Sarthak Dash | Gaetano Rossiello | Nandana Mihindukulasooriya | Sugato Bagchi | Alfio Gliozzo
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Noun phrases and Relation phrases in open knowledge graphs are not canonicalized, leading to an explosion of redundant and ambiguous subject-relation-object triples. Existing approaches to solve this problem take a two-step approach. First, they generate embedding representations for both noun and relation phrases, then a clustering algorithm is used to group them using the embeddings as features. In this work, we propose Canonicalizing Using Variational AutoEncoders and Side Information (CUVA), a joint model to learn both embeddings and cluster assignments in an end-to-end approach, which leads to a better vector representation for the noun and relation phrases. Our evaluation over multiple benchmarks shows that CUVA outperforms the existing state-of-the-art approaches. Moreover, we introduce CanonicNell, a novel dataset to evaluate entity canonicalization systems.

pdf bib
Leveraging Abstract Meaning Representation for Knowledge Base Question Answering
Pavan Kapanipathi | Ibrahim Abdelaziz | Srinivas Ravishankar | Salim Roukos | Alexander Gray | Ramón Fernandez Astudillo | Maria Chang | Cristina Cornelio | Saswati Dana | Achille Fokoue | Dinesh Garg | Alfio Gliozzo | Sairam Gurajada | Hima Karanam | Naweed Khan | Dinesh Khandelwal | Young-Suk Lee | Yunyao Li | Francois Luus | Ndivhuwo Makondo | Nandana Mihindukulasooriya | Tahira Naseem | Sumit Neelam | Lucian Popa | Revanth Gangi Reddy | Ryan Riegel | Gaetano Rossiello | Udit Sharma | G P Shrivatsa Bhargav | Mo Yu
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
Capturing Row and Column Semantics in Transformer Based Question Answering over Tables
Michael Glass | Mustafa Canim | Alfio Gliozzo | Saneem Chemmengath | Vishwajeet Kumar | Rishav Chakravarti | Avi Sil | Feifei Pan | Samarth Bharadwaj | Nicolas Rodolfo Fauceglia
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Transformer based architectures are recently used for the task of answering questions over tables. In order to improve the accuracy on this task, specialized pre-training techniques have been developed and applied on millions of open-domain web tables. In this paper, we propose two novel approaches demonstrating that one can achieve superior performance on table QA task without even using any of these specialized pre-training techniques. The first model, called RCI interaction, leverages a transformer based architecture that independently classifies rows and columns to identify relevant cells. While this model yields extremely high accuracy at finding cell values on recent benchmarks, a second model we propose, called RCI representation, provides a significant efficiency advantage for online QA systems over tables by materializing embeddings for existing tables. Experiments on recent benchmarks prove that the proposed methods can effectively locate cell values on tables (up to ~98% Hit@1 accuracy on WikiSQL lookup questions). Also, the interaction model outperforms the state-of-the-art transformer based approaches, pre-trained on very large table corpora (TAPAS and TaBERT), achieving ~3.4% and ~18.86% additional precision improvement on the standard WikiSQL benchmark.

pdf bib
Dynamic Facet Selection by Maximizing Graded Relevance
Michael Glass | Md Faisal Mahbub Chowdhury | Yu Deng | Ruchi Mahindru | Nicolas Rodolfo Fauceglia | Alfio Gliozzo | Nandana Mihindukulasooriya
Proceedings of the First Workshop on Interactive Learning for Natural Language Processing

Dynamic faceted search (DFS), an interactive query refinement technique, is a form of Human–computer information retrieval (HCIR) approach. It allows users to narrow down search results through facets, where the facets-documents mapping is determined at runtime based on the context of user query instead of pre-indexing the facets statically. In this paper, we propose a new unsupervised approach for dynamic facet generation, namely optimistic facets, which attempts to generate the best possible subset of facets, hence maximizing expected Discounted Cumulative Gain (DCG), a measure of ranking quality that uses a graded relevance scale. We also release code to generate a new evaluation dataset. Through empirical results on two datasets, we show that the proposed DFS approach considerably improves the document ranking in the search results.

pdf bib
A Semantics-aware Transformer Model of Relation Linking for Knowledge Base Question Answering
Tahira Naseem | Srinivas Ravishankar | Nandana Mihindukulasooriya | Ibrahim Abdelaziz | Young-Suk Lee | Pavan Kapanipathi | Salim Roukos | Alfio Gliozzo | Alexander Gray
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Relation linking is a crucial component of Knowledge Base Question Answering systems. Existing systems use a wide variety of heuristics, or ensembles of multiple systems, heavily relying on the surface question text. However, the explicit semantic parse of the question is a rich source of relation information that is not taken advantage of. We propose a simple transformer-based neural model for relation linking that leverages the AMR semantic parse of a sentence. Our system significantly outperforms the state-of-the-art on 4 popular benchmark datasets. These are based on either DBpedia or Wikidata, demonstrating that our approach is effective across KGs.

pdf bib
CLTR: An End-to-End, Transformer-Based System for Cell-Level Table Retrieval and Table Question Answering
Feifei Pan | Mustafa Canim | Michael Glass | Alfio Gliozzo | Peter Fox
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations

We present the first end-to-end, transformer-based table question answering (QA) system that takes natural language questions and massive table corpora as inputs to retrieve the most relevant tables and locate the correct table cells to answer the question. Our system, CLTR, extends the current state-of-the-art QA over tables model to build an end-to-end table QA architecture. This system has successfully tackled many real-world table QA problems with a simple, unified pipeline. Our proposed system can also generate a heatmap of candidate columns and rows over complex tables and allow users to quickly identify the correct cells to answer questions. In addition, we introduce two new open domain benchmarks, E2E_WTQ and E2E_GNQ, consisting of 2,005 natural language questions over 76,242 tables. The benchmarks are designed to validate CLTR as well as accommodate future table retrieval and end-to-end table QA research and experiments. Our experiments demonstrate that our system is the current state-of-the-art model on the table retrieval task and produces promising results for end-to-end table QA.

2020

pdf bib
Taxonomy Construction of Unseen Domains via Graph-based Cross-Domain Knowledge Transfer
Chao Shang | Sarthak Dash | Md. Faisal Mahbub Chowdhury | Nandana Mihindukulasooriya | Alfio Gliozzo
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Extracting lexico-semantic relations as graph-structured taxonomies, also known as taxonomy construction, has been beneficial in a variety of NLP applications. Recently Graph Neural Network (GNN) has shown to be powerful in successfully tackling many tasks. However, there has been no attempt to exploit GNN to create taxonomies. In this paper, we propose Graph2Taxo, a GNN-based cross-domain transfer framework for the taxonomy construction task. Our main contribution is to learn the latent features of taxonomy construction from existing domains to guide the structure learning of an unseen domain. We also propose a novel method of directed acyclic graph (DAG) generation for taxonomy construction. Specifically, our proposed Graph2Taxo uses a noisy graph constructed from automatically extracted noisy hyponym hypernym candidate pairs, and a set of taxonomies for some known domains for training. The learned model is then used to generate taxonomy for a new unknown domain given a set of terms for that domain. Experiments on benchmark datasets from science and environment domains show that our approach attains significant improvements correspondingly over the state of the art.

pdf bib
Span Selection Pre-training for Question Answering
Michael Glass | Alfio Gliozzo | Rishav Chakravarti | Anthony Ferritto | Lin Pan | G P Shrivatsa Bhargav | Dinesh Garg | Avi Sil
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

BERT (Bidirectional Encoder Representations from Transformers) and related pre-trained Transformers have provided large gains across many language understanding tasks, achieving a new state-of-the-art (SOTA). BERT is pretrained on two auxiliary tasks: Masked Language Model and Next Sentence Prediction. In this paper we introduce a new pre-training task inspired by reading comprehension to better align the pre-training from memorization to understanding. Span Selection PreTraining (SSPT) poses cloze-like training instances, but rather than draw the answer from the model’s parameters, it is selected from a relevant passage. We find significant and consistent improvements over both BERT-BASE and BERT-LARGE on multiple Machine Reading Comprehension (MRC) datasets. Specifically, our proposed model has strong empirical evidence as it obtains SOTA results on Natural Questions, a new benchmark MRC dataset, outperforming BERT-LARGE by 3 F1 points on short answer prediction. We also show significant impact in HotpotQA, improving answer prediction F1 by 4 points and supporting fact prediction F1 by 1 point and outperforming the previous best system. Moreover, we show that our pre-training approach is particularly effective when training data is limited, improving the learning curve by a large amount.

2019

pdf bib
Automatic Taxonomy Induction and Expansion
Nicolas Rodolfo Fauceglia | Alfio Gliozzo | Sarthak Dash | Md. Faisal Mahbub Chowdhury | Nandana Mihindukulasooriya
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations

The Knowledge Graph Induction Service (KGIS) is an end-to-end knowledge induction system. One of its main capabilities is to automatically induce taxonomies from input documents using a hybrid approach that takes advantage of linguistic patterns, semantic web and neural networks. KGIS allows the user to semi-automatically curate and expand the induced taxonomy through a component called Smart SpreadSheet by exploiting distributional semantics. In this paper, we describe these taxonomy induction and expansion features of KGIS. A screencast video demonstrating the system is available in https://ibm.box.com/v/emnlp-2019-demo .

pdf bib
Learning Relational Representations by Analogy using Hierarchical Siamese Networks
Gaetano Rossiello | Alfio Gliozzo | Robert Farrell | Nicolas Fauceglia | Michael Glass
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

We address relation extraction as an analogy problem by proposing a novel approach to learn representations of relations expressed by their textual mentions. In our assumption, if two pairs of entities belong to the same relation, then those two pairs are analogous. Following this idea, we collect a large set of analogous pairs by matching triples in knowledge bases with web-scale corpora through distant supervision. We leverage this dataset to train a hierarchical siamese network in order to learn entity-entity embeddings which encode relational information through the different linguistic paraphrasing expressing the same relation. We evaluate our model in a one-shot learning task by showing a promising generalization capability in order to classify unseen relation types, which makes this approach suitable to perform automatic knowledge base population with minimal supervision. Moreover, the model can be used to generate pre-trained embeddings which provide a valuable signal when integrated into an existing neural-based model by outperforming the state-of-the-art methods on a downstream relation extraction task.

2018

pdf bib
Discovering Implicit Knowledge with Unary Relations
Michael Glass | Alfio Gliozzo
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

State-of-the-art relation extraction approaches are only able to recognize relationships between mentions of entity arguments stated explicitly in the text and typically localized to the same sentence. However, the vast majority of relations are either implicit or not sententially localized. This is a major problem for Knowledge Base Population, severely limiting recall. In this paper we propose a new methodology to identify relations between two entities, consisting of detecting a very large number of unary relations, and using them to infer missing entities. We describe a deep learning architecture able to learn thousands of such relations very efficiently by using a common deep learning based representation. Our approach largely outperforms state of the art relation extraction technology on a newly introduced web scale knowledge base population benchmark, that we release to the research community.

2016

pdf bib
Joint Learning of Local and Global Features for Entity Linking via Neural Networks
Thien Huu Nguyen | Nicolas Fauceglia | Mariano Rodriguez Muro | Oktie Hassanzadeh | Alfio Massimiliano Gliozzo | Mohammad Sadoghi
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Previous studies have highlighted the necessity for entity linking systems to capture the local entity-mention similarities and the global topical coherence. We introduce a novel framework based on convolutional neural networks and recurrent neural networks to simultaneously model the local and global features for entity linking. The proposed model benefits from the capacity of convolutional neural networks to induce the underlying representations for local contexts and the advantage of recurrent neural networks to adaptively compress variable length sequences of predictions for global constraints. Our evaluation on multiple datasets demonstrates the effectiveness of the model and yields the state-of-the-art performance on such datasets. In addition, we examine the entity linking systems on the domain adaptation setting that further demonstrates the cross-domain robustness of the proposed model.

pdf bib
An Entity-Focused Approach to Generating Company Descriptions
Gavin Saldanha | Or Biran | Kathleen McKeown | Alfio Gliozzo
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2014

pdf bib
Lexical Substitution for the Medical Domain
Martin Riedl | Michael Glass | Alfio Gliozzo
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf bib
Word Semantic Representations using Bayesian Probabilistic Tensor Factorization
Jingwei Zhang | Jeremy Salwen | Michael Glass | Alfio Gliozzo
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

2013

pdf bib
Semantic Technologies in IBM Watson
Alfio Gliozzo | Or Biran | Siddharth Patwardhan | Kathleen McKeown
Proceedings of the Fourth Workshop on Teaching NLP and CL

pdf bib
JoBimText Visualizer: A Graph-based Approach to Contextualizing Distributional Similarity
Chris Biemann | Bonaventura Coppola | Michael R. Glass | Alfio Gliozzo | Matthew Hatem | Martin Riedl
Proceedings of TextGraphs-8 Graph-based Methods for Natural Language Processing

2012

pdf bib
Natural Language Processing in Watson
Alfio M. Gliozzo | Aditya Kalyanpur | James Fan
Tutorial Abstracts at the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Structured Term Recognition in Medical Text
Michael Glass | Alfio Gliozzo
Proceedings of COLING 2012

pdf bib
When Did that Happen? — Linking Events and Relations to Timestamps
Dirk Hovy | James Fan | Alfio Gliozzo | Siddharth Patwardhan | Christopher Welty
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics

2009

pdf bib
Kernel Methods for Minimally Supervised WSD
Claudio Giuliano | Alfio Massimiliano Gliozzo | Carlo Strapparava
Computational Linguistics, Volume 35, Number 4, December 2009

pdf bib
Bridging Languages by SuperSense Entity Tagging
Davide Picca | Alfio Massimiliano Gliozzo | Simone Campora
Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration (NEWS 2009)

2008

pdf bib
LMM: an OWL-DL MetaModel to Represent Heterogeneous Lexical Knowledge
Davide Picca | Alfio Massimiliano Gliozzo | Aldo Gangemi
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

In this paper we present a Linguistic Meta-Model (LMM) allowing a semiotic-cognitive representation of knowledge. LMM is freely available and integrates the schemata of linguistic knowledge resources, such as WordNet and FrameNet, as well as foundational ontologies, such as DOLCE and its extensions. In addition, LMM is able to deal with multilinguality and to represent individuals and facts in an open domain perspective.

pdf bib
Supersense Tagger for Italian
Davide Picca | Alfio Massimiliano Gliozzo | Massimiliano Ciaramita
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

In this paper we present the procedure we followed to develop the Italian Super Sense Tagger. In particular, we adapted the English SuperSense Tagger to the Italian Language by exploiting a parallel sense labeled corpus for training. As for English, the Italian tagger uses a fixed set of 26 semantic labels, called supersenses, achieving a slightly lower accuracy due to the lower quality of the Italian training data. Both taggers accomplish the same task of identifying entities and concepts belonging to a common set of ontological types. This parallelism allows us to define effective methodologies for a broad range of cross-language knowledge acquisition tasks

pdf bib
Instance-Based Ontology Population Exploiting Named-Entity Substitution
Claudio Giuliano | Alfio Gliozzo
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

2007

pdf bib
FBK-irst: Lexical Substitution Task Exploiting Domain and Syntagmatic Coherence
Claudio Giuliano | Alfio Gliozzo | Carlo Strapparava
Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)

pdf bib
Instance Based Lexical Entailment for Ontology Population
Claudio Giuliano | Alfio Gliozzo
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

pdf bib
The Domain Restriction Hypothesis: Relating Term Similarity and Semantic Consistency
Alfio Massimiliano Gliozzo | Marco Pennacchiotti | Patrick Pantel
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference

2006

pdf bib
The GOD model
Alfio Massimiliano Gliozzo
Demonstrations

pdf bib
Direct Word Sense Matching for Lexical Substitution
Ido Dagan | Oren Glickman | Alfio Gliozzo | Efrat Marmorshtein | Carlo Strapparava
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

pdf bib
Exploiting Comparable Corpora and Bilingual Dictionaries for Cross-Language Text Categorization
Alfio Gliozzo | Carlo Strapparava
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

pdf bib
Syntagmatic Kernels: a Word Sense Disambiguation Case Study
Claudio Giuliano | Alfio Gliozzo | Carlo Strapparava
Proceedings of the Workshop on Learning Structured Information in Natural Language Applications

2005

pdf bib
Investigating Unsupervised Learning for Text Categorization Bootstrapping
Alfio Gliozzo | Carlo Strapparava | Ido Dagan
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

pdf bib
Domain Kernels for Text Categorization
Alfio Gliozzo | Carlo Strapparava
Proceedings of the Ninth Conference on Computational Natural Language Learning (CoNLL-2005)

pdf bib
Cross Language Text Categorization by Acquiring Multilingual Domain Models from Comparable Corpora
Alfio Gliozzo | Carlo Strapparava
Proceedings of the ACL Workshop on Building and Using Parallel Texts

pdf bib
Domain Kernels for Word Sense Disambiguation
Alfio Gliozzo | Claudio Giuliano | Carlo Strapparava
Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05)

2004

pdf bib
Pattern abstraction and term similarity for Word Sense Disambiguation: IRST at Senseval-3
Carlo Strapparava | Alfio Gliozzo | Claudio Giuliano
Proceedings of SENSEVAL-3, the Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text

pdf bib
Unsupervised Domain Relevance Estimation for Word Sense Disambiguation
Alfio Gliozzo | Bernardo Magnini | Carlo Strapparava
Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing

2001

pdf bib
Using Domain Information for Word Sense Disambiguation
Bernardo Magnini | Carlo Strapparava | Giovanni Pezzulo | Alfio Gliozzo
Proceedings of SENSEVAL-2 Second International Workshop on Evaluating Word Sense Disambiguation Systems

Search