Nandana Mihindukulasooriya


2022

pdf bib
Permutation Invariant Strategy Using Transformer Encoders for Table Understanding
Sarthak Dash | Sugato Bagchi | Nandana Mihindukulasooriya | Alfio Gliozzo
Findings of the Association for Computational Linguistics: NAACL 2022

Representing text in tables is essential for many business intelligence tasks such as semantic retrieval, data exploration and visualization, and question answering. Existing methods that leverage pretrained Transformer encoders range from a simple construction of pseudo-sentences by concatenating text across rows or columns to complex parameter-intensive models that encode table structure and require additional pretraining. In this work, we introduce a novel encoding strategy for Transformer encoders that preserves the critical property of permutation invariance across rows or columns. Unlike existing state-of-the-art methods for Table Understanding, our proposed approach does not require any additional pretraining and still substantially outperforms existing methods in almost all instances. We demonstrate the effectiveness of our proposed approach on three table interpretation tasks: column type annotation, relation extraction, and entity linking through extensive experiments on existing tabular datasets.

pdf bib
An Empirical Study on Pseudo-log-likelihood Bias Measures for Masked Language Models Using Paraphrased Sentences
Bum Chul Kwon | Nandana Mihindukulasooriya
Proceedings of the 2nd Workshop on Trustworthy Natural Language Processing (TrustNLP 2022)

In this paper, we conduct an empirical study on a bias measure, log-likelihood Masked Language Model (MLM) scoring, on a benchmark dataset. Previous work evaluates whether MLMs are biased or not for certain protected attributes (e.g., race) by comparing the log-likelihood scores of sentences that contain stereotypical characteristics with one category (e.g., black) versus another (e.g., white). We hypothesized that this approach might be too sensitive to the choice of contextual words than the meaning of the sentence. Therefore, we computed the same measure after paraphrasing the sentences with different words but with same meaning. Our results demonstrate that the log-likelihood scoring can be more sensitive to utterance of specific words than to meaning behind a given sentence. Our paper reveals a shortcoming of the current log-likelihood-based bias measures for MLMs and calls for new ways to improve the robustness of it

2021

pdf bib
Open Knowledge Graphs Canonicalization using Variational Autoencoders
Sarthak Dash | Gaetano Rossiello | Nandana Mihindukulasooriya | Sugato Bagchi | Alfio Gliozzo
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Noun phrases and Relation phrases in open knowledge graphs are not canonicalized, leading to an explosion of redundant and ambiguous subject-relation-object triples. Existing approaches to solve this problem take a two-step approach. First, they generate embedding representations for both noun and relation phrases, then a clustering algorithm is used to group them using the embeddings as features. In this work, we propose Canonicalizing Using Variational AutoEncoders and Side Information (CUVA), a joint model to learn both embeddings and cluster assignments in an end-to-end approach, which leads to a better vector representation for the noun and relation phrases. Our evaluation over multiple benchmarks shows that CUVA outperforms the existing state-of-the-art approaches. Moreover, we introduce CanonicNell, a novel dataset to evaluate entity canonicalization systems.

pdf bib
Leveraging Abstract Meaning Representation for Knowledge Base Question Answering
Pavan Kapanipathi | Ibrahim Abdelaziz | Srinivas Ravishankar | Salim Roukos | Alexander Gray | Ramón Fernandez Astudillo | Maria Chang | Cristina Cornelio | Saswati Dana | Achille Fokoue | Dinesh Garg | Alfio Gliozzo | Sairam Gurajada | Hima Karanam | Naweed Khan | Dinesh Khandelwal | Young-Suk Lee | Yunyao Li | Francois Luus | Ndivhuwo Makondo | Nandana Mihindukulasooriya | Tahira Naseem | Sumit Neelam | Lucian Popa | Revanth Gangi Reddy | Ryan Riegel | Gaetano Rossiello | Udit Sharma | G P Shrivatsa Bhargav | Mo Yu
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
A Semantics-aware Transformer Model of Relation Linking for Knowledge Base Question Answering
Tahira Naseem | Srinivas Ravishankar | Nandana Mihindukulasooriya | Ibrahim Abdelaziz | Young-Suk Lee | Pavan Kapanipathi | Salim Roukos | Alfio Gliozzo | Alexander Gray
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Relation linking is a crucial component of Knowledge Base Question Answering systems. Existing systems use a wide variety of heuristics, or ensembles of multiple systems, heavily relying on the surface question text. However, the explicit semantic parse of the question is a rich source of relation information that is not taken advantage of. We propose a simple transformer-based neural model for relation linking that leverages the AMR semantic parse of a sentence. Our system significantly outperforms the state-of-the-art on 4 popular benchmark datasets. These are based on either DBpedia or Wikidata, demonstrating that our approach is effective across KGs.

pdf bib
Dynamic Facet Selection by Maximizing Graded Relevance
Michael Glass | Md Faisal Mahbub Chowdhury | Yu Deng | Ruchi Mahindru | Nicolas Rodolfo Fauceglia | Alfio Gliozzo | Nandana Mihindukulasooriya
Proceedings of the First Workshop on Interactive Learning for Natural Language Processing

Dynamic faceted search (DFS), an interactive query refinement technique, is a form of Human–computer information retrieval (HCIR) approach. It allows users to narrow down search results through facets, where the facets-documents mapping is determined at runtime based on the context of user query instead of pre-indexing the facets statically. In this paper, we propose a new unsupervised approach for dynamic facet generation, namely optimistic facets, which attempts to generate the best possible subset of facets, hence maximizing expected Discounted Cumulative Gain (DCG), a measure of ranking quality that uses a graded relevance scale. We also release code to generate a new evaluation dataset. Through empirical results on two datasets, we show that the proposed DFS approach considerably improves the document ranking in the search results.

2020

pdf bib
Taxonomy Construction of Unseen Domains via Graph-based Cross-Domain Knowledge Transfer
Chao Shang | Sarthak Dash | Md. Faisal Mahbub Chowdhury | Nandana Mihindukulasooriya | Alfio Gliozzo
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Extracting lexico-semantic relations as graph-structured taxonomies, also known as taxonomy construction, has been beneficial in a variety of NLP applications. Recently Graph Neural Network (GNN) has shown to be powerful in successfully tackling many tasks. However, there has been no attempt to exploit GNN to create taxonomies. In this paper, we propose Graph2Taxo, a GNN-based cross-domain transfer framework for the taxonomy construction task. Our main contribution is to learn the latent features of taxonomy construction from existing domains to guide the structure learning of an unseen domain. We also propose a novel method of directed acyclic graph (DAG) generation for taxonomy construction. Specifically, our proposed Graph2Taxo uses a noisy graph constructed from automatically extracted noisy hyponym hypernym candidate pairs, and a set of taxonomies for some known domains for training. The learned model is then used to generate taxonomy for a new unknown domain given a set of terms for that domain. Experiments on benchmark datasets from science and environment domains show that our approach attains significant improvements correspondingly over the state of the art.

2019

pdf bib
Automatic Taxonomy Induction and Expansion
Nicolas Rodolfo Fauceglia | Alfio Gliozzo | Sarthak Dash | Md. Faisal Mahbub Chowdhury | Nandana Mihindukulasooriya
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations

The Knowledge Graph Induction Service (KGIS) is an end-to-end knowledge induction system. One of its main capabilities is to automatically induce taxonomies from input documents using a hybrid approach that takes advantage of linguistic patterns, semantic web and neural networks. KGIS allows the user to semi-automatically curate and expand the induced taxonomy through a component called Smart SpreadSheet by exploiting distributional semantics. In this paper, we describe these taxonomy induction and expansion features of KGIS. A screencast video demonstrating the system is available in https://ibm.box.com/v/emnlp-2019-demo .