Munindar P. Singh

Also published as: Munindar Singh

2023

CoSiNES: Contrastive Siamese Network for Entity Standardization
Jiaqing Yuan | Michele Merler | Mihir Choudhury | Raju Pavuluri | Munindar Singh | Maja Vukovic
Proceedings of the First Workshop on Matching From Unstructured and Structured Data (MATCHING 2023)

Entity standardization maps noisy mentions from free-form text to standard entities in a knowledge base. The unique challenge of this task relative to other entity-related tasks is the lack of surrounding context and numerous variations in the surface form of the mentions, especially when it comes to generalization across domains where labeled data is scarce. Previous research mostly focuses on developing models either heavily relying on context, or dedicated solely to a specific domain. In contrast, we propose CoSiNES, a generic and adaptable framework with Contrastive Siamese Network for Entity Standardization that effectively adapts a pretrained language model to capture the syntax and semantics of the entities in a new domain. We construct a new dataset in the technology domain, which contains 640 technical stack entities and 6,412 mentions collected from industrial content management systems. We demonstrate that CoSiNES yields higher accuracy and faster runtime than baselines derived from leading methods in this domain. CoSiNES also achieves competitive performance in four standard datasets from the chemistry, medicine, and biomedical domains, demonstrating its cross-domain applicability. Code and data are available at https://github.com/konveyor/tackle-container-advisor/tree/main/entity_standardizer/cosines

2022

Pixie: Preference in Implicit and Explicit Comparisons
Amanul Haque | Vaibhav Garg | Hui Guo | Munindar Singh
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

We present Pixie, a manually annotated dataset for preference classification comprising 8,890 sentences drawn from app reviews. Unlike previous studies on preference classification, Pixie contains implicit (omitting an entity being compared) and indirect (lacking comparative linguistic cues) comparisons. We find that transformer-based pretrained models, finetuned on Pixie, achieve a weighted average F1 score of 83.34% and outperform the existing state-of-the-art preference classification model (73.99%).

2020

Lin: Unsupervised Extraction of Tasks from Textual Communication
Parth Diwanji | Hui Guo | Munindar Singh | Anup Kalia
Proceedings of the 28th International Conference on Computational Linguistics

Commitments and requests are a hallmark of collaborative communication, especially in team settings. Identifying the specific tasks being committed to or requested in emails and chat messages can enable important downstream tasks, such as producing todo lists, reminders, and calendar entries. State-of-the-art approaches for task identification rely on large annotated datasets, which are not always available, especially for domain-specific tasks. Accordingly, we propose Lin, an unsupervised approach for identifying tasks that leverages dependency parsing and VerbNet. Our evaluations show that Lin yields comparable or more accurate results than supervised models on domains with large training sets, and maintains its excellent performance on unseen domains.

Octa: Omissions and Conflicts in Target-Aspect Sentiment Analysis
Zhe Zhang | Chung-Wei Hang | Munindar Singh
Findings of the Association for Computational Linguistics: EMNLP 2020

Sentiments in opinionated text are often determined by both aspects and target words (or targets). We observe that targets and aspects interrelate in subtle ways, often yielding conflicting sentiments. Thus, a naive aggregation of sentiments from aspects and targets treated separately, as in existing sentiment analysis models, impairs performance. We propose Octa, an approach that jointly considers aspects and targets when inferring sentiments. To capture and quantify relationships between targets and context words, Octa uses a selective self-attention mechanism that handles implicit or missing targets. Specifically, Octa involves two layers of attention mechanisms for, respectively, selective attention between targets and context words and attention over words based on aspects. On benchmark datasets, Octa outperforms leading models by a large margin, yielding (absolute) gains in accuracy of 1.6% to 4.3%.

2019

Leveraging Structural and Semantic Correspondence for Attribute-Oriented Aspect Sentiment Discovery
Zhe Zhang | Munindar Singh
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Opinionated text often involves attributes such as authorship and location that influence the sentiments expressed for different aspects. We posit that structural and semantic correspondence is both prevalent in opinionated text, especially when associated with attributes, and crucial in accurately revealing its latent aspect and sentiment structure. However, it is not recognized by existing approaches. We propose Trait, an unsupervised probabilistic model that discovers aspects and sentiments from text and associates them with different attributes. To this end, Trait infers and leverages structural and semantic correspondence using a Markov Random Field. We show empirically that by incorporating attributes explicitly Trait significantly outperforms state-of-the-art baselines both by generating attribute profiles that accord with our intuitions, as shown via visualization, and yielding topics of greater semantic cohesion.

2018

Limbic: Author-Based Sentiment Aspect Modeling Regularized with Word Embeddings and Discourse Relations
Zhe Zhang | Munindar Singh
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

We propose Limbic, an unsupervised probabilistic model that addresses the problem of discovering aspects and sentiments and associating them with authors of opinionated texts. Limbic combines three ideas, incorporating authors, discourse relations, and word embeddings. For discourse relations, Limbic adopts a generative process regularized by a Markov Random Field. To promote words with high semantic similarity into the same topic, Limbic captures semantic regularities from word embeddings via a generalized Pólya Urn process. We demonstrate that Limbic (1) discovers aspects associated with sentiments with high lexical diversity and (2) outperforms state-of-the-art models by a substantial margin in topic cohesion and sentiment classification.
