Pushpak Bhattacharyya - ACL Anthology

Pushpak Bhattacharyya

Also published as: Pushpak Bhattacharya

2026

SrcMix: Mixing of Related Source Languages Benefits Extremely Low-resource Machine Translation
Sanjeev Kumar | Preethi Jyothi | Pushpak Bhattacharyya
Findings of the Association for Computational Linguistics: EACL 2026

Multilingual models are widely used for machine translation (MT). However, their effectiveness for extremely low-resource languages (ELRLs) depends critically on how related languages are incorporated during fine-tuning. In this work, we study the role of language mixing directionality, linguistic relatedness, and script compatibility in ELRL translation. We propose SrcMix, a simple source-side mixing strategy that combines related ELRLs during fine-tuning while constraining the decoder to a single target language. Compared to its target-side counterpart TgtMix, SrcMix improves performance by +3 ChrF++ and +5 BLEU in high-resource to ELRL translations, and by +5 ChrF++ and +12 BLEU in mid-resource to ELRL translations. We also release the first Angika MT dataset and provide a systematic comparison of LLM (Aya-101) and NMT (mT5-Large) models under ELRL settings, highlighting the importance of directional mixing and linguistic compatibility.

How effective are VLMs in assisting humans in inferring the quality of mental models from Multimodal short answers?
Pritam Sil | Durgaprasad Karnam | Vinay Reddy Venumuddala | Pushpak Bhattacharyya
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

STEM Mental models can play a critical role in assessing students’ conceptual understanding of a topic. They not only offer insights into what students know but also into how effectively they can apply, relate to, and integrate concepts across various contexts. Thus, students’ responses are critical markers of the quality of their understanding and not entities that should be merely graded. However, inferring these mental models from student answers is challenging as it requires deep reasoning skills. We propose MMGrader, an approach that infers the quality of students’ mental models from their multimodal responses using concept graphs as an analytical framework. In our evaluation with 9 openly available models, we found that the best-performing models fall short of human-level performance. This is because they only achieved an accuracy of approximately 40%, a prediction error of 1.1 units, and a scoring distribution fairly aligned with human scoring patterns. With improved accuracy, these can be highly effective assistants to teachers in inferring the mental models of their entire classrooms, enabling them to do so efficiently and help improve their pedagogies more effectively by designing targeted help sessions and lectures that strengthen areas where students collectively demonstrate lower proficiency.

Rad-Flamingo: A Multimodal Prompt driven Radiology Report Generation Framework with Patient-Centric Explanations
Md. Tousin Akhter | Devansh Lalwani | Kshitij Sharad Jadhav | Pushpak Bhattacharyya
Findings of the Association for Computational Linguistics: EACL 2026

In modern healthcare, radiology plays a pivotal role in diagnosing and managing diseases. However, the complexity of medical imaging data and the variability in interpretation can lead to inconsistencies and a lack of patient-centered insight in radiology reports. To address this challenge, we developed Rad-Flamingo, a novel multimodal prompt-driven report generation framework that integrates diverse data modalities, such as medical images and clinical notes, to produce comprehensive and context-aware radiology reports. Our framework leverages innovative prompt engineering techniques to guide vision-language models in generating relevant information, ensuring these generated reports are not only accurate but also understandable to individual patients. A key feature of our framework is its ability to provide patient-centric explanations, offering clear and personalized insights into diagnostic findings and their implications. Additionally, we demonstrate a synthetic data generation pipeline to augment the findings and impressions of any existing benchmark dataset with patient-centric explanations. Experimental results demonstrate this framework’s effectiveness in enhancing report quality and improving understandability, and suggest that it could foster better patient-doctor communication. This approach represents a significant step towards human-centered medical AI systems.

Assessing and Improving Punctuation Robustness in English-Marathi Machine Translation
Kaustubh Shivshankar Shejole | Sourabh Deoghare | Pushpak Bhattacharyya
Proceedings for the Ninth Workshop on Technologies for Machine Translation of Low Resource Languages (LoResMT 2026)

Neural Machine Translation (NMT) systems rely heavily on explicit punctuation cues to resolve semantic ambiguities in a source sentence. Inputting user-generated sentences, which are likely to contain missing or incorrect punctuation, results in fluent but semantically disastrous translations. This work attempts to highlight and address the problem of punctuation robustness of NMT systems through an English-to-Marathi translation. First, we introduce Virām, a human-curated diagnostic benchmark of 54 punctuation-ambiguous English-Marathi sentence pairs to stress-test existing NMT systems. Second, we evaluate two simple remediation strategies: cascade-based restore-then-translate and direct fine-tuning. Our experimental results and analysis demonstrate that both strategies yield substantial NMT performance improvements. Furthermore, we find that current Large Language Models (LLMs) exhibit relatively poorer robustness in translating such sentences than these task-specific strategies, thus necessitating further research in this area. The code and dataset are available at https://github.com/KaustubhShejole/Viram_Marathi.

2025

Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics
Kentaro Inui | Sakriani Sakti | Haofen Wang | Derek F. Wong | Pushpak Bhattacharyya | Biplab Banerjee | Asif Ekbal | Tanmoy Chakraborty | Dhirendra Pratap Singh
Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics

LLMs as Architects and Critics for Multi-Source Opinion Summarization
Anuj Attri | Arnav Attri | Suman Banerjee | Amey Patil | Muthusamy Chelliah | Nikesh Garera | Pushpak Bhattacharyya
Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics

Multi-source Opinion Summarization (M-OS) extends beyond traditional opinion summarization by incorporating additional sources of product metadata such as descriptions, key features, specifications, and ratings, alongside reviews. This integration results in comprehensive summaries that capture both subjective opinions and objective product attributes essential for informed decision-making. While Large Language Models (LLMs) have shown significant success in various Natural Language Processing (NLP) tasks, their potential in M-OS remains largely unexplored. Additionally, the lack of evaluation datasets for this task has impeded further advancements. To bridge this gap, we introduce M-OS-EVAL, a benchmark dataset for evaluating multi-source opinion summaries across seven key dimensions: fluency, coherence, relevance, faithfulness, aspect coverage, sentiment consistency, and specificity. Our results demonstrate that M-OS significantly enhances user engagement, as evidenced by a user study in which, on average, 87% of participants preferred M-OS over opinion summaries. Our experiments demonstrate that factually enriched summaries enhance user engagement. Notably, M-OS-PROMPTS exhibit stronger alignment with human judgment, achieving an average Spearman correlation of ρ = 0.74, which surpasses the performance of previous methodologies.
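
The abstract above reports agreement with human judgment as an average Spearman correlation. As a minimal sketch of how such a rank correlation can be computed, the Python below ranks two score lists (averaging ranks for ties) and takes the Pearson correlation of the ranks. The function names and the toy scores are illustrative assumptions, not taken from the paper or from M-OS-EVAL.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores for five summaries (illustrative only).
human_scores = [4, 2, 5, 1, 3]
metric_scores = [0.8, 0.4, 0.7, 0.1, 0.5]
print(round(spearman(human_scores, metric_scores), 2))  # prints 0.9
```

Because only ranks matter, the metric and the human ratings may live on different scales, which is why rank correlation is a common choice for comparing automatic evaluators with human judgment.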

BharatBBQ: A Multilingual Bias Benchmark for Question Answering in the Indian Context
Aditya Tomar | Nihar Ranjan Sahoo | Pushpak Bhattacharyya
Transactions of the Association for Computational Linguistics, Volume 13

Evaluating social biases in language models (LMs) is crucial for ensuring fairness and minimizing the reinforcement of harmful stereotypes in AI systems. Existing benchmarks, such as the Bias Benchmark for Question Answering (BBQ), primarily focus on Western contexts, limiting their applicability to the Indian context. To address this gap, we introduce BharatBBQ, a culturally adapted benchmark designed to assess biases in Hindi, English, Marathi, Bengali, Tamil, Telugu, Odia, and Assamese. BharatBBQ covers 13 social categories, including 3 intersectional groups, reflecting prevalent biases in the Indian sociocultural landscape. Our dataset contains 49,108 examples in one language that are expanded using translation and verification to 392,864 examples in eight different languages. We evaluate five multilingual LM families across zero- and few-shot settings, analyzing their bias and stereotypical bias scores. Our findings highlight persistent biases across languages and social categories and often amplified biases in Indian languages compared to English, demonstrating the necessity of linguistically and culturally grounded benchmarks for bias evaluation.

From Recall to Creation: Generating Follow-Up Questions Using Bloom’s Taxonomy and Grice’s Maxims
Archana Yadav | Harshvivek Kashid | Medchalimi Sruthi | B JayaPrakash | Chintalapalli Raja Kullayappa | Mandala Jagadeesh Reddy | Pushpak Bhattacharyya
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track)

In-car AI assistants enhance driving by enabling hands-free interactions, yet they often struggle with multi-turn conversations and fail to handle cognitively complex follow-up questions. This limits their effectiveness in real-world deployment. To address this limitation, we propose a framework that leverages Bloom’s Taxonomy to systematically generate follow-up questions with increasing cognitive complexity and a Gricean-inspired evaluation framework to assess their Logical Consistency, Informativeness, Relevance, and Clarity. We introduce a dataset comprising 750 human-annotated seed questions and 3750 follow-up questions, with human evaluation confirming that 96.68% of the generated questions adhere to the intended Bloom’s Taxonomy levels. Our approach, validated through both LLM-based and human assessments, also identifies the specific cognitive complexity level at which in-car AI assistants begin to falter, information that can help developers measure and optimize key cognitive aspects of conversational performance.

Why We Feel What We Feel: Joint Detection of Emotions and Their Opinion Triggers in E-commerce
Arnav Attri | Anuj Attri | Suman Banerjee | Amey Patil | Muthusamy Chelliah | Nikesh Garera | Pushpak Bhattacharyya
Findings of the Association for Computational Linguistics: EMNLP 2025

Customer reviews on e-commerce platforms capture critical affective signals that drive purchasing decisions. However, no existing research has explored the joint task of emotion detection and explanatory span identification in e-commerce reviews, a crucial gap in understanding what triggers customer emotional responses. To bridge this gap, we propose a novel joint task unifying Emotion detection and Opinion Trigger extraction (EOT), which explicitly models the relationship between causal text spans (opinion triggers) and affective dimensions (emotion categories) grounded in Plutchik’s theory of 8 primary emotions. In the absence of labeled data, we introduce EOT-X, a human-annotated collection of 2,400 reviews with fine-grained emotions and opinion triggers. We evaluate 23 Large Language Models (LLMs) and present EOT-DETECT, a structured prompting framework with systematic reasoning and self-reflection. Our framework surpasses zero-shot and chain-of-thought techniques across e-commerce domains.

Grahak-Nyay: Consumer Grievance Redressal through Large Language Models
Shrey Ganatra | Swapnil Bhattacharyya | Harshvivek Kashid | Spandan Anaokar | Shruthi N Nair | Reshma Sekhar | Siddharth Manohar | Rahul Hemrajani | Pushpak Bhattacharyya
Proceedings of the 1st Workshop on NLP for Empowering Justice (JUST-NLP 2025)

Access to consumer grievance redressal in India is often hindered by procedural complexity, legal jargon, and jurisdictional challenges. To address this, we present Grahak-Nyay (Justice-to-Consumers), a chatbot that streamlines the process using open-source Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG). Grahak-Nyay simplifies legal complexities through a concise and up-to-date knowledge base. We introduce three novel datasets: GeneralQA (general consumer law), SectoralQA (sector-specific knowledge) and SyntheticQA (for RAG evaluation), along with NyayChat, a dataset of 303 annotated chatbot conversations. We also introduce Judgments data sourced from Indian Consumer Courts to aid the chatbot in decision making and to enhance user trust. We also propose HAB metrics (Helpfulness, Accuracy, Brevity) to evaluate chatbot performance. Legal domain experts validated Grahak-Nyay’s effectiveness. Code and datasets are available at https://github.com/ShreyGanatra/GrahakNyay.git.

Recon, Answer, Verify: Agents in Search of Truth
Satyam Shukla | Himanshu Dutta | Pushpak Bhattacharyya
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track

Human fact-checking is too slow to meet current demands, making automatic fact-checking systems an essential alternative. Evaluating such systems is challenging as existing benchmark datasets suffer from either leakage or evidence incompleteness. This limits the realism of current evaluations. We present Politi-Fact-Only (PFO), a 5-class benchmark dataset of 2,982 political claims from politifact.com, where all post-claim analysis and annotator cues have been removed manually from the evidence articles. After filtration, the evidence contains information available prior to the claim’s verification. Evaluating on PFO, we see an average performance drop of 11.39% in terms of macro-F1 compared to PFO’s unfiltered version. Based on the identified challenges of existing LLM-based fact-checking systems, we propose RAV (Recon-Answer-Verify), an agentic framework with three agents that iteratively generates and answers sub-questions to verify different aspects of the claim before finally generating the label. Unlike prior literature, we reduce follow-up question complexity by leveraging two types of structured questions, which either validate a fact or inquire about a fact. RAV generalizes across both domains and label granularities, outperforming state-of-the-art methods by 57.5% on PFO (political, 5-class) and by 3.05% on the widely used HOVER dataset (encyclopedic, 2-class).
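
The performance figures above are reported in terms of macro-F1, which averages per-class F1 scores so that every class counts equally regardless of its frequency. The following is a minimal sketch of that computation in plain Python. The label names and the toy predictions are hypothetical and are not drawn from PFO or HOVER.

```python
from typing import Hashable, Optional, Sequence


def macro_f1(
    gold: Sequence[Hashable],
    pred: Sequence[Hashable],
    labels: Optional[Sequence[Hashable]] = None,
) -> float:
    """Unweighted mean of per-class F1 scores.

    A class with no true positives, false positives, or false negatives
    contributes an F1 of 0.0 here (a convention, chosen for this sketch).
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if labels is None:
        # Default to every label seen in either sequence.
        labels = sorted(set(gold) | set(pred), key=str)
    if not labels:
        raise ValueError("no labels to score")

    scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical three-class toy example (label names are placeholders).
gold = ["true", "false", "half-true", "true"]
pred = ["true", "false", "true", "true"]
print(round(macro_f1(gold, pred), 2))  # prints 0.6
```

In the toy example the classifier never predicts "half-true", so that class scores 0.0 and pulls the average down even though three of four predictions are correct. This sensitivity to rare classes is the usual reason for preferring macro-F1 on multi-class benchmarks with uneven label distributions.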

Looking Beyond the Pixels: Evaluating Visual Metaphor Understanding in VLMs
Manishit Kundu | Sumit Shekhar | Pushpak Bhattacharyya
Findings of the Association for Computational Linguistics: EMNLP 2025

Visual metaphors are a complex vision–language phenomenon that requires both perceptual and conceptual reasoning to understand. They provide a valuable test of a model’s ability to interpret visual input and reason about it with creativity and coherence. We introduce ImageMet, a visual metaphor dataset, featuring 2177 synthetic and 350 human-annotated images. We benchmark several SOTA VLMs on two tasks: Visual Metaphor Captioning (VMC) and Visual Metaphor VQA (VM-VQA). We establish strong baselines by fine-tuning on ImageMet, which yields substantial performance gains in VMC (+4.67% SBERT-Similarity, +4.84% task-specific metric) and VM-VQA (+9.3% Accuracy on average). Additionally, we introduce a task-specific CoT prompting strategy that outperforms standard few-shot baselines (+1.99% in VMC, +5.21% in VM-VQA). We observe that despite strong performance on the VMC task, VLMs still significantly lag behind humans in understanding visual metaphors, indicating that their success often relies on learned associations rather than genuine analytical reasoning. We note that this gap is often obscured in metaphor captioning tasks where the automatic metrics correlate only moderately at best with human judgment (Pearson r < 0.6), highlighting the need for careful, holistic evaluation of the visual metaphor understanding of the models.

HalluDetect: Detecting, Mitigating, and Benchmarking Hallucinations in Conversational Systems in the Legal Domain
Spandan Anaokar | Shrey Ganatra | Harshvivek Kashid | Swapnil Bhattacharyya | Shruthi Nair | Reshma Sekhar | Siddharth Manohar | Rahul Hemrajani | Pushpak Bhattacharyya
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track

Large Language Models (LLMs) are widely used in industry but remain prone to hallucinations, limiting their reliability in critical applications. This work addresses hallucination reduction in consumer grievance chatbots built using LLaMA 3.1 8B Instruct, a compact model frequently used in industry. We develop HalluDetect, an LLM-based hallucination detection system that achieves an F1 score of 68.92%, outperforming baseline detectors by 22.47%. Benchmarking five hallucination mitigation architectures, we find that AgentBot minimizes hallucinations to 0.4159 per turn while maintaining the highest token accuracy (96.13%), making it the most effective mitigation strategy. Our findings provide a scalable framework for hallucination mitigation, demonstrating that optimized inference strategies can significantly improve factual accuracy.

ReDepress: A Cognitive Framework for Detecting Depression Relapse from Social Media
Aakash Kumar Agarwal | Saprativa Bhattacharjee | Mauli Rastogi | Jemima S. Jacob | Biplab Banerjee | Rashmi Gupta | Pushpak Bhattacharyya
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Almost 50% of depression patients face the risk of going into relapse. The risk increases to 80% after the second episode of depression. Although depression detection from social media has received considerable attention, depression relapse detection has remained largely unexplored due to the lack of curated datasets and the difficulty of distinguishing relapse and non-relapse users. In this work, we present ReDepress, the first clinically validated social media dataset focused on relapse, comprising 204 Reddit users annotated by mental health professionals. Unlike prior approaches, our framework draws on cognitive theories of depression, incorporating constructs such as attention bias, interpretation bias, memory bias and rumination into both annotation and modeling. Through statistical analyses and machine learning experiments, we demonstrate that cognitive markers significantly differentiate relapse and non-relapse groups, and that models enriched with these features achieve competitive performance, with transformer-based temporal models attaining an F1 of 0.86. Our findings validate psychological theories in real-world textual data and underscore the potential of cognitive-informed computational methods for early relapse detection, paving the way for scalable, low-cost interventions in mental healthcare.

IndiGEC: Multilingual Grammar Error Correction for Low-Resource Indian Languages
Ujjwal Sharma | Pushpak Bhattacharyya
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Grammatical Error Correction (GEC) for low-resource Indic languages faces significant challenges due to the scarcity of annotated data. In this work, we introduce the Mask-Translate&Fill (MTF) framework, a novel approach for generating high-quality synthetic data for GEC using only monolingual corpora. MTF leverages a machine translation system and a pretrained masked language model to introduce synthetic errors and tries to mimic errors made by second-language learners. Our experimental results on English, Hindi, Bengali, Marathi, and Tamil demonstrate that MTF consistently outperforms other monolingual synthetic data generation methods and achieves performance comparable to the Translation Language Modeling (TLM)-based approach, which uses a bilingual corpus, in both independent and multilingual settings. Under multilingual training, MTF yields significant improvements across Indic languages, with particularly notable gains in Bengali and Tamil, achieving +1.6 and +3.14 GLEU over the TLM-based method, respectively. To support further research, we also introduce the IndiGEC Corpus, a high-quality, human-written, manually validated GEC dataset for these four Indic languages, comprising over 8,000 sentence pairs with separate development and test splits.

Hi-GEC: Hindi Grammar Error Correction in Low Resource Scenario
Ujjwal Sharma | Pushpak Bhattacharyya
Proceedings of the 31st International Conference on Computational Linguistics

Automated Grammatical Error Correction (GEC) has been extensively researched in Natural Language Processing (NLP), primarily focusing on English and other resource-rich languages. This paper shifts the focus to GEC for a scarcely explored low-resource language, specifically Hindi, which presents unique challenges due to its intricate morphology and complex syntax. To address data resource limitations, this work explores various GEC data generation techniques. Our research introduces a carefully extracted and filtered, high-quality dataset, HiWikiEdits, which includes 8,137 human-edited instances sourced from Wikipedia, encompassing 17 diverse grammatical error types, with annotations performed using the ERRANT toolkit. Furthermore, we investigate Round Trip Translation (RTT) using diverse languages for synthetic Hindi GEC data generation, revealing that leveraging a high-resource, linguistically distant language for error generation outperforms using mid-resource, linguistically closer languages. Specifically, using English as a pivot language resulted in a 6.25% improvement in GLEU score compared to using Assamese or Marathi. Finally, we also investigate the neural model-based synthetic error-generation technique and show that it achieves comparable performance to other synthetic data generation methods, even in low-resource settings.

AURA-QG: Automated Unsupervised Replicable Assessment for Question Generation
Rajshekar K | Harshad Khadilkar | Pushpak Bhattacharyya
Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics

Question Generation (QG) is central to information retrieval, education, and knowledge assessment, yet its progress is bottlenecked by unreliable and non-scalable evaluation practices. Traditional metrics fall short in structured settings like document-grounded QG, and human evaluation, while insightful, remains expensive, inconsistent, and difficult to replicate at scale. We introduce AURA-QG: an Automated, Unsupervised, Replicable Assessment pipeline that scores question sets using only the source document. It captures four orthogonal dimensions, namely answerability, non-redundancy, coverage, and structural entropy, without needing reference questions or relative baselines. Our method is modular, efficient, and agnostic to the question generation strategy. Through extensive experiments across four domains, namely car manuals, economic surveys, health brochures, and fiction, we demonstrate its robustness across input granularities and prompting paradigms. Chain-of-Thought prompting, which first extracts answer spans and then generates targeted questions, consistently yields higher answerability and coverage, validating the pipeline’s fidelity. The metrics also exhibit strong agreement with human judgments, reinforcing their reliability for practical adoption. The complete implementation of our evaluation pipeline is publicly available.

CoSTA: Code-Switched Speech Translation using Aligned Speech-Text Interleaving
Bhavani Shankar | Preethi Jyothi | Pushpak Bhattacharyya
Proceedings of the 31st International Conference on Computational Linguistics

Code-switching is a widely prevalent linguistic phenomenon in multilingual societies like India. Building speech-to-text models for code-switched speech is challenging due to the limited availability of datasets. In this work, we focus on the problem of spoken translation (ST) of code-switched speech in Indian languages to English text. We present a new end-to-end model architecture, CoSTA, that scaffolds on pretrained automatic speech recognition (ASR) and machine translation (MT) modules (that are more widely available for many languages). Speech and ASR text representations are fused using an aligned interleaving scheme and are fed further as input to a pretrained MT module; the whole pipeline is then trained end-to-end for spoken translation using synthetically created ST data. We also release a new evaluation benchmark for code-switched Bengali-English, Hindi-English, Marathi-English and Telugu-English speech to English text. CoSTA significantly outperforms many competitive cascaded and end-to-end multimodal baselines by up to 3.5 BLEU points.

From Perception to Reasoning: Enhancing Vision-Language Models for Mobile UI Understanding
Settaluri Lakshmi Sravanthi | Ankit Mishra | Debjyoti Mondal | Subhadarshi Panda | Rituraj Singh | Pushpak Bhattacharyya
Findings of the Association for Computational Linguistics: ACL 2025

Accurately grounding visual and textual elements within mobile user interfaces (UIs) remains a significant challenge for Vision-Language Models (VLMs). Visual grounding, a critical task in this domain, involves identifying the most relevant UI element or region based on a natural language query, a process that requires both precise perception and context-aware reasoning. In this work, we present MoUI, a light-weight mobile UI understanding model trained on MoIT, an instruction-tuning dataset specifically tailored for mobile screen understanding and grounding, designed to bridge the gap between user intent and visual semantics. Complementing this dataset, we also present a human-annotated reasoning benchmark, MoIQ, that rigorously evaluates complex inference capabilities over mobile UIs. To harness these resources effectively, we propose a two-stage training approach that separately addresses perception and reasoning tasks, leading to stronger perception capabilities and improvement in reasoning abilities. Through extensive experiments, we demonstrate that our MoUI models achieve significant gains in accuracy across all perception tasks and state-of-the-art results on the public reasoning benchmark ComplexQA (78%) and on our MoIQ (49%). We will be open-sourcing our dataset, code, and models to foster further research and innovation in the field.

Main Predicate and Their Arguments as Explanation Signals For Intent Classification
Sameer Pimparkhede | Pushpak Bhattacharyya
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Intent classification is crucial for conversational agents (chatbots), and deep learning models perform well in this area. However, little research has been done on the explainability of intent classification due to the absence of suitable benchmark data. Human annotation of explanation signals in text samples is time-consuming and costly. However, from inspection of data on intent classification, we see that, more often than not, the main verb denotes the action, and the direct object indicates the domain of conversation, serving as explanation signals for intent. This observation enables us to hypothesize that the main predicate in the text utterances, along with the arguments of the main predicate, can serve as explanation signals. Leveraging this, we introduce a new technique to automatically augment text samples from intent classification datasets with word-level explanations. We mark main predicates (primarily verbs) and their arguments (dependency relations) as explanation signals in benchmark intent classification datasets ATIS and SNIPS, creating a unique 21k-instance dataset for explainability. Further, we experiment with deep learning and language models. We observe that models that work well for classification do not perform well in explainability metrics like plausibility and faithfulness. We also observe that guiding models to focus on explanation signals from our dataset during training improves the plausibility Token F1 score by 3-4%, improving the model’s reasoning.

Refer to the Reference: Reference-focused Synthetic Automatic Post-Editing Data Generation
Sourabh Deoghare | Diptesh Kanojia | Pushpak Bhattacharyya
Proceedings of the 31st International Conference on Computational Linguistics

A prevalent approach to synthetic APE data generation uses source (src) sentences in a parallel corpus to obtain translations (mt) through an MT system and treats corresponding reference (ref) sentences as post-edits (pe). While effective, due to independence between ‘mt’ and ‘pe’, these translations do not adequately reflect errors to be corrected by a human post-editor. Thus, we introduce a novel and simple yet effective reference-focused synthetic APE data generation technique that uses ‘ref’ instead of ‘src’ sentences to obtain corrupted translations (mt_new). The experimental results across English-German, English-Russian, English-Marathi, English-Hindi, and English-Tamil language pairs demonstrate the superior performance of APE systems trained using the newly generated synthetic data compared to those trained using existing synthetic data. Further, APE models trained using a balanced mix of existing and newly generated synthetic data achieve improvements of 0.37, 0.19, 1.01, 2.42, and 2.60 TER points, respectively. We will release the generated synthetic APE data.

ConCodeEval: Evaluating Large Language Models for Code Constraints in Domain-Specific Languages
Mehant Kammakomati | Sameer Pimparkhede | Srikanth G. Tamilselvam | Prince Kumar | Pushpak Bhattacharyya
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track)

System-level programming is essential for modern enterprise infrastructure, enabling the automation and management of complex systems through declarative code. Developers write this code based on schemas, which themselves are a form of code that defines constraints like data types and required fields. These schemas help ensure operational correctness and smooth integration across systems. However, as enterprise schemas become complex, manually writing code adhering to these constraints becomes challenging for developers. Large Language Models (LLMs) have demonstrated potential in code generation and natural language understanding, particularly in zero-shot and few-shot settings. However, applying LLMs to handle constraints represented in code, essential for system-level programming rather than natural language, has not been explored. Hence, we introduce ConCodeEval, a study across two key dimensions: format and constraint efficacy, with a first-of-its-kind benchmark involving two novel experiments for code constraints across five representations (JSON, YAML, XML, Python, and natural language). Our findings suggest that conscious choice of representations can lead to optimal use of LLMs in enterprise use cases involving constraints. Nonetheless, LLMs continue to struggle significantly with code constraints, motivating the need for innovation in this direction.

Nyay-Darpan: Enhancing Decision Making Through Summarization and Case Retrieval for Consumer Law in India
Swapnil Bhattacharyya | Harshvivek Kashid | Shrey Ganatra | Spandan Anaokar | Reshma Sekhar | Shruthi N Nair | Siddharth Manohar | Rahul Hemrajani | Pushpak Bhattacharyya
Proceedings of the 1st Workshop on NLP for Empowering Justice (JUST-NLP 2025)

AI-based judicial assistance and case prediction have been extensively studied in criminal and civil domains, but remain largely unexplored in consumer law, especially in India. In this paper, we present Nyay-Darpan, a novel two-in-one framework that (i) summarizes consumer case files and (ii) retrieves similar case judgements to aid decision-making in consumer dispute resolution. Our methodology not only addresses the gap in consumer law AI tools, but also introduces an innovative approach to evaluate the quality of the summary. The term ‘Nyay-Darpan’ translates into ‘Mirror of Justice’, symbolizing the ability of our tool to reflect the core of consumer disputes through precise summarization and intelligent case retrieval. Our system achieves over 75 percent precision in similar case prediction and approximately 70 percent accuracy across material summary evaluation metrics, demonstrating its practical effectiveness. We will publicly release the Nyay-Darpan framework and dataset to promote reproducibility and facilitate further research in this underexplored yet impactful domain.

GRAFT: A Graph-based Flow-aware Agentic Framework for Document-level Machine Translation
Himanshu Dutta | Sunny Manchanda | Prakhar Bapat | Meva Ram Gurjar | Pushpak Bhattacharyya
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track

Enterprises, public organizations, and localization providers increasingly rely on Document-level Machine Translation (DocMT) to process contracts, reports, manuals, and multimedia transcripts across languages. However, existing MT systems often struggle to handle discourse-level phenomena such as pronoun resolution, lexical cohesion, and ellipsis, resulting in inconsistent or incoherent translations. We propose **GRAFT**, a modular graph-based DocMT framework that leverages Large Language Model (LLM) agents to segment documents into discourse units, infer inter-discourse dependencies, extract structured memory, and generate context-aware translations. GRAFT transforms documents into directed acyclic graphs (DAGs) to explicitly model translation flow and discourse structure. Experiments across eight language directions and six domains show GRAFT outperforms commercial systems (e.g., Google Translate) and closed LLMs (e.g., GPT-4) by an average of 2.8 d-BLEU, and improves terminology consistency and discourse handling. GRAFT supports deployment with open-source LLMs (e.g., LLaMA, Qwen), making it cost-effective and privacy-preserving. These results position GRAFT as a robust solution for scalable, document-level translation in real-world applications.

Divide, Link, and Conquer: Recall-oriented Schema Linking for NL-to-SQL via Question Decomposition
Kiran Pradeep | Kirushikesh Db | Nishtha Madaan | Sameep Mehta | Pushpak Bhattacharyya
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track

Natural language to SQL (NL-to-SQL) systems are increasingly critical in industry for enabling non-technical users to access structured data efficiently, supporting faster decision-making and data accessibility. However, state-of-the-art systems often depend on large proprietary models, which introduce serious concerns around privacy. While open-source LLMs offer a viable substitute, high-performing variants (e.g., 70B or 405B) require substantial GPU memory, making them impractical for many production environments. Smaller open-source models that fit on a single 80GB GPU present a more deployable alternative, yet existing efforts to enhance their Text-to-SQL performance rely heavily on fine-tuning, limiting flexibility. We propose RoSL, a plug-and-play framework that improves SQL generation for smaller LLMs without any task-specific training. While schema linking is often omitted for larger models, we show it remains essential for smaller ones. Further, we are the first to apply question decomposition at the schema linking stage, rather than during SQL generation as in prior work, to address the precision-recall tradeoff. Our approach improves schema linking recall by 25.1% and execution accuracy by 8.2% on the BIRD benchmark using ibm-granite/granite-3.3-8b-instruct, making it an effective and industry-friendly NL-to-SQL solution. We further analyze RoSL’s latency–efficiency characteristics, showing that it maintains practical efficiency for real-world deployment.

An introduction to computational identification and classification of Upamā alaṇkāra
Bhakti Jadhav | Himanshu Dutta | Shruti Kanitkar | Malhar Kulkarni | Pushpak Bhattacharyya
Computational Sanskrit and Digital Humanities - World Sanskrit Conference 2025

Giving the Old a Fresh Spin: Quality Estimation-Assisted Constrained Decoding for Automatic Post-Editing
Sourabh Deoghare | Diptesh Kanojia | Pushpak Bhattacharyya
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers)

Automatic Post-Editing (APE) systems often struggle with over-correction, where unnecessary modifications are made to a translation, diverging from the principle of minimal editing. In this paper, we propose a novel technique to mitigate over-correction by incorporating word-level Quality Estimation (QE) information during the decoding process. This method is architecture-agnostic, making it adaptable to any APE system, regardless of the underlying model or training approach. Our experiments on English-German, English-Hindi, and English-Marathi language pairs show the proposed approach yields significant improvements over their corresponding baseline APE systems, with TER gains of 0.65, 1.86, and 1.44 points, respectively. These results underscore the complementary relationship between QE and APE tasks and highlight the effectiveness of integrating QE information to reduce over-correction in APE systems.

StereoDetect: Detecting Stereotypes and Anti-stereotypes the Correct Way Using Social Psychological Underpinnings
Kaustubh Shivshankar Shejole | Pushpak Bhattacharyya
Findings of the Association for Computational Linguistics: EMNLP 2025

Stereotypes are known to have very harmful effects, making their detection critically important. However, current research predominantly focuses on detecting and evaluating stereotypical biases, leaving the study of stereotypes in its early stages. Our study revealed that many works have failed to clearly distinguish between stereotypes and stereotypical biases, which has significantly slowed progress in advancing research in this area. Stereotype and Anti-stereotype detection is a problem that requires social knowledge; hence, it is one of the most difficult areas in Responsible AI. This work investigates this task, where we propose a five-tuple definition and provide precise terminologies disentangling stereotypes, anti‐stereotypes, stereotypical bias, and general bias. We provide a conceptual framework grounded in social psychology for reliable detection. We identify key shortcomings in existing benchmarks for this task of stereotype and anti-stereotype detection. To address these gaps, we developed *StereoDetect*, a well-curated, definition‐aligned benchmark dataset designed for this task. We show that language models with fewer than 10 billion parameters frequently misclassify anti‐stereotypes and fail to recognize neutral overgeneralizations. We demonstrate StereoDetect’s effectiveness through multiple qualitative and quantitative comparisons with existing benchmarks and models fine-tuned on them.

MOD-KG: MultiOrgan Diagnosis Knowledge Graph
Anas Anwarul Haq Khan | Pushpak Bhattacharyya
NLP-AI4Health

The human body is highly interconnected, where a diagnosis in one organ can influence conditions in others. In medical research, graphs (such as Knowledge Graphs and Causal Graphs) have proven useful for capturing these relationships, but constructing them manually with expert input is both costly and time-intensive, especially given the continuous flow of new findings. To address this, we leverage the extraction capabilities of large language models (LLMs) to build the **MultiOrgan Diagnosis Knowledge Graph (MOD-KG)**. MOD-KG contains over **21,200 knowledge triples**, derived from both textbooks **(~13%)** and carefully selected research papers (with an average of **444** citations each). The graph focuses primarily on the *heart, lungs, kidneys, liver, pancreas, and brain*, which are central to much of today’s multimodal imaging research. The extraction quality of the LLM was benchmarked against baselines over **1000** samples, demonstrating reliability. We will make our dataset public upon acceptance.

ETF: An Entity Tracing Framework for Hallucination Detection in Code Summaries
Kishan Maharaj | Vitobha Munigala | Srikanth G. Tamilselvam | Prince Kumar | Sayandeep Sen | Palani Kodeswaran | Abhijit Mishra | Pushpak Bhattacharyya
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Recent advancements in large language models (LLMs) have significantly enhanced their ability to understand both natural language and code, driving their use in tasks like natural language-to-code (NL2Code) and code summarisation. However, LLMs are prone to hallucination—outputs that stray from intended meanings. Detecting hallucinations in code summarisation is especially difficult due to the complex interplay between programming and natural languages. We introduce a first-of-its-kind dataset, CodeSumEval, with ~10K samples, curated specifically for hallucination detection in code summarisation. We further propose a novel Entity Tracing Framework (ETF) that a) utilises static program analysis to identify code entities from the program and b) uses LLMs to map and verify these entities and their intents within generated code summaries. Our experimental analysis demonstrates the framework’s effectiveness, leading to a 73% F1 score. The proposed approach provides a method for detecting hallucinations by tracing entities from the summary to the code, allowing us to evaluate summary accuracy and localise the error within the summary.

Video-guided Machine Translation: A Survey of Models, Datasets, and Challenges
Pinaki Das | Virendra Singh | Pushpak Bhattacharyya | Gholamreza Haffari
Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics

In recent years, machine translation has evolved with the integration of multimodal information. Infusion of multi-modality into translation tasks reduces ambiguity and enhances translation scores. Common modalities include images, speech, and videos, which provide additional context alongside the text to be translated. While multimodal translation with images has been extensively studied, video-guided machine translation (VMT) has gained increasing attention, particularly since Wang et al. (2019) first explored this task. In this paper, we provide a comprehensive overview of VMT, highlighting its unique challenges, methodologies, and recent advancements. Unlike previous surveys that primarily focus on image-guided multimodal machine translation, this work explores the distinct complexities and opportunities introduced by adding video as a modality to the translation task.

Understand the Implication: Learning to Think for Pragmatic Understanding
Settaluri Lakshmi Sravanthi | Kishan Maharaj | Sravani Gunnu | Abhijit Mishra | Pushpak Bhattacharyya
Findings of the Association for Computational Linguistics: ACL 2025

Pragmatics, the ability to infer meaning beyond literal interpretation, is crucial for social cognition and communication. While LLMs have been benchmarked for their pragmatic understanding, improving their performance remains underexplored. Existing methods rely on annotated labels but overlook the reasoning process humans naturally use to interpret implicit meaning. To bridge this gap, we introduce a novel pragmatic dataset ImpliedMeaningPreference that includes explicit reasoning (‘thoughts’) for both correct and incorrect interpretations. Through preference-tuning and supervised fine-tuning, we demonstrate that thought-based learning significantly enhances LLMs’ pragmatic understanding, improving accuracy by 11.12% across model families. We further discuss a transfer-learning study where we evaluate the performance of thought-based training for the other tasks of pragmatics (presupposition, deixis) that are not seen during the training time and observe an improvement of 16.10% compared to label trained models.

Stereotype Detection as a Catalyst for Enhanced Bias Detection: A Multi-Task Learning Approach
Aditya Tomar | Rudra Murthy | Pushpak Bhattacharyya
Findings of the Association for Computational Linguistics: ACL 2025

Bias and stereotypes in language models can cause harm, especially in sensitive areas like content moderation and decision-making. This paper addresses bias and stereotype detection by exploring how jointly learning these tasks enhances model performance. We introduce StereoBias, a unique dataset labeled for bias and stereotype detection across five categories: religion, gender, socio-economic status, race, profession, and others, enabling a deeper study of their relationship. Our experiments compare encoder-only models and fine-tuned decoder-only models using QLoRA. While encoder-only models perform well, decoder-only models also show competitive results. Crucially, joint training on bias and stereotype detection significantly improves bias detection compared to training them separately. Additional experiments with sentiment analysis confirm that the improvements stem from the connection between bias and stereotypes, not multi-task learning alone. These findings highlight the value of leveraging stereotype information to build fairer and more effective AI systems.

“You are Beautiful, Body Image Stereotypes are Ugly!” BIStereo: A Benchmark to Measure Body Image Stereotypes in Language Models
Narjis Asad | Nihar Ranjan Sahoo | Rudra Murthy | Swaprava Nath | Pushpak Bhattacharyya
Findings of the Association for Computational Linguistics: ACL 2025

While a few high-quality bias benchmark datasets exist to address stereotypes in Language Models (LMs), a notable lack of focus remains on body image stereotypes. To bridge this gap, we propose BIStereo, a suite to uncover LMs’ biases towards people of certain physical appearance characteristics, namely, skin complexion, body shape, height, attire, and a miscellaneous category including hair texture, eye color, and more. Our dataset comprises 40k sentence pairs designed to assess LMs’ biased preference for certain body types. We further include 60k premise-hypothesis pairs designed to comprehensively assess LMs’ preference for fair skin tone. Additionally, we curate 553 tuples consisting of a body image descriptor, gender, and a stereotypical attribute, validated by a diverse pool of annotators for physical appearance stereotypes. We propose a metric, TriSentBias, that captures the biased preferences of LMs towards a certain body type over others. Using BIStereo, we assess the presence of body image biases in ten different language models, revealing significant biases in models Muril, XLMR, Llama3, and Gemma. We further evaluate the LMs through downstream NLI and Analogy tasks. Our NLI experiments highlight notable patterns in the LMs that align with the well-documented cognitive bias in humans known as the Halo Effect.

Are Language Models Agnostic to Linguistically Grounded Perturbations? A Case Study of Indic Languages
Poulami Ghosh | Raj Dabre | Pushpak Bhattacharyya
Findings of the Association for Computational Linguistics: NAACL 2025

Pre-trained language models (PLMs) are known to be susceptible to perturbations to the input text, but existing works do not explicitly focus on linguistically grounded attacks, which are subtle and more prevalent in nature. In this paper, we study whether PLMs are agnostic to linguistically grounded attacks or not. To this end, we offer the first study addressing this, investigating different Indic languages and various downstream tasks. Our findings reveal that although PLMs are susceptible to linguistic perturbations, when compared to non-linguistic attacks, PLMs exhibit a slightly lower susceptibility to linguistic attacks. This highlights that even constrained attacks are effective. Moreover, we investigate the implications of these outcomes across a range of languages, encompassing diverse language families and different scripts.

Looks can be Deceptive: Distinguishing Repetition Disfluency from Reduplication
Arif A. Ahmad | Khyathi Gayathri Mothika | Pushpak Bhattacharyya
Proceedings of the 31st International Conference on Computational Linguistics

Reduplication and repetition, though similar in form, serve distinct linguistic purposes. Reduplication is a deliberate morphological process used to express grammatical, semantic, or pragmatic nuances, while repetition is often unintentional and indicative of disfluency. This paper presents the first large-scale study of reduplication and repetition in speech using computational linguistics. We introduce IndicRedRep, a new publicly available dataset containing Hindi, Telugu, and Marathi text annotated with reduplication and repetition at the word level. We evaluate transformer-based models for multi-class reduplication and repetition token classification, utilizing the Reparandum-Interregnum-Repair structure to distinguish between the two phenomena. Our models achieve macro F1 scores of up to 85.62% in Hindi, 83.95% in Telugu, and 84.82% in Marathi for reduplication-repetition classification.

“My life is miserable, have to sign 500 autographs everyday”: Exposing Humblebragging, the Brags in Disguise
Sharath Naganna | Saprativa Bhattacharjee | Biplab Banerjee | Pushpak Bhattacharyya
Findings of the Association for Computational Linguistics: ACL 2025

Humblebragging is a phenomenon in which individuals present self-promotional statements under the guise of modesty or complaints. For example, a statement like, “Ugh, I can’t believe I got promoted to lead the entire team. So stressful!”, subtly highlights an achievement while pretending to be complaining. Detecting humblebragging is important for machines to better understand the nuances of human language, especially in tasks like sentiment analysis and intent recognition. However, this topic has not yet been studied in computational linguistics. For the first time, we introduce the task of automatically detecting humblebragging in text. We formalize the task by proposing a 4-tuple definition of humblebragging and evaluate machine learning, deep learning, and large language models (LLMs) on this task, comparing their performance with humans. We also create and release a dataset called HB-24, containing 3,340 humblebrags generated using GPT-4o. Our experiments show that detecting humblebragging is non-trivial, even for humans. Our best model achieves an F1-score of 0.88. This work lays the foundation for further exploration of this nuanced linguistic phenomenon and its integration into broader natural language understanding systems.

RG-VQA: Leveraging Retriever-Generator Pipelines for Knowledge Intensive Visual Question Answering
Settaluri Lakshmi Sravanthi | Pulkit Agarwal | Debjyoti Mondal | Rituraj Singh | Subhadarshi Panda | Ankit Mishra | Kiran Pradeep | Srihari K B | Godawari Sudhakar Rao | Pushpak Bhattacharyya
Findings of the Association for Computational Linguistics: EMNLP 2025

In this paper, we propose a method to improve the reasoning capabilities of Visual Question Answering (VQA) systems by integrating Dense Passage Retrievers (DPRs) with Vision Language Models (VLMs). While recent works focus on the application of knowledge graphs and chain-of-thought reasoning, we recognize that the complexity of graph neural networks and end-to-end training remain significant challenges. To address these issues, we introduce **R**elevance **G**uided **VQA** (**RG-VQA**), a retriever-generator pipeline that uses DPRs to efficiently extract relevant information from structured knowledge bases. Our approach ensures scalability to large graphs without significant computational overhead. Experiments on the ScienceQA dataset show that RG-VQA achieves state-of-the-art performance, surpassing human accuracy and outperforming GPT-4 by more than . This demonstrates the effectiveness of RG-VQA in boosting the reasoning capabilities of VQA systems and its potential for practical applications.

2024

RoundTripOCR: A Data Generation Technique for Enhancing Post-OCR Error Correction in Low-Resource Devanagari Languages
Harshvivek Kashid | Pushpak Bhattacharyya
Proceedings of the 21st International Conference on Natural Language Processing (ICON)

Optical Character Recognition (OCR) technology has revolutionized the digitization of printed text, enabling efficient data extraction and analysis across various domains. Just like Machine Translation systems, OCR systems are prone to errors. In this work, we address the challenge of data generation and post-OCR error correction, specifically for low-resource languages. We propose an approach for synthetic data generation for Devanagari languages, RoundTripOCR, that tackles the scarcity of the post-OCR Error Correction datasets for low-resource languages. We release post-OCR text correction datasets for Hindi, Marathi, Bodo, Nepali, Konkani and Sanskrit. We also present a novel approach for OCR error correction by leveraging techniques from machine translation. Our method involves translating erroneous OCR output into a corrected form by treating the OCR errors as mistranslations in a parallel text corpus, employing pre-trained transformer models to learn the mapping from erroneous to correct text pairs, effectively correcting OCR errors.

Natural Answer Generation: From Factoid Answer to Full-length Answer using Grammar Correction
Manas Jain | Sriparna Saha | Pushpak Bhattacharyya | Gladvin Chinnadurai | Manish Vatsa
Proceedings of the 21st International Conference on Natural Language Processing (ICON)

Question Answering systems these days typically use template-based language generation. Though adequate for a domain-specific task, these systems are too restrictive and predefined for domain-independent systems. This paper proposes a system that outputs a full-length answer given a question and the extracted factoid answer (short spans such as named entities) as the input. Our system uses constituency and dependency parse trees of questions. A transformer-based Grammar Error Correction model GECToR is used as a post-processing step for better fluency. We compare our system with (i) a Modified Pointer Generator (SOTA) and (ii) Fine-tuned DialoGPT for factoid questions. We also tested our approach on existential (yes-no) questions with better results. Our model generates more accurate and fluent answers than the state-of-the-art (SOTA) approaches. The evaluation is done on NewsQA and SQuAD datasets with an increment of 0.4 and 0.9 percentage points in ROUGE-1 score respectively. Also, the inference time is reduced by 85% compared to the SOTA. The improved datasets used for our evaluation will be released as part of the research contribution.

IndicIRSuite: Multilingual Dataset and Neural Information Models for Indian Languages
Saiful Haq | Ashutosh Sharma | Omar Khattab | Niyati Chhaya | Pushpak Bhattacharyya
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

In this paper, we introduce Neural Information Retrieval resources for 11 widely spoken Indian Languages (Assamese, Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Oriya, Punjabi, Tamil, and Telugu) from two major Indian language families (Indo-Aryan and Dravidian). These resources include (a) INDIC-MARCO, a multilingual version of the MS MARCO dataset in 11 Indian Languages created using Machine Translation, and (b) Indic-ColBERT, a collection of 11 distinct Monolingual Neural Information Retrieval models, each trained on one of the 11 languages in the INDIC-MARCO dataset. To the best of our knowledge, IndicIRSuite is the first attempt at building large-scale Neural Information Retrieval resources for a large number of Indian languages, and we hope that it will help accelerate research in Neural IR for Indian Languages. Experiments demonstrate that Indic-ColBERT achieves 47.47% improvement in the MRR@10 score averaged over the INDIC-MARCO baselines for all 11 Indian languages except Oriya, 12.26% improvement in the NDCG@10 score averaged over the MIRACL Bengali and Hindi Language baselines, and 20% improvement in the MRR@100 score over the Mr. TyDi Bengali Language baseline.

In-context Mixing (ICM): Code-mixed Prompts for Multilingual LLMs
Bhavani Shankar | Preethi Jyothi | Pushpak Bhattacharyya
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We introduce a simple and effective prompting technique called in-context mixing (ICM) for effective in-context learning (ICL) with multilingual large language models (MLLMs). With ICM, we modify the few-shot examples within ICL prompts to be intra-sententially code-mixed by randomly swapping content words in the target languages with their English translations. We observe that ICM prompts yield superior performance in NLP tasks such as disfluency correction, grammar error correction and text simplification that demand a close correspondence between the input and output sequences. Significant improvements are observed mainly for low-resource languages that are under-represented during the pretraining and finetuning of MLLMs. We present an extensive set of experiments to analyze when ICM is effective and what design choices contribute towards its effectiveness. ICM works consistently and significantly better than other prompting techniques across models of varying capacity such as mT0-XXL, BloomZ and GPT-4.
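The word-swapping step described in the abstract above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `code_mix`, the `swap_prob` parameter, and the toy romanized-Hindi lexicon are all hypothetical, and "content word" is approximated here simply as "any word present in the bilingual lexicon".

```python
import random


def code_mix(sentence, lexicon, swap_prob=0.5, seed=0):
    """Randomly replace words found in `lexicon` with their English translation.

    Assumption: the lexicon only lists content words, so function words
    are never swapped. All names and defaults here are illustrative.
    """
    rng = random.Random(seed)  # seeded so a given prompt is reproducible
    mixed = []
    for word in sentence.split():
        translation = lexicon.get(word)
        # Swap only when a translation exists and the random draw allows it.
        if translation is not None and rng.random() < swap_prob:
            mixed.append(translation)
        else:
            mixed.append(word)
    return " ".join(mixed)


# Toy lexicon (hypothetical, romanized Hindi to English).
toy_lexicon = {"ghar": "house", "kitab": "book"}

# With swap_prob=1.0 every lexicon word is swapped; with 0.0 none are.
all_swapped = code_mix("mera ghar bada hai", toy_lexicon, swap_prob=1.0)
none_swapped = code_mix("mera ghar bada hai", toy_lexicon, swap_prob=0.0)
```

In an ICL setting, a function like this would be applied to each few-shot example before the prompt is assembled; how the swap rate and word selection are actually chosen is a design decision the abstract does not specify.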

One Prompt To Rule Them All: LLMs for Opinion Summary Evaluation
Tejpalsingh Siledar | Swaroop Nath | Sankara Muddu | Rupasai Rangaraju | Swaprava Nath | Pushpak Bhattacharyya | Suman Banerjee | Amey Patil | Sudhanshu Singh | Muthusamy Chelliah | Nikesh Garera
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Evaluation of opinion summaries using conventional reference-based metrics often fails to provide a comprehensive assessment and exhibits limited correlation with human judgments. While Large Language Models (LLMs) have shown promise as reference-free metrics for NLG evaluation, their potential remains unexplored for opinion summary evaluation. Furthermore, the absence of sufficient opinion summary evaluation datasets hinders progress in this area. In response, we introduce the SUMMEVAL-OP dataset, encompassing 7 dimensions crucial to the evaluation of opinion summaries: fluency, coherence, relevance, faithfulness, aspect coverage, sentiment consistency, and specificity. We propose OP-I-PROMPT, a dimension-independent prompt, along with OP-PROMPTS, a dimension-dependent set of prompts for opinion summary evaluation. Our experiments demonstrate that OP-I-PROMPT emerges as a good alternative for evaluating opinion summaries, achieving an average Spearman correlation of 0.70 with human judgments, surpassing prior methodologies. Remarkably, we are the first to explore the efficacy of LLMs as evaluators, both on closed-source and open-source models, in the opinion summary evaluation domain.

We report the results of the WMT 2024 shared task on Quality Estimation, in which the challenge is to predict the quality of the output of neural machine translation systems at the word and sentence levels, without access to reference translations. In this edition, we expanded our scope to assess the potential for quality estimates to help in the correction of translated outputs, hence including an automated post-editing (APE) direction. We publish new test sets with human annotations that target two directions: providing new Multidimensional Quality Metrics (MQM) annotations for three multi-domain language pairs (English to German, Spanish and Hindi) and extending the annotations on Indic languages providing direct assessments and post edits for translation from English into Hindi, Gujarati, Tamil and Telugu. We also perform a detailed analysis of the behaviour of different models with respect to different phenomena including gender bias, idiomatic language, and numerical and entity perturbations. We received submissions based both on traditional, encoder-based approaches as well as large language model (LLM) based ones.

MemeGuard: An LLM and VLM-based Framework for Advancing Content Moderation via Meme Intervention
Prince Jha | Raghav Jain | Konika Mandal | Aman Chadha | Sriparna Saha | Pushpak Bhattacharyya
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

In the digital world, memes present a unique challenge for content moderation due to their potential to spread harmful content. Although detection methods have improved, proactive solutions such as intervention are still limited, with current research focusing mostly on text-based content, neglecting the widespread influence of multimodal content like memes. Addressing this gap, we present MemeGuard, a comprehensive framework leveraging Large Language Models (LLMs) and Visual Language Models (VLMs) for meme intervention. MemeGuard harnesses a specially fine-tuned VLM, VLMeme, for meme interpretation, and a multimodal knowledge selection and ranking mechanism (MKS) for distilling relevant knowledge. This knowledge is then employed by a general-purpose LLM to generate contextually appropriate interventions. Another key contribution of this work is the Intervening Cyberbullying in Multimodal Memes (ICMM) dataset, a high-quality, labeled dataset featuring toxic memes and their corresponding human-annotated interventions. We leverage ICMM to test MemeGuard, demonstrating its proficiency in generating relevant and effective responses to toxic memes. Disclaimer: This paper contains harmful content that may be disturbing to some readers.

Precision Empowers, Excess Distracts: Visual Question Answering With Dynamically Infused Knowledge In Language Models
Manas Jhalani | Annervaz K M | Pushpak Bhattacharyya
Proceedings of the 21st International Conference on Natural Language Processing (ICON)

In the realm of multimodal tasks, Visual Question Answering (VQA) plays a crucial role by addressing natural language questions grounded in visual content. Knowledge-Based Visual Question Answering (KBVQA) advances this concept by adding external knowledge along with images to respond to questions. We introduce an approach for KBVQA, augmenting the existing vision-language transformer encoder-decoder (OFA) model. Our main contribution involves enhancing questions by incorporating relevant external knowledge extracted from knowledge graphs, using a dynamic triple extraction

ToxVidLM: A Multimodal Framework for Toxicity Detection in Code-Mixed Videos
Krishanu Maity | A.S. Poornash | Sriparna Saha | Pushpak Bhattacharyya
Findings of the Association for Computational Linguistics: ACL 2024

In an era of rapidly evolving internet technology, the surge in multimodal content, including videos, has expanded the horizons of online communication. However, the detection of toxic content in this diverse landscape, particularly in low-resource code-mixed languages, remains a critical challenge. While substantial research has addressed toxic content detection in textual data, the realm of video content, especially in non-English languages, has been relatively underexplored. This paper addresses this research gap by introducing a benchmark dataset, the first of its kind, consisting of 931 videos with 4021 code-mixed Hindi-English utterances collected from YouTube. Each utterance within this dataset has been meticulously annotated for toxicity, severity, and sentiment labels. We have developed an advanced Multimodal Multitask framework built for Toxicity detection in Video Content by leveraging Language Models (LMs), crafted for the primary objective along with the additional tasks of conducting sentiment and severity analysis. ToxVidLM incorporates three key modules – the Encoder module, Cross-Modal Synchronization module, and Multitask module – crafting a generic multimodal LM customized for intricate video classification tasks. Our experiments reveal that incorporating multiple modalities from the videos substantially enhances the performance of toxic content detection by achieving an Accuracy and Weighted F1 score of 94.29% and 94.35%, respectively.

IndiFoodVQA: Advancing Visual Question Answering and Reasoning with a Knowledge-Infused Synthetic Data Generation Pipeline
Pulkit Agarwal | Settaluri Sravanthi | Pushpak Bhattacharyya
Findings of the Association for Computational Linguistics: EACL 2024

Large Vision Language Models (VLMs) like GPT-4, LLaVA, and InstructBLIP exhibit extraordinary capabilities for both knowledge understanding and reasoning. However, the reasoning capabilities of such models on sophisticated problems that require external knowledge of a specific domain have not been assessed well, due to the unavailability of necessary datasets. In this work, we release a first-of-its-kind dataset called IndiFoodVQA with around 16.7k data samples, consisting of explicit knowledge-infused questions, answers, and reasons. We also release IndiFoodKG, a related Knowledge Graph (KG) with 79k triples. The data has been created with minimal human intervention via an automated pipeline based on InstructBLIP and GPT-3.5. We also present a methodology to extract knowledge from the KG and use it to both answer and reason upon the questions. We employ different models to report baseline zero-shot and fine-tuned results. Fine-tuned VLMs on our data showed an improvement of ~25% over the corresponding base model, highlighting the fact that current VLMs need domain-specific fine-tuning to excel in specialized settings. Our findings reveal that (1) explicit knowledge infusion during question generation helps in making questions that have more grounded knowledge, and (2) proper knowledge retrieval can often lead to better answering potential in such cases. The data and code are available at https://github.com/SLSravanthi/IndifoodVQA.

A Case Study on Context-Aware Neural Machine Translation with Multi-Task Learning
Ramakrishna Appicharla | Baban Gain | Santanu Pal | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the 25th Annual Conference of the European Association for Machine Translation (Volume 1)

In document-level neural machine translation (DocNMT), multi-encoder approaches are common in encoding context and source sentences. Recent studies (CITATION) have shown that the context encoder generates noise and makes the model robust to the choice of context. This paper further investigates this observation by explicitly modelling context encoding through multi-task learning (MTL) to make the model sensitive to the choice of context. We conduct experiments on a cascade MTL architecture, which consists of one encoder and two decoders. Generation of the source from the context is considered an auxiliary task, and generation of the target from the source is the main task. We experimented with German–English language pairs on News, TED, and Europarl corpora. Evaluation results show that the proposed MTL approach performs better than concatenation-based and multi-encoder DocNMT models in low-resource settings and is sensitive to the choice of context. However, we observe that the MTL models fail to generate the source from the context. These observations align with previous studies, and this might suggest that the available document-level parallel corpora are not context-aware, and a robust sentence-level model can outperform the context-aware models.

Product Description and QA Assisted Self-Supervised Opinion Summarization
Tejpalsingh Siledar | Rupasai Rangaraju | Sankara Muddu | Suman Banerjee | Amey Patil | Sudhanshu Singh | Muthusamy Chelliah | Nikesh Garera | Swaprava Nath | Pushpak Bhattacharyya
Findings of the Association for Computational Linguistics: NAACL 2024

In e-commerce, opinion summarization is the process of summarizing the consensus opinions found in product reviews. However, the potential of additional sources such as product description and question-answers (QA) has been considered less often. Moreover, the absence of any supervised training data makes this task challenging. To address this, we propose a novel synthetic dataset creation (SDC) strategy that leverages information from reviews as well as additional sources for selecting one of the reviews as a pseudo-summary to enable supervised training. Our Multi-Encoder Decoder framework for Opinion Summarization (MEDOS) employs a separate encoder for each source, enabling effective selection of information while generating the summary. For evaluation, due to the unavailability of test sets with additional sources, we extend the Amazon, Oposum+, and Flipkart test sets and leverage ChatGPT to annotate summaries. Experiments across nine test sets demonstrate that the combination of our SDC approach and MEDOS model achieves on average a 14.5% improvement in ROUGE-1 F1 over the SOTA. Moreover, comparative analysis underlines the significance of incorporating additional sources for generating more informative summaries. Human evaluations further indicate that MEDOS scores relatively higher in coherence and fluency with 0.41 and 0.5 (−1 to 1) respectively, compared to existing models. To the best of our knowledge, we are the first to generate opinion summaries leveraging additional sources in a self-supervised setting.

Striking a Balance between Classical and Deep Learning Approaches in Natural Language Processing Pedagogy
Aditya Joshi | Jake Renzella | Pushpak Bhattacharyya | Saurav Jha | Xiangyu Zhang
Proceedings of the Sixth Workshop on Teaching NLP

While deep learning approaches represent the state-of-the-art of natural language processing (NLP) today, classical algorithms and approaches still find a place in NLP textbooks and courses of recent years. This paper discusses the perspectives of conveners of two introductory NLP courses taught in Australia and India, and examines how classical and deep learning approaches can be balanced within the lecture plan and assessments of the courses. We also draw parallels with the objects-first and objects-later debate in CS1 education. We observe that teaching classical approaches adds value to student learning by building an intuitive understanding of NLP problems, potential solutions, and even deep learning models themselves. Despite classical approaches not being state-of-the-art, the paper makes a case for their inclusion in NLP courses today.

Mental Disorder Classification via Temporal Representation of Text
Raja Kumar | Kishan Maharaj | Ashita Saxena | Pushpak Bhattacharyya
Findings of the Association for Computational Linguistics: EMNLP 2024

Mental disorders pose a global challenge, aggravated by the shortage of qualified mental health professionals. Mental disorder prediction from social media posts by current LLMs is challenging due to the complexities of sequential text data and the limited context length of language models. Current language model-based approaches split a single data instance into multiple chunks to compensate for limited context size. The predictive model is then applied to each chunk individually, and the most voted output is selected as the final prediction. This results in the loss of inter-post dependencies and important time variant information, leading to poor performance. We propose a novel framework which first compresses the large sequence of chronologically ordered social media posts into a series of numbers. We then use this time variant representation for mental disorder classification. We demonstrate the generalization capabilities of our framework by outperforming the current SOTA in three different mental conditions: depression, self-harm, and anorexia, by an absolute improvement of 5% in the F1 score. We also investigate the situation when current data instances fall within the context length of language models and present empirical results highlighting the importance of temporal properties of textual data. Furthermore, we utilize the proposed framework for a cross-domain study, exploring commonalities across disorders and the possibility of inter-domain data usage.
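The chunk-and-vote baseline that the abstract above contrasts itself with can be sketched roughly as follows. This is only an illustrative sketch of that baseline, not the authors' proposed framework; `chunk_posts`, `predict_by_majority_vote`, the whitespace token count, and the `classify` callable are all hypothetical names and simplifications introduced here.

```python
from collections import Counter
from typing import Callable, List


def chunk_posts(posts: List[str], max_tokens: int) -> List[List[str]]:
    """Greedily pack chronologically ordered posts into chunks.

    Token counts are approximated by whitespace splitting (an assumption;
    a real system would use the language model's own tokenizer).
    """
    chunks: List[List[str]] = []
    current: List[str] = []
    used = 0
    for post in posts:
        n = len(post.split())
        # Start a new chunk when adding this post would exceed the budget.
        if current and used + n > max_tokens:
            chunks.append(current)
            current, used = [], 0
        current.append(post)
        used += n
    if current:
        chunks.append(current)
    return chunks


def predict_by_majority_vote(
    posts: List[str],
    classify: Callable[[str], str],  # hypothetical per-chunk classifier
    max_tokens: int = 512,
) -> str:
    """Classify each chunk independently and return the most voted label."""
    if not posts:
        raise ValueError("at least one post is required")
    chunks = chunk_posts(posts, max_tokens)
    votes = Counter(classify(" ".join(chunk)) for chunk in chunks)
    return votes.most_common(1)[0][0]
```

Because each chunk is classified in isolation, nothing links a post in one chunk to a post in another, which is the loss of inter-post dependencies and time-variant information that the abstract points to.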

SansGPT: Advancing Generative Pre-Training in Sanskrit
Rhugved Pankaj Chaudhari | Bhakti Jadhav | Pushpak Bhattacharyya | Malhar Kulkarni
Proceedings of the 21st International Conference on Natural Language Processing (ICON)

In the past decade, significant progress has been made in digitizing Sanskrit texts and advancing computational analysis of the language. However, efforts to advance NLP for complex semantic downstream tasks like Semantic Analogy Prediction, Named Entity Recognition, and others remain limited. This gap is mainly due to the absence of a robust, pre-trained Sanskrit model built on large-scale Sanskrit text data, since this demands considerable computational resources and data preparation. In this paper, we introduce SansGPT, a generative pre-trained model that has been trained on a large corpus of Sanskrit texts and is designed to facilitate fine-tuning and development for downstream NLP tasks. We aim for this model to serve as a catalyst for advancing NLP research in Sanskrit. Additionally, we developed a custom tokenizer specifically optimized for Sanskrit text, enabling effective tokenization of compound words and making it better suited for generative tasks. Our data collection and cleaning process encompassed a wide array of available Sanskrit literature, ensuring comprehensive representation for training. We further demonstrate the model’s efficacy by fine-tuning it on Semantic Analogy Prediction and Simile Element Extraction, achieving accuracies of approximately 95.8% and 92.8%, respectively.

Together We Can: Multilingual Automatic Post-Editing for Low-Resource Languages
Sourabh Deoghare | Diptesh Kanojia | Pushpak Bhattacharyya
Findings of the Association for Computational Linguistics: EMNLP 2024

This exploratory study investigates the potential of multilingual Automatic Post-Editing (APE) systems to enhance the quality of machine translations for low-resource Indo-Aryan languages. Focusing on two closely related language pairs, English-Marathi and English-Hindi, we exploit the linguistic similarities to develop a robust multilingual APE model. To facilitate cross-linguistic transfer, we generate synthetic Hindi-Marathi and Marathi-Hindi APE triplets. Additionally, we incorporate a Quality Estimation (QE)-APE multi-task learning framework. While the experimental results underline the complementary nature of APE and QE, we also observe that QE-APE multitask learning facilitates effective domain adaptation. Our experiments demonstrate that the multilingual APE models outperform their corresponding English-Hindi and English-Marathi single-pair models by 2.5 and 2.39 TER points, respectively, with further notable improvements over the multilingual APE model observed through multi-task learning (+1.29 and +1.44 TER points), data augmentation (+0.53 and +0.45 TER points) and domain adaptation (+0.35 and +0.45 TER points). We release the synthetic data, code, and models accrued during this study publicly for further research.

RoMantra: Optimizing Neural Machine Translation for Low-Resource Languages through Romanization
Govind Soni | Pushpak Bhattacharyya
Proceedings of the 21st International Conference on Natural Language Processing (ICON)

Neural Machine Translation (NMT) for low-resource language pairs with distinct scripts, such as Hindi-Chinese and Japanese-Hindi, poses significant challenges due to scriptural and linguistic differences. This paper investigates the efficacy of romanization as a preprocessing step to bridge these gaps. We compare baseline models trained on native scripts with models incorporating romanization in three configurations: both-side, source-side only, and target-side only. Additionally, we introduce a script restoration model that converts romanized output back to native scripts, ensuring accurate evaluation. Our experiments show that romanization, particularly when applied to both sides, improves translation quality across the studied language pairs. The script restoration model further enhances the practicality of this approach by enabling evaluation in native scripts with some performance loss. This work provides insights into leveraging romanization for NMT in low-resource, cross-script settings, presenting a promising direction for under-researched language combinations.

Meme-ingful Analysis: Enhanced Understanding of Cyberbullying in Memes Through Multimodal Explanations
Prince Jha | Krishanu Maity | Raghav Jain | Apoorv Verma | Sriparna Saha | Pushpak Bhattacharyya
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

Internet memes have gained significant influence in communicating political, psychological, and sociocultural ideas. While memes are often humorous, there has been a rise in the use of memes for trolling and cyberbullying. Although a wide variety of effective deep learning-based models have been developed for detecting offensive multimodal memes, only a few works have addressed the explainability aspect. Recent laws, like the “right to explanations” of the General Data Protection Regulation, have spurred research in developing interpretable models rather than only focusing on performance. Motivated by this, we introduce MultiBully-Ex, the first benchmark dataset for multimodal explanation from code-mixed cyberbullying memes. Here, both visual and textual modalities are highlighted to explain why a given meme is cyberbullying. A Contrastive Language-Image Pretraining (CLIP) projection based multimodal shared-private multitask approach has been proposed for visual and textual explanation of a meme. Experimental results demonstrate that training with multimodal explanations improves performance in generating textual justifications and more accurately identifying the visual evidence supporting a decision with reliable performance improvements.

DocCGen: Document-based Controlled Code Generation
Sameer Pimparkhede | Mehant Kammakomati | Srikanth G. Tamilselvam | Prince Kumar | Ashok Pon Kumar | Pushpak Bhattacharyya
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Recent developments show that Large Language Models (LLMs) produce state-of-the-art performance on natural language (NL) to code generation for resource-rich general-purpose languages like C++, Java, and Python. However, their practical usage for structured domain-specific languages (DSLs) such as YAML and JSON is limited due to domain-specific schema, grammar, and customizations generally unseen by LLMs during pre-training. Efforts have been made to mitigate this challenge via in-context learning through relevant examples or by fine-tuning. However, these approaches suffer from problems such as limited DSL samples and prompt sensitivity, whereas enterprises maintain good documentation of the DSLs. Therefore, we propose DocCGen, a framework that can leverage such rich knowledge by breaking the NL-to-Code generation task for structured code languages into a two-step process. First, it detects the correct libraries using the library documentation that best matches the NL query. Then, it utilizes schema rules extracted from the documentation of these libraries to constrain the decoding. We evaluate our framework for two complex structured languages, Ansible YAML and Bash command, consisting of two settings: Out-of-domain (OOD) and In-domain (ID). Our extensive experiments show that DocCGen consistently improves different sized language models across all six evaluation metrics, reducing syntactic and semantic errors in structured code.

Addressing Bias and Hallucination in Large Language Models
Nihar Ranjan Sahoo | Ashita Saxena | Kishan Maharaj | Arif A. Ahmad | Abhijit Mishra | Pushpak Bhattacharyya
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024): Tutorial Summaries

In the landscape of natural language processing (NLP), addressing the challenges of bias and hallucination is paramount to ensuring the ethical and unbiased development of Large Language Models (LLMs). This tutorial delves into the intricate dimensions of LLMs, shedding light on the critical importance of understanding and mitigating the profound impacts of bias and hallucination. Divided into two parts, the first part delves deep into the complexity of bias propagation in LLM development, where we dissect its origins and far-reaching impacts. We then present innovative methodologies for mitigating diverse forms of bias, including dynamic word embeddings and robust benchmarking strategies. The second part of the tutorial discusses hallucination, a prevalent issue in generative AI systems such as LLMs. Through advanced data-driven techniques, we decode its intricate effects and complexities, followed by factually-driven mitigation strategies. Furthermore, we shed light on the pivotal role of human cognitive behavior in the context of hallucination, drawing insights from cognitive data, including human eye-tracking data. Ultimately, this cutting-edge tutorial serves as a guiding light, equipping participants with indispensable tools and insights to navigate the ethical complexities of LLMs, thus paving the way for the development of unbiased and ethically robust NLP systems.

Unveiling the Invisible: Captioning Videos with Metaphors
Abisek Rajakumar Kalarani | Pushpak Bhattacharyya | Sumit Shekhar
Findings of the Association for Computational Linguistics: EMNLP 2024

Metaphors are a common communication tool used in our day-to-day life. The detection and generation of metaphors in textual form have been studied extensively but metaphors in other forms have been under-explored. Recent studies have shown that Vision-Language (VL) models cannot understand visual metaphors in memes and adverts. As of now, no probing studies have been done that involve complex language phenomena like metaphors with videos. Hence, we introduce a new VL task of describing the metaphors present in the videos in our work. To facilitate this novel task, we construct and release a manually created dataset with 705 videos and 2115 human-written captions, along with a new metric called Average Concept Distance (ACD), to automatically evaluate the creativity of the metaphors generated. We also propose a novel low-resource video metaphor captioning system: GIT-LLaVA, which obtains comparable performance to SoTA video language models on the proposed task. We perform a comprehensive analysis of existing video language models on this task and publish our dataset, models, and benchmark results to enable further research.

We Care: Multimodal Depression Detection and Knowledge Infused Mental Health Therapeutic Response Generation
Palash Moon | Pushpak Bhattacharyya
Proceedings of the 21st International Conference on Natural Language Processing (ICON)

The detection of depression through non-verbal cues has gained significant attention. Previous research predominantly centred on identifying depression within the confines of controlled laboratory environments, often with the supervision of psychologists or counsellors. Unfortunately, datasets generated in such controlled settings may struggle to account for individual behaviours in real-life situations. In response to this limitation, we present the Extended D-vlog dataset, encompassing a collection of 1,261 YouTube vlogs. Additionally, the emergence of large language models (LLMs) like GPT3.5 and GPT4 has sparked interest in their potential to act like mental health professionals. Yet, the readiness of these LLMs to be used in real-life settings is still a concern, as they can give wrong responses that can harm the users. We introduce a virtual agent serving as an initial contact for mental health patients, offering Cognitive Behavioral Therapy (CBT)-based responses. It comprises two core functions: 1. Identifying depression in individuals, and 2. Delivering CBT-based therapeutic responses. Our Mistral model achieved impressive scores of 70.1% and 30.9% for distortion assessment and classification, along with a BERTScore of 88.7%. Moreover, utilizing the TVLT model on our Multimodal Extended D-vlog Dataset yielded outstanding results, with an impressive F1-score of 67.8%.

Standardizing Genomic Reports: A Dataset, A Standardized Format, and A Prompt-Based Technique for Structured Data Extraction
Tamali Banerjee | Akshit Varmora | Jay J. Gorakhiya | Sanand Sasidharan | Anuradha Kanamarlapudi | Pushpak Bhattacharyya
Proceedings of the 21st International Conference on Natural Language Processing (ICON)

Extracting information from genomic reports of cancer patients is crucial for both healthcare professionals and cancer research. While Large Language Models (LLMs) have shown promise in extracting information, their potential for handling genomic reports remains unexplored. These reports are complex, multi-page documents that feature a variety of visually rich, structured layouts and contain many domain-specific terms. Two primary challenges complicate the process: (i) extracting data from PDFs with intricate layouts and domain-specific terminology and (ii) dealing with variations in report layouts from different laboratories, making extraction layout-dependent and posing challenges for subsequent data processing. To tackle these issues, we propose GR-PROMPT, a prompt-based technique, and GR-FORMAT, a standardized format. Together, these two convert a genomic report in PDF format into GR-FORMAT as a JSON file using a multimodal LLM. To address the lack of available datasets for this task, we introduce GR-DATASET, a synthetic collection of 100 cancer genomic reports in PDF format. Each report is accompanied by key-value information presented in a layout-specific format, as well as structured key-value information in GR-FORMAT. This is the first dataset in this domain to promote further research for the task. We performed our experiment on this dataset.

IndiBias: A Benchmark Dataset to Measure Social Biases in Language Models for Indian Context
Nihar Sahoo | Pranamya Kulkarni | Arif Ahmad | Tanu Goyal | Narjis Asad | Aparna Garimella | Pushpak Bhattacharyya
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

The pervasive influence of social biases in language data has sparked the need for benchmark datasets that capture and evaluate these biases in Large Language Models (LLMs). Existing efforts predominantly focus on the English language and the Western context, leaving a void for a reliable dataset that encapsulates India’s unique socio-cultural nuances. To bridge this gap, we introduce IndiBias, a comprehensive benchmarking dataset designed specifically for evaluating social biases in the Indian context. We filter and translate the existing CrowS-Pairs dataset to create a benchmark dataset suited to the Indian context in the Hindi language. Additionally, we leverage LLMs including ChatGPT and InstructGPT to augment our dataset with diverse societal biases and stereotypes prevalent in India. The included bias dimensions encompass gender, religion, caste, age, region, physical appearance, and occupation. We also build a resource to address intersectional biases along three intersectional dimensions. Our dataset contains 800 sentence pairs and 300 tuples for bias measurement across different demographics. The dataset is available in English and Hindi, providing a size comparable to existing benchmark datasets. Furthermore, using IndiBias we compare ten different language models on multiple bias measurement metrics. We observed that the language models exhibit more bias across a majority of the intersectional groups. All the scripts utilized and datasets created in this study are publicly available.

Part-of-speech Tagging for Extremely Low-resource Indian Languages
Sanjeev Kumar | Preethi Jyothi | Pushpak Bhattacharyya
Findings of the Association for Computational Linguistics: ACL 2024

Modern natural language processing (NLP) systems thrive when given access to large datasets. However, a large fraction of the world’s languages are not privy to such benefits due to sparse documentation and inadequate digital representation. This is especially true for Indian regional languages. As a first step towards expanding the reach of NLP technologies to extremely low-resource Indian languages, we present a new parallel part-of-speech (POS) evaluation dataset for Angika, Magahi, Bhojpuri and Hindi. Angika, Magahi, Bhojpuri, along with the more well-known Hindi, are all languages spoken in the Indian states of Bihar, Jharkhand and West Bengal. Ours is notably the first NLP resource, even for a shallow NLP task like POS-tagging, for Angika. We establish POS-tagging baselines using state-of-the-art multilingual pretrained language models (PLMs) finetuned on Hindi data, and show zero-shot evaluations on the other three languages. While all four languages use the same Devanagari script, pretrained tokenizers underperform in zero-shot on the three languages. We propose a simple look-back fix to address the tokenization challenge yielding F1-score improvements of up to 8% on Angika and show how it comes very close to an oracle setting when the underlying Hindi word is known (and can be accurately tokenized).

A Morphology-Based Investigation of Positional Encodings
Poulami Ghosh | Shikhar Vashishth | Raj Dabre | Pushpak Bhattacharyya
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Contemporary deep learning models effectively handle languages with diverse morphology despite morphology not being directly integrated into them. Morphology and word order are closely linked, with the latter incorporated into transformer-based models through positional encodings. This prompts a fundamental inquiry: Is there a correlation between the morphological complexity of a language and the utilization of positional encoding in pre-trained language models? In pursuit of an answer, we present the first study addressing this question, encompassing 22 languages and 5 downstream tasks. Our findings reveal that the importance of positional encoding diminishes with increasing morphological complexity in languages. Our study motivates the need for a deeper understanding of positional encodings, augmenting them to better reflect the different languages under consideration.

Pretraining Language Models Using Translationese
Meet Doshi | Raj Dabre | Pushpak Bhattacharyya
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

In this paper, we explore the utility of Translationese as synthetic data created using machine translation for pre-training language models (LMs) for low-resource languages (LRLs). Our simple methodology consists of translating large amounts of web-crawled monolingual documents (clean) into the LRLs, followed by filtering the translated documents using tiny LMs trained on small but clean LRL data. Taking the case of Indian languages, we pre-train LMs from scratch with 28M and 85M parameters, and then fine-tune them for 5 downstream natural language understanding (NLU) and 4 generative (NLG) tasks. We observe that pre-training on filtered synthetic data leads to relative performance drops of only 0.87% for NLU and 2.35% for NLG, compared to pre-training on clean data, and this gap further diminishes upon the inclusion of a small amount of clean data. We also study the impact of synthetic data filtering and the choice of source language for synthetic data generation. Furthermore, evaluating continually pre-trained larger models like Gemma-2B and Llama-3-8B in few-shot settings, we observe that using synthetic data is competitive with using clean data. Our findings suggest that synthetic data shows promise for bridging the pre-training gap between English and LRLs.

STORiCo: Storytelling TTS for Hindi with Character Voice Modulation
Pavan Tankala | Preethi Jyothi | Preeti Rao | Pushpak Bhattacharyya
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 2: Short Papers)

We present a new Hindi text-to-speech (TTS) dataset and demonstrate its utility for the expressive synthesis of children’s audio stories. The dataset comprises narration by a single female speaker who modifies her voice to produce different story characters. Annotations for dialogue identification, character labelling, and character attribution are provided, all of which are expected to facilitate the learning of character voice and speaking styles. Experiments are conducted using different versions of the annotated dataset that enable training a multi-speaker TTS model on the single-speaker data. Subjective tests show that the multi-speaker model improves expressiveness and character voice consistency compared to the baseline single-speaker TTS. With the multi-speaker model, objective evaluations show comparable word error rates, better speaker voice consistency, and higher correlations with ground-truth emotion attributes. We release a new 16.8-hour storytelling speech dataset in Hindi and propose effective solutions for expressive TTS with narrator voice modulation and character voice consistency.

Seeing Is Believing! towards Knowledge-Infused Multi-modal Medical Dialogue Generation
Abhisek Tiwari | Shreyangshu Bera | Preeti Verma | Jaithra Varma Manthena | Sriparna Saha | Pushpak Bhattacharyya | Minakshi Dhar | Sarbajeet Tiwari
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Over the last few years, artificial intelligence-based clinical assistance has gained immense popularity and demand in telemedicine, including automatic disease diagnosis. Patients often describe their signs and symptoms to doctors using visual aids, which provide vital evidence for identifying a medical condition. In addition to learning from our experiences, we learn from well-established theories/knowledge. With the motivation of leveraging visual cues and medical knowledge, we propose a transformer-based, knowledge-infused multi-modal medical dialogue generation (KI-MMDG) framework. In addition, we present a discourse-aware image identifier (DII) that recognizes signs and their severity by leveraging the current conversation context in addition to the image of the signs. We first curate an empathy and severity-aware multi-modal medical dialogue (ES-MMD) corpus in English, which is annotated with intent, symptoms, and visual signs with severity information. Experimental results show the superior performance of the proposed KI-MMDG model over uni-modal and non-knowledge infused generative models, demonstrating the importance of visual signs and knowledge infusion in symptom investigation and diagnosis. We also observed that the DII model surpasses the existing state-of-the-art model by 7.84%, indicating the crucial significance of dialogue context for identifying a sign image surfaced during conversations. The code and dataset are available at https://github.com/NLP-RL/KI-MMDG.

PUB: A Pragmatics Understanding Benchmark for Assessing LLMs’ Pragmatics Capabilities
Settaluri Sravanthi | Meet Doshi | Pavan Tankala | Rudra Murthy | Raj Dabre | Pushpak Bhattacharyya
Findings of the Association for Computational Linguistics: ACL 2024

LLMs have demonstrated remarkable capability for understanding semantics, but their understanding of pragmatics is not well studied. To this end, we release a Pragmatics Understanding Benchmark (PUB) dataset consisting of fourteen tasks in four pragmatics phenomena, namely, Implicature, Presupposition, Reference, and Deixis. We curate high-quality test sets for each task, consisting of Multiple Choice Question Answers (MCQA). PUB includes a total of 28k data points, of which 6.1k are newly annotated. We evaluate nine models varying in the number of parameters and type of training. Our study reveals several key observations about the pragmatic capabilities of LLMs: 1. chat-fine-tuning strongly benefits smaller models, 2. large base models are competitive with their chat-fine-tuned counterparts, 3. there is a huge variance in performance across different pragmatics phenomena, and 4. there is a noticeable performance gap between human capabilities and model capabilities. We hope that PUB will enable comprehensive evaluation of LLMs’ pragmatic reasoning capabilities.

Reconsidering SMT Over NMT for Closely Related Languages: A Case Study of Persian-Hindi Pair
Waisullah Yousofi | Pushpak Bhattacharyya
Proceedings of the 21st International Conference on Natural Language Processing (ICON)

This paper demonstrates that Phrase-Based Statistical Machine Translation (PBSMT) can outperform Transformer-based Neural Machine Translation (NMT) in moderate-resource scenarios, specifically for structurally similar languages, the Persian-Hindi pair in our case. Despite the Transformer architecture’s typical preference for large parallel corpora, our results show that PBSMT achieves a BLEU score of 66.32, significantly exceeding the Transformer-NMT score of 53.7 when trained on the same dataset.

2023

KGVL-BART: Knowledge Graph Augmented Visual Language BART for Radiology Report Generation
Kaveri Kale | Pushpak Bhattacharyya | Milind Gune | Aditya Shetty | Rustom Lawyer
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

Timely generation of radiology reports and diagnoses is a challenge worldwide due to the enormous number of cases and shortage of radiology specialists. In this paper, we propose a Knowledge Graph Augmented Vision Language BART (KGVL-BART) model that takes as input two chest X-ray images (one frontal and the other lateral) along with tags, which are diagnostic keywords, and outputs a report with the patient-specific findings. Our system development effort is divided into 3 stages: i) construction of the Chest X-ray KG (referred to as chestX-KG), ii) image feature extraction, and iii) training a KGVL-BART model using the visual, text, and KG data. The dataset we use is the well-known Indiana University Chest X-ray reports with the train, validation, and test split of 3025 instances, 300 instances, and 500 instances respectively. We construct a Chest X-Ray knowledge graph from these reports by extracting entity1-relation-entity2 triples; the triples are extracted by a rule-based tool of our own. The constructed KG is verified by two experienced radiologists (with experience of 30 years and 8 years, respectively). We demonstrate that our model, KGVL-BART, outperforms State-of-the-Art transformer-based models on standard NLG scoring metrics. We also include a qualitative evaluation of our system by an experienced radiologist (with experience of 30 years) on the test data, which showed that 73% of the generated reports were fully correct, only 5.5% were completely wrong, and 21.5% had important missing details though overall correct. To the best of our knowledge, ours is the first system to make use of multi-modality and domain knowledge to generate X-ray reports automatically.

Kurosawa: A Script Writer’s Assistant
Prerak Gandhi | Vishal Pramanik | Pushpak Bhattacharyya
Proceedings of the 20th International Conference on Natural Language Processing (ICON)

Storytelling is the lifeline of the entertainment industry: movies, TV shows, and stand-up comedies all need stories. A good and gripping script is the lifeline of storytelling and demands creativity and resource investment. Good scriptwriters are rare to find and often work under severe time pressure. Consequently, entertainment media are actively looking for automation. In this paper, we present an AI-based script-writing workbench called KUROSAWA which addresses the tasks of plot generation and script generation. Plot generation aims to generate a coherent and creative plot (600–800 words) given a prompt (15–40 words). Script generation, on the other hand, generates a scene (200–500 words) in a screenplay format from a brief description (15–40 words). Kurosawa needs data to train. We use a 4-act structure of storytelling to annotate the plot dataset manually. We create a dataset of 1000 manually annotated plots and their corresponding prompts/storylines and a gold-standard dataset of 1000 scenes with four main elements (scene headings, action lines, dialogues, and character names) tagged individually. We fine-tune GPT-3 with the above datasets to generate plots and scenes. These plots and scenes are first evaluated and then used by the scriptwriters of a large and famous media platform, ErosNow. We release the annotated datasets and the models trained on these datasets as a working benchmark for automatic movie plot and script generation.

“A Little is Enough”: Few-Shot Quality Estimation based Corpus Filtering improves Machine Translation
Akshay Batheja | Pushpak Bhattacharyya
Findings of the Association for Computational Linguistics: ACL 2023

Quality Estimation (QE) is the task of evaluating the quality of a translation when a reference translation is not available. The goal of QE aligns with the task of corpus filtering, where we assign a quality score to the sentence pairs present in the pseudo-parallel corpus. We propose a Quality Estimation based Filtering approach to extract high-quality parallel data from the pseudo-parallel corpus. To the best of our knowledge, this is a novel adaptation of the QE framework to extracting a quality parallel corpus from the pseudo-parallel corpus. By training with this filtered corpus, we observe an improvement in the Machine Translation (MT) system’s performance by up to 1.8 BLEU points, for English-Marathi, Chinese-English, and Hindi-Bengali language pairs, over the baseline model. The baseline model is the one that is trained on the whole pseudo-parallel corpus. Our few-shot QE model, transfer-learned from the English-Marathi QE model and fine-tuned on only 500 Hindi-Bengali training instances, shows an improvement of up to 0.6 BLEU points for the Hindi-Bengali language pair, compared to the baseline model. This demonstrates the promise of transfer learning in the setting under discussion. QE systems typically require on the order of 7K-25K training instances. Our Hindi-Bengali QE model is trained on only 500 instances, that is, 1/40th of the normal requirement, and achieves comparable performance. All the scripts and datasets utilized in this study will be publicly available.

Replace and Report: NLP Assisted Radiology Report Generation
Kaveri Kale | Pushpak Bhattacharyya | Kshitij Jadhav
Findings of the Association for Computational Linguistics: ACL 2023

Clinical practice frequently uses medical imaging for diagnosis and treatment. A significant challenge for automatic radiology report generation is that the radiology reports are long narratives consisting of multiple sentences for both abnormal and normal findings. Therefore, applying conventional image captioning approaches to generate the whole report proves to be insufficient, as these are designed to briefly describe images with short sentences. We propose a template-based approach to generate radiology reports from radiographs. Our approach involves the following: i) using a multilabel image classifier, produce the tags for the input radiograph; ii) using a transformer-based model, generate pathological descriptions (a description of abnormal findings seen on radiographs) from the tags generated in step (i); iii) using a BERT-based multi-label text classifier, find the spans in the normal report template to replace with the generated pathological descriptions; and iv) using a rule-based system, replace the identified span with the generated pathological description. We performed experiments with the two most popular radiology report datasets, IU Chest X-ray and MIMIC-CXR and demonstrated that the BLEU-1, ROUGE-L, METEOR, and CIDEr scores are better than the State-of-the-Art models by 25%, 36%, 44% and 48% respectively, on the IU X-RAY dataset. To the best of our knowledge, this is the first attempt to generate chest X-ray radiology reports by first creating small sentences for abnormal findings and then replacing them in the normal report template.

NLI to the Rescue: Mapping Entailment Classes to Hallucination Categories in Abstractive Summarization
Naveen Badathala | Ashita Saxena | Pushpak Bhattacharyya
Proceedings of the 20th International Conference on Natural Language Processing (ICON)

In this paper, we detect hallucinations in summaries generated by abstractive summarization models. We focus on three categories, viz., intrinsic, extrinsic, and non-hallucinated. The method used for detecting hallucination is based on textual entailment. Given a premise and a hypothesis, textual entailment classifies the hypothesis as contradiction, neutral, or entailment. These three classes of textual entailment are mapped to intrinsic, extrinsic, and non-hallucinated respectively. We fine-tune a RoBERTa-large model on NLI datasets and use it to detect hallucinations on the XSumFaith dataset. We demonstrate that our simple approach using textual entailment outperforms the existing factuality inconsistency detection systems by 12% and we provide insightful analysis of all types of hallucination. To advance research in this area, we create and release a dataset, XSumFaith++, which contains balanced instances of hallucinated and non-hallucinated summaries.
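
The class mapping stated in the abstract above can be written as a small lookup. This is a minimal sketch, not the authors' code: the function name `hallucination_category` and the lowercase label strings are assumptions made for illustration, and the NLI prediction itself (which the paper obtains from a fine-tuned RoBERTa-large model) is taken as a given input.

```python
# Illustrative sketch only: the names and label strings are assumptions,
# not taken from the paper's released code.
NLI_TO_HALLUCINATION = {
    "contradiction": "intrinsic",      # summary contradicts the source
    "neutral": "extrinsic",            # summary is not supported by the source
    "entailment": "non-hallucinated",  # summary follows from the source
}


def hallucination_category(nli_label: str) -> str:
    """Map an NLI class (premise = source, hypothesis = summary) to a category."""
    key = nli_label.strip().lower()
    if key not in NLI_TO_HALLUCINATION:
        raise ValueError(f"unknown NLI label: {nli_label!r}")
    return NLI_TO_HALLUCINATION[key]
```

In this sketch the source document plays the role of the premise and the generated summary the role of the hypothesis, so any NLI classifier that emits these three labels could feed the lookup.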

Evaluating Cross Lingual Transfer for Morphological Analysis: a Case Study of Indian Languages
Siddhesh Pawar | Pushpak Bhattacharyya | Partha Talukdar
Proceedings of the 20th SIGMORPHON workshop on Computational Research in Phonetics, Phonology, and Morphology

Recent advances in pretrained multilingual models such as Multilingual T5 (mT5) have facilitated cross-lingual transfer by learning shared representations across languages. Leveraging pretrained multilingual models for scaling morphology analyzers to low-resource languages is a unique opportunity that has been under-explored so far. We investigate this line of research in the context of Indian languages, focusing on two important morphological sub-tasks: root word extraction and tagging morphosyntactic descriptions (MSD), viz., gender, number, and person (GNP). We experiment with six Indian languages from two language families (Dravidian and Indo-Aryan) to train multilingual morphology analyzers for the first time for Indian languages. We demonstrate the usability of multilingual models for few-shot cross-lingual transfer through an average 7% increase in GNP tagging in a cross-lingual setting as compared to a monolingual setting through controlled experiments. We provide an overview of the state of the datasets available related to our tasks and point out a few modeling limitations due to datasets. Lastly, we analyze the cross-lingual transfer of morphological tags for verbs and nouns, which provides a proxy for the quality of representations of word markings learned by the model.

Synthesize, if you do not have: Effective Synthetic Dataset Creation Strategies for Self-Supervised Opinion Summarization in E-commerce
Tejpalsingh Siledar | Suman Banerjee | Amey Patil | Sudhanshu Singh | Muthusamy Chelliah | Nikesh Garera | Pushpak Bhattacharyya
Findings of the Association for Computational Linguistics: EMNLP 2023

In e-commerce, opinion summarization is the process of condensing the opinions presented in product reviews. However, the absence of large amounts of supervised datasets presents challenges in generating both aspect-specific and general opinion summaries. Existing approaches have attempted to address these challenges through synthetic dataset creation (SDC). However, general opinion summarization models struggle to generate summaries faithful to the input reviews whereas aspect-specific opinion summarization models are limited due to their reliance on human-specified aspects and seed words. To address this, we propose SDC strategies tailored for general and aspect-specific opinion summarization. We experimented on three e-commerce test sets: Oposum+, Amazon, and Flipkart. For general opinion summarization, a pre-trained language model (PLM) fine-tuned on our general synthetic dataset surpasses the SOTA on average by 2.3 R1 points. Faithfulness evaluation metrics and human evaluations indicate that our model-generated summaries are more faithful to the input compared to others. For aspect-specific opinion summarization, a PLM fine-tuned on our aspect-specific synthetic dataset surpasses SOTA by ~1 R1 point without the aid of any human-specified aspects or seed words.

GenEx: A Commonsense-aware Unified Generative Framework for Explainable Cyberbullying Detection
Krishanu Maity | Raghav Jain | Prince Jha | Sriparna Saha | Pushpak Bhattacharyya
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

With the rise of social media and online communication, the issue of cyberbullying has gained significant prominence. While extensive research is being conducted to develop more effective models for detecting cyberbullying in monolingual languages, a significant gap exists in understanding code-mixed languages and the need for explainability in this context. To address this gap, we have introduced a novel benchmark dataset named BullyExplain for explainable cyberbullying detection in code-mixed language. In this dataset, each post is meticulously annotated with four labels: bully, sentiment, target, and rationales, indicating the specific phrases responsible for identifying the post as a bully. Our current research presents an innovative unified generative framework, GenEx, which reimagines the multitask problem as a text-to-text generation task. Our proposed approach demonstrates its superiority across various evaluation metrics when applied to the BullyExplain dataset, surpassing other baseline models and current state-of-the-art approaches.

DISCO: A Large Scale Human Annotated Corpus for Disfluency Correction in Indo-European Languages
Vineet Bhat | Preethi Jyothi | Pushpak Bhattacharyya
Findings of the Association for Computational Linguistics: EMNLP 2023

Disfluency correction (DC) is the process of removing disfluent elements like fillers, repetitions and corrections from spoken utterances to create readable and interpretable text. DC is a vital post-processing step applied to Automatic Speech Recognition (ASR) outputs, before subsequent processing by downstream language understanding tasks. Existing DC research has primarily focused on English due to the unavailability of large-scale open-source datasets. Towards the goal of multilingual disfluency correction, we present a high-quality human-annotated DC corpus covering four important Indo-European languages: English, Hindi, German and French. We provide extensive analysis of results of state-of-the-art DC models across all four languages obtaining F1 scores of 97.55 (English), 94.29 (Hindi), 95.89 (German) and 92.97 (French). To demonstrate the benefits of DC on downstream tasks, we show that DC leads to a 5.65-point increase in BLEU scores on average when used in conjunction with a state-of-the-art Machine Translation (MT) system. We release code to run our experiments along with our annotated dataset.

“Knowledge is Power”: Constructing Knowledge Graph of Abdominal Organs and Using Them for Automatic Radiology Report Generation
Kaveri Kale | Pushpak Bhattacharyya | Aditya Shetty | Milind Gune | Kush Shrivastava | Rustom Lawyer | Spriha Biswas
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track)

In conventional radiology practice, the radiologist dictates the diagnosis to the transcriptionist, who then prepares a preliminary formatted report referring to the notes, after which the radiologist reviews the report, corrects the errors, and signs off. This workflow is prone to delay and error. In this paper, we report our work on automatic radiology report generation from radiologists’ dictation, carried out in collaboration with a startup about to become a unicorn. A major contribution of our work is the set of knowledge graphs (KGs) of ten abdominal organs: Liver, Kidney, Gallbladder, Uterus, Urinary bladder, Ovary, Pancreas, Prostate, Biliary Tree, and Bowel. Our method for constructing these KGs relies on extracting entity1-relation-entity2 triplets from a large collection (about 10,000) of free-text radiology reports. The quality and coverage of the KGs are verified by two experienced radiologists (practicing for the last 30 years and 8 years, respectively). The dictation of the radiologist is automatically converted to what is called a pathological description, which is the clinical description of the findings of the radiologist during ultrasonography (USG). Our knowledge-enhanced deep learning model improves the reported BLEU-3, ROUGE-L, METEOR, and CIDEr scores of the pathological description generation by 2%, 4%, 2% and 2% respectively. To the best of our knowledge, this is the first attempt at representing the abdominal organs in the form of knowledge graphs and utilising these graphs for the automatic generation of USG reports. A Minimum Viable Product (MVP) has been made available to the beta users, i.e., radiologists of reputed hospitals, for testing and evaluation. Our solution guarantees report generation within 30 seconds of running a scan.

A Match Made in Heaven: A Multi-task Framework for Hyperbole and Metaphor Detection
Naveen Badathala | Abisek Rajakumar Kalarani | Tejpalsingh Siledar | Pushpak Bhattacharyya
Findings of the Association for Computational Linguistics: ACL 2023

Hyperbole and metaphor are common in day-to-day communication (e.g., “I am in deep trouble”: how does trouble have depth?), which makes their detection important, especially in a conversational AI setting. Existing approaches to automatically detect metaphor and hyperbole have studied these language phenomena independently, but their relationship has hardly, if ever, been explored computationally. In this paper, we propose a multi-task deep learning framework to detect hyperbole and metaphor simultaneously. We hypothesize that metaphors help in hyperbole detection, and vice-versa. To test this hypothesis, we annotate two hyperbole datasets, HYPO and HYPO-L, with metaphor labels. Simultaneously, we annotate two metaphor datasets, TroFi and LCC, with hyperbole labels. Experiments using these datasets improve the state of the art in hyperbole detection by 12%. Additionally, our multi-task learning (MTL) approach shows an improvement of up to 17% over single-task learning (STL) for both hyperbole and metaphor detection, supporting our hypothesis. To the best of our knowledge, ours is the first demonstration of computationally leveraging the linguistic intimacy between metaphor and hyperbole, which shows the superiority of MTL over STL for hyperbole and metaphor detection.

Predict and Use: Harnessing Predicted Gaze to Improve Multimodal Sarcasm Detection
Divyank Tiwari | Diptesh Kanojia | Anupama Ray | Apoorva Nunna | Pushpak Bhattacharyya
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Sarcasm is a complex linguistic construct with incongruity at its very core. Detecting sarcasm depends on the actual content spoken and tonality, facial expressions, the context of an utterance, and personal traits like language proficiency and cognitive capabilities. In this paper, we propose the utilization of synthetic gaze data to improve the task performance for multimodal sarcasm detection in a conversational setting. We enrich an existing multimodal conversational dataset, i.e., MUStARD++ with gaze features. With the help of human participants, we collect gaze features for 20% of data instances, and we investigate various methods for gaze feature prediction for the rest of the dataset. We perform extrinsic and intrinsic evaluations to assess the quality of the predicted gaze features. We observe a performance gain of up to 6.6% points by adding a new modality, i.e., collected gaze features. When both collected and predicted data are used, we observe a performance gain of 2.3% points on the complete dataset. Interestingly, with only predicted gaze features, too, we observe a gain in performance (1.9% points). We retain and use the feature prediction model, which maximally correlates with collected gaze features. Our model trained on combining collected and synthetic gaze data achieves SoTA performance on the MUStARD++ dataset. To the best of our knowledge, ours is the first predict-and-use model for sarcasm detection. We publicly release the code, gaze data, and our best models for further research.

Retrofitting Light-weight Language Models for Emotions using Supervised Contrastive Learning
Sapan Shah | Sreedhar Reddy | Pushpak Bhattacharyya
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

We present a novel retrofitting method to induce emotion aspects into pre-trained language models (PLMs) such as BERT and RoBERTa. Our method updates pre-trained network weights using contrastive learning so that the text fragments exhibiting similar emotions are encoded nearby in the representation space, and the fragments with different emotion content are pushed apart. While doing so, it also ensures that the linguistic knowledge already present in PLMs is not inadvertently perturbed. The language models retrofitted by our method, i.e., BERTEmo and RoBERTaEmo, produce emotion-aware text representations, as evaluated through different clustering and retrieval metrics. For the downstream tasks on sentiment analysis and sarcasm detection, they perform better than their pre-trained counterparts (about 1% improvement in F1-score) and other existing approaches. Additionally, a more significant boost in performance is observed for the retrofitted models over pre-trained ones in the few-shot learning setting.

Angel: Enterprise Search System for the Non-Profit Industry
Saiful Haq | Ashutosh Sharma | Pushpak Bhattacharyya
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track

The non-profit industry needs a system for accurately matching fund-seekers (e.g., AMERICAN NATIONAL RED CROSS) with fund-givers (e.g., BILL AND MELINDA GATES FOUNDATION) aligned in cause (e.g., cancer) and target beneficiary group (e.g., children). In this paper, we create an enterprise search system “ANGEL” for the non-profit industry that takes a fund-giver’s mission description as input and returns a ranked list of fund-seekers as output, and vice-versa. ANGEL employs ColBERT, a neural information retrieval model, which we enhance by exploiting the two techniques of (a) Syntax-aware local attention (SLA) to combine syntactic information in the mission description with multi-head self-attention and (b) Dense Pseudo Relevance Feedback (DPRF) for augmentation of short mission descriptions. We create a mapping dictionary “non-profit-dict” to curate a “non-profit-search database” containing information on 594K fund-givers and 194K fund-seekers from IRS-990 filings for the non-profit industry search engines. We also curate a “non-profit-evaluation” dataset containing scored matching between 463 fund-givers and 100 fund-seekers. The research is in collaboration with a philanthropic startup that identifies itself as an “AI matching platform, fundraising assistant, and philanthropy search base.” Domain experts at the philanthropic startup annotate the non-profit evaluation dataset and continuously evaluate the performance of ANGEL. ANGEL achieves an improvement of 0.14 MAP@10 and 0.16 MRR@10 over the state-of-the-art baseline on the non-profit evaluation dataset. To the best of our knowledge, ours is the first effort at building an enterprise search engine based on neural information retrieval for the non-profit industry.
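
For readers unfamiliar with the two ranking metrics reported above, the following is a minimal sketch of how MRR@10 and MAP@10 are commonly computed. It is not the paper's evaluation code: it assumes binary relevance (the paper's dataset contains scored matches), assumes ranked lists without duplicates, and uses one common normalization for average precision, dividing by min(k, number of relevant items). All function names are hypothetical.

```python
# Illustrative sketch under the assumptions stated above (binary relevance,
# no duplicate items in a ranked list); names are hypothetical.

def reciprocal_rank_at_k(ranked, relevant, k=10):
    """Return 1/rank of the first relevant item in the top k, else 0."""
    for rank, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def average_precision_at_k(ranked, relevant, k=10):
    """Average of precision values taken at each relevant hit in the top k."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    # One common convention; other normalizations exist.
    return precision_sum / min(k, len(relevant))


def mean_over_queries(metric, runs, k=10):
    """Average a per-query metric over (ranked_list, relevant_set) pairs."""
    return sum(metric(ranked, relevant, k) for ranked, relevant in runs) / len(runs)
```

Calling `mean_over_queries(reciprocal_rank_at_k, runs)` gives MRR@10 and `mean_over_queries(average_precision_at_k, runs)` gives MAP@10, where each query would be, for example, one fund-giver's mission description and its ranked list of fund-seekers.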

A Multi-task Learning Framework for Quality Estimation
Sourabh Deoghare | Paramveer Choudhary | Diptesh Kanojia | Tharindu Ranasinghe | Pushpak Bhattacharyya | Constantin Orăsan
Findings of the Association for Computational Linguistics: ACL 2023

Quality Estimation (QE) is the task of evaluating machine translation output in the absence of reference translation. Conventional approaches to QE involve training separate models at different levels of granularity viz., word-level, sentence-level, and document-level, which sometimes leads to inconsistent predictions for the same input. To overcome this limitation, we focus on jointly training a single model for sentence-level and word-level QE tasks in a multi-task learning framework. Using two multi-task learning-based QE approaches, we show that multi-task learning improves the performance of both tasks. We evaluate these approaches by performing experiments in different settings, viz., single-pair, multi-pair, and zero-shot. We compare the multi-task learning-based approach with baseline QE models trained on single tasks and observe an improvement of up to 4.28% in Pearson’s correlation (r) at sentence-level and 8.46% in F1-score at word-level, in the single-pair setting. In the multi-pair setting, we observe improvements of up to 3.04% at sentence-level and 13.74% at word-level; while in the zero-shot setting, we also observe improvements of up to 5.26% and 3.05%, respectively. We make the models proposed in this paper publicly available.

KITLM: Domain-Specific Knowledge InTegration into Language Models for Question Answering
Ankush Agarwal | Sakharam Gawade | Amar Prakash Azad | Pushpak Bhattacharyya
Proceedings of the 20th International Conference on Natural Language Processing (ICON)

Large language models (LLMs) have demonstrated remarkable performance in a wide range of natural language tasks. However, as these models continue to grow in size, they face significant challenges in terms of computational costs. Additionally, LLMs often lack efficient domain-specific understanding, which is particularly crucial in specialized fields such as aviation and healthcare. To boost the domain-specific understanding, we propose, KITLM, a novel knowledge base integration approach into language model through relevant information infusion. By integrating pertinent knowledge, not only the performance of the language model is greatly enhanced, but the model size requirement is also significantly reduced while achieving comparable performance. Our proposed knowledge-infused model surpasses the performance of both GPT-3.5-turbo and the state-of-the-art knowledge infusion method, SKILL, achieving over 1.5 times improvement in exact match scores on the MetaQA. KITLM showed a similar performance boost in the aviation domain with AeroQA. The drastic performance improvement of KITLM over the existing methods can be attributed to the infusion of relevant knowledge while mitigating noise. In addition, we release two curated datasets to accelerate knowledge infusion research in specialized fields: a) AeroQA, a new benchmark dataset designed for multi-hop question-answering within the aviation domain, and b) Aviation Corpus, a dataset constructed from unstructured text extracted from the National Transportation Safety Board reports. Our research contributes to advancing the field of domain-specific language understanding and showcases the potential of knowledge infusion techniques in improving the performance.

IndIE: A Multilingual Open Information Extraction Tool For Indic Languages
Ritwik Mishra | Simranjeet Singh | Rajiv Ratn Shah | Ponnurangam Kumaraguru | Pushpak Bhattacharyya
Findings of the Association for Computational Linguistics: IJCNLP-AACL 2023 (Findings)

“Let’s not Quote out of Context”: Unified Vision-Language Pretraining for Context Assisted Image Captioning
Abisek Rajakumar Kalarani | Pushpak Bhattacharyya | Niyati Chhaya | Sumit Shekhar
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track)

Well-formed context-aware image captions and tags in enterprise content such as marketing material are critical to ensure their brand presence and content recall. Creating and updating them manually is non-trivial given the scale and tedium of the task. We propose a new unified Vision-Language (VL) model based on the One For All (OFA) model, with a focus on context-assisted image captioning where the caption is generated based on both the image and its context. Our approach aims to overcome the context-independent (image and text are treated independently) nature of the existing approaches. We exploit context by pretraining our model with datasets of three tasks: news image captioning where the news article is the context, contextual visual entailment, and keyword extraction from the context. The second pretraining task is a new VL task, and we construct and release two datasets for the task with 1.1M and 2.2K data instances. Our system achieves state-of-the-art results with an improvement of up to 8.34 CIDEr score on the benchmark news image captioning datasets. To the best of our knowledge, ours is the first effort at incorporating contextual information in pretraining the models for the VL tasks.

Machine Translation Advancements for Low-Resource Indian Languages in WMT23: CFILT-IITB’s Effort for Bridging the Gap
Pranav Gaikwad | Meet Doshi | Sourabh Deoghare | Pushpak Bhattacharyya
Proceedings of the Eighth Conference on Machine Translation

This paper describes the MT systems submitted by the CFILT-IITB team to the WMT23 IndicMT shared task. The task focused on MT system development from/to English and four low-resource North-East Indian languages, viz., Assamese, Khasi, Manipuri, and Mizo. We first trained our systems on a small parallel corpus, which resulted in poor-quality systems. Therefore, we utilize transfer learning with the help of a large pre-trained multilingual NMT system. Since this approach produced the best results, we submitted the NMT models built with it for the shared task.

Adversarial Training for Low-Resource Disfluency Correction
Vineet Bhat | Preethi Jyothi | Pushpak Bhattacharyya
Findings of the Association for Computational Linguistics: ACL 2023

Disfluencies commonly occur in conversational speech. Speech with disfluencies can result in noisy Automatic Speech Recognition (ASR) transcripts, which affects downstream tasks like machine translation. In this paper, we propose an adversarially-trained sequence-tagging model for Disfluency Correction (DC) that utilizes a small amount of labeled real disfluent data in conjunction with a large amount of unlabeled data. We show the benefit of our proposed technique, which crucially depends on synthetically generated disfluent data, by evaluating it for DC in three Indian languages: Bengali, Hindi, and Marathi (all from the Indo-Aryan family). Our technique also performs well in removing stuttering disfluencies in ASR transcripts introduced by speech impairments. We achieve an average 6.15 points improvement in F1-score over competitive baselines across all three languages mentioned. To the best of our knowledge, we are the first to utilize adversarial training for DC and use it to correct stuttering disfluencies in English, establishing a new benchmark for this task.

RPTCS: A Reinforced Persona-aware Topic-guiding Conversational System
Zishan Ahmad | Kshitij Mishra | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

Although there has been a plethora of work on open-domain conversational systems, most of the systems lack a mechanism for controlling the concept transitions in a dialogue. For activities like switching from casual chit-chat to task-oriented conversation, an agent with the ability to manage the flow of concepts in a conversation might be helpful. The user would find the dialogue more engaging and be more receptive to such transitions if these concept transitions were made while taking into account the user’s persona. Focusing on persona-aware concept transitions, we propose a Reinforced Persona-aware Topic-guiding Conversational System (RPTCS). Due to the lack of a persona-aware topic transition dataset, we propose a novel conversation dataset creation mechanism in which the conversational agent leads the discourse to drift to a set of target concepts depending on the persona of the speaker and the context of the conversation. To avoid reliance on scarce and expensive human resources, the entire data-creation process is mostly automatic, with a human in the loop only for quality checks. The resulting conversational dataset, named PTCD, is used to develop the RPTCS in two steps. First, a maximum likelihood estimation loss-based conversational model is trained on PTCD. Then this trained model is fine-tuned in a Reinforcement Learning (RL) framework by employing novel reward functions to assure persona, topic, and context consistency with non-repetitiveness in generated responses. Our experimental results demonstrate the strength of the proposed system with respect to strong baselines.

Findings of the WMT 2023 Shared Task on Automatic Post-Editing
Pushpak Bhattacharyya | Rajen Chatterjee | Markus Freitag | Diptesh Kanojia | Matteo Negri | Marco Turchi
Proceedings of the Eighth Conference on Machine Translation

We present the results from the 9th round of the WMT shared task on MT Automatic Post-Editing, which consists of automatically correcting the output of a “black-box” machine translation system by learning from human corrections. Like last year, the task focused on English→Marathi, with data coming from multiple domains (healthcare, tourism, and general/news). Despite the consistent task framework, this year’s data proved to be extremely challenging. As a matter of fact, none of the official submissions from the participating teams succeeded in improving the quality of the already high-level initial translations (with baseline TER and BLEU scores of 26.6 and 70.66, respectively). Only one run, accepted as a “late” submission, achieved automatic evaluation scores that exceeded the baseline.

Eyes Show the Way: Modelling Gaze Behaviour for Hallucination Detection
Kishan Maharaj | Ashita Saxena | Raja Kumar | Abhijit Mishra | Pushpak Bhattacharyya
Findings of the Association for Computational Linguistics: EMNLP 2023

Detecting hallucinations in natural language processing (NLP) is a critical undertaking that demands a deep understanding of both the semantic and pragmatic aspects of languages. Cognitive approaches that leverage users’ behavioural signals, such as gaze, have demonstrated effectiveness in addressing NLP tasks with similar linguistic complexities. However, their potential in the context of hallucination detection remains largely unexplored. In this paper, we propose a novel cognitive approach for hallucination detection that leverages gaze signals from humans. We first collect and introduce an eye tracking corpus (IITB-HGC: IITB-Hallucination Gaze corpus) consisting of 500 instances, annotated by five annotators for hallucination detection. Our analysis reveals that humans selectively attend to relevant parts of the text based on distributional similarity, similar to the attention bias phenomenon in psychology. We identify two attention strategies employed by humans: global attention, which focuses on the most informative sentence, and local attention, which focuses on important words within a sentence. Leveraging these insights, we propose a novel cognitive framework for hallucination detection that incorporates these attention biases. Experimental evaluations on the FactCC dataset demonstrate the efficacy of our approach, obtaining a balanced accuracy of 87.1%. Our study highlights the potential of gaze-based approaches in addressing the task of hallucination detection and sheds light on the cognitive processes employed by humans in identifying inconsistencies.
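The balanced accuracy reported above can be illustrated with a minimal sketch. It assumes the common definition (the mean of per-class recall); the function name and the toy labels are hypothetical and are not taken from the paper or from FactCC.

```python
def balanced_accuracy(gold, pred):
    """Mean of per-class recall, computed over the classes present in gold."""
    recalls = []
    for label in sorted(set(gold)):
        total = sum(1 for g in gold if g == label)
        correct = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        recalls.append(correct / total)
    return sum(recalls) / len(recalls)


# Toy, imbalanced example: 1 = consistent, 0 = hallucinated (labels are made up).
gold = [1, 1, 1, 1, 0, 0]
pred = [1, 1, 1, 0, 0, 1]

plain_accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
print(plain_accuracy)                 # share of all items predicted correctly
print(balanced_accuracy(gold, pred))  # average of recall on class 0 and class 1
```

Unlike plain accuracy, this score weights each class equally, which matters when consistent and hallucinated instances are not equally frequent.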

Quality Estimation-Assisted Automatic Post-Editing
Sourabh Deoghare | Diptesh Kanojia | Fred Blain | Tharindu Ranasinghe | Pushpak Bhattacharyya
Findings of the Association for Computational Linguistics: EMNLP 2023

Automatic Post-Editing (APE) systems are prone to over-correction of the Machine Translation (MT) outputs. While a Word-level Quality Estimation (QE) system can provide a way to curtail the over-correction, a significant performance gain has not been observed thus far by utilizing existing APE and QE combination strategies. In this paper, we propose joint training of a model on APE and QE tasks to improve the APE. Our proposed approach utilizes a multi-task learning (MTL) methodology, which shows significant improvement while treating both tasks as a ‘bargaining game’ during training. Moreover, we investigate various existing combination strategies and show that our approach achieves state-of-the-art performance for a ‘distant’ language pair, viz., English-Marathi. We observe an improvement of 1.09 TER and 1.37 BLEU points over a baseline QE-Unassisted APE system for English-Marathi, while also observing 0.46 TER and 0.62 BLEU points for English-German. Further, we discuss the results qualitatively and show how our approach helps reduce over-correction, thereby improving the APE performance. We also observe that the degree of integration between QE and APE directly correlates with the APE performance gain. We release our code and models publicly.

A Study of Multilingual versus Meta-Learning for Language Model Pre-Training for Adaptation to Unseen Low Resource Languages
Jyotsana Khatri | Rudra Murthy | Amar Prakash Azad | Pushpak Bhattacharyya
Proceedings of Machine Translation Summit XIX, Vol. 1: Research Track

In this paper, we compare two approaches to train a multilingual language model: (i) simple multilingual learning using data-mixing, and (ii) meta-learning. We examine the performance of these models by extending them to unseen language pairs and further finetune them for the task of unsupervised NMT. We perform several experiments with varying amounts of data and give a comparative analysis of the approaches. We observe that both approaches give comparable performance, and meta-learning gives slightly better results in a few cases of low amounts of data. For the Oriya-Punjabi language pair, meta-learning performs better than multilingual learning when using 2M and 3M sentences.

With Prejudice to None: A Few-Shot, Multilingual Transfer Learning Approach to Detect Social Bias in Low Resource Languages
Nihar Sahoo | Niteesh Mallela | Pushpak Bhattacharyya
Findings of the Association for Computational Linguistics: ACL 2023

In this paper, we describe our work on social bias detection in a low-resource multilingual setting in which the languages are from two very divergent families: Indo-European (English, Hindi, and Italian) and Altaic (Korean). Currently, the majority of the social bias datasets available are in English, and this inhibits progress on social bias detection in low-resource languages. To address this problem, we introduce a new dataset for social bias detection in Hindi and investigate multilingual transfer learning using publicly available English, Italian, and Korean datasets. The Hindi dataset contains 9k social media posts annotated for (i) binary bias labels (bias/neutral), (ii) binary labels for sentiment (positive/negative), (iii) target groups for each bias category, and (iv) rationale for annotated bias labels (a short piece of text). We benchmark our Hindi dataset using different multilingual models, with XLM-R achieving the best performance of 80.8 macro-F1 score. Our results show that the detection of social biases in resource-constrained languages such as Hindi and Korean may be improved with the use of a similar dataset in English. We also show that translating all datasets into English does not work effectively for detecting social bias, since the nuances of the source language are lost in translation. All the scripts and datasets utilized in this study will be publicly available.

Comparing DAE-based and MASS-based UNMT: Robustness to Word-Order Divergence in English–>Indic Language Pairs
Tamali Banerjee | Rudra Murthy | Pushpak Bhattacharyya
Proceedings of the 20th International Conference on Natural Language Processing (ICON)

The proliferation of fake news poses a significant challenge in the digital era. Detecting false information, especially in non-English languages, is crucial to combating misinformation effectively. In this research, we introduce a novel approach for Dravidian fake news detection by harnessing the capabilities of the MuRIL transformer model, further enhanced by gradient accumulation techniques. Our study focuses on the Dravidian languages, a diverse group of languages spoken in South India, which are often underserved in natural language processing research. We optimize memory usage, stabilize training, and improve the model’s overall performance by accumulating gradients over multiple batches. The proposed model exhibits promising results in terms of both accuracy and efficiency. Our findings underline the significance of adapting state-of-the-art techniques, such as MuRIL-based models and gradient accumulation, to non-English language.

Reinforcement Replaces Supervision: Query focused Summarization using Deep Reinforcement Learning
Swaroop Nath | Pushpak Bhattacharyya | Harshad Khadilkar
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Query-focused Summarization (QfS) deals with systems that generate summaries from document(s) based on a query. Motivated by the insight that Reinforcement Learning (RL) provides a generalization to Supervised Learning (SL) for Natural Language Generation, and thereby performs better (empirically) than SL, we use an RL-based approach for this task of QfS. Additionally, we also resolve the conflict of employing RL in Transformers with Teacher Forcing. We develop multiple Policy Gradient networks, trained on various reward signals: ROUGE, BLEU, and Semantic Similarity, which lead to a 10-point improvement over the State-of-the-Art approach on the ROUGE-L metric for a benchmark dataset (ELI5). We also show performance of our approach in zero-shot setting for another benchmark dataset (DebatePedia) – our approach leads to results comparable to baselines, which were specifically trained on DebatePedia. To aid the RL training, we propose a better semantic similarity reward, enabled by a novel Passage Embedding scheme developed using Cluster Hypothesis. Lastly, we contribute a gold-standard test dataset to further research in QfS and Long-form Question Answering (LfQA).

2022

Findings of the WMT 2022 Shared Task on Automatic Post-Editing
Pushpak Bhattacharyya | Rajen Chatterjee | Markus Freitag | Diptesh Kanojia | Matteo Negri | Marco Turchi
Proceedings of the Seventh Conference on Machine Translation (WMT)

We present the results from the 8th round of the WMT shared task on MT Automatic Post-Editing, which consists of automatically correcting the output of a “black-box” machine translation system by learning from human corrections. This year, the task focused on a new language pair (English→Marathi) and on data coming from multiple domains (healthcare, tourism, and general/news). Although according to several indicators this round was of medium-high difficulty compared to the past, the best submission from the three participating teams managed to significantly improve (with an error reduction of 3.49 TER points) the original translations produced by a generic neural MT system.

COMMA-DEER: COmmon-sense Aware Multimodal Multitask Approach for Detection of Emotion and Emotional Reasoning in Conversations
Soumitra Ghosh | Gopendra Vikram Singh | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the 29th International Conference on Computational Linguistics

Mental health is a critical component of the United Nations’ Sustainable Development Goals (SDGs), particularly Goal 3, which aims to provide “good health and well-being”. The present mental health treatment gap is exacerbated by stigma, lack of human resources, and lack of research capability for implementation and policy reform. We present and discuss a novel task of detecting emotional reasoning (ER) and accompanying emotions in conversations. In particular, we create a first-of-its-kind multimodal mental health conversational corpus that is manually annotated at the utterance level with emotional reasoning and related emotion. We develop a multimodal multitask framework with a novel multimodal feature fusion technique and a contextuality learning module to handle the two tasks. Leveraging multimodal sources of information, commonsense reasoning, and through a multitask framework, our proposed model produces strong results. We achieve performance gains of 6% accuracy and 4.62% F1 on the emotion detection task and 3.56% accuracy and 3.31% F1 on the ER detection task, when compared to the existing state-of-the-art model.

Novelty Detection in Community Question Answering Forums
Tirthankar Ghosal | Vignesh Edithal | Tanik Saikh | Saprativa Bhattacharjee | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the 36th Pacific Asia Conference on Language, Information and Computation

EM-PERSONA: EMotion-assisted Deep Neural Framework for PERSONAlity Subtyping from Suicide Notes
Soumitra Ghosh | Dhirendra Kumar Maurya | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the 29th International Conference on Computational Linguistics

The World Health Organization has emphasised the need to step up suicide prevention efforts to meet the United Nations’ Sustainable Development Goal target of 2030 (Goal 3: Good health and well-being). We address the challenging task of personality subtyping from suicide notes. Most research on personality subtyping has relied on statistical analysis and feature engineering. Moreover, state-of-the-art transformer models in the automated personality subtyping problem have received relatively less attention. We develop a novel EMotion-assisted PERSONAlity Detection Framework (EM-PERSONA). We annotate the benchmark CEASE-v2.0 suicide notes dataset with personality traits across four dichotomies: Introversion (I)-Extraversion (E), Intuition (N)-Sensing (S), Thinking (T)-Feeling (F), Judging (J)-Perceiving (P). Our proposed method outperforms all baselines on comprehensive evaluation using multiple state-of-the-art systems. Across the four dichotomies, EM-PERSONA improved accuracy by 2.04%, 3.69%, 4.52%, and 3.42%, respectively, over the highest-performing single-task systems.

Emotion Enriched Retrofitted Word Embeddings
Sapan Shah | Sreedhar Reddy | Pushpak Bhattacharyya
Proceedings of the 29th International Conference on Computational Linguistics

Word embeddings learned using the distributional hypothesis (e.g., GloVe, Word2vec) are good at encoding various lexical-semantic relations. However, they do not capture the emotion aspects of words. We present a novel retrofitting method for updating the vectors of emotion bearing words like fun, offence, angry, etc. The retrofitted embeddings achieve better inter-cluster and intra-cluster distance for words having the same emotions, e.g., the joy cluster containing words like fun, happiness, etc., and the anger cluster with words like offence, rage, etc., as evaluated through different cluster quality metrics. For the downstream tasks on sentiment analysis and sarcasm detection, simple classification models, such as SVM and Attention Net, learned using our retrofitted embeddings perform better than their pre-trained counterparts (about 1.5% improvement in F1-score) as well as other benchmarks. Furthermore, the difference in performance is more pronounced in the limited data setting.

Knowledge Graph - Deep Learning: A Case Study in Question Answering in Aviation Safety Domain
Ankush Agarwal | Raj Gite | Shreya Laddha | Pushpak Bhattacharyya | Satyanarayan Kar | Asif Ekbal | Prabhjit Thind | Rajesh Zele | Ravi Shankar
Proceedings of the Thirteenth Language Resources and Evaluation Conference

In the commercial aviation domain, there are a large number of documents, like accident reports of NTSB and ASRS, and regulatory directives ADs. There is a need for a system to efficiently access these diverse repositories to serve the demands of the aviation industry, such as maintenance, compliance, and safety. In this paper, we propose a Knowledge Graph (KG) guided Deep Learning (DL) based Question Answering (QA) system to cater to these requirements. We construct a KG from aircraft accident reports and contribute this resource to the community of researchers. The efficacy of this resource is tested and proved by the proposed QA system. Questions in Natural Language are converted into SPARQL (the interface language of the RDF graph database) queries and are answered from the KG. On the DL side, we examine two different QA models, BERT-QA and GPT3-QA, covering the two paradigms of answer formulation in QA. We evaluate our system on a set of handcrafted queries curated from the accident reports. Our hybrid KG + DL QA system, KGQA + BERT-QA, achieves 7% and 40.3% increase in accuracy over KGQA and BERT-QA systems respectively. Similarly, the other combined system, KGQA + GPT3-QA, achieves 29.3% and 9.3% increase in accuracy over KGQA and GPT3-QA systems respectively. Thus, we infer that the combination of KG and DL is better than either KG or DL individually for QA, at least in our chosen domain.

Are Emoji, Sentiment, and Emotion Friends? A Multi-task Learning for Emoji, Sentiment, and Emotion Analysis
Gopendra Vikram Singh | Dushyant Singh Chauhan | Mauajama Firdaus | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the 36th Pacific Asia Conference on Language, Information and Computation

Persona or Context? Towards Building Context adaptive Personalized Persuasive Virtual Sales Assistant
Abhisek Tiwari | Sriparna Saha | Shubhashis Sengupta | Anutosh Maitra | Roshni Ramnani | Pushpak Bhattacharyya
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Task-oriented conversational agents are gaining immense popularity and success in a wide range of tasks, from flight ticket booking to online shopping. However, the existing systems presume that end-users will always have a pre-determined and servable task goal, which results in dialogue failure in hostile scenarios, such as goal unavailability. On the other hand, human agents accomplish users’ tasks even in a large number of goal unavailability scenarios by persuading them towards a very similar and servable goal. Motivated by the limitation, we propose and build a novel end-to-end multi-modal persuasive dialogue system incorporated with a personalized persuasive module aided goal controller and goal persuader. The goal controller recognizes goal conflicting/unavailability scenarios and formulates a new goal, while the goal persuader persuades users using a personalized persuasive strategy identified through dialogue context. We also present a novel automatic evaluation metric called Persuasiveness Measurement Rate (PMeR) for quantifying the persuasive capability of a conversational agent. The obtained improvements (both quantitative and qualitative) firmly establish the superiority and need of the proposed context-guided, personalized persuasive virtual agent over existing traditional task-oriented virtual agents. Furthermore, we also curated a multi-modal persuasive conversational dialogue corpus annotated with intent, slot, sentiment, and dialogue act for e-commerce domain.

Hollywood Identity Bias Dataset: A Context Oriented Bias Analysis of Movie Dialogues
Sandhya Singh | Prapti Roy | Nihar Sahoo | Niteesh Mallela | Himanshu Gupta | Pushpak Bhattacharyya | Milind Savagaonkar | Nidhi Sultan | Roshni Ramnani | Anutosh Maitra | Shubhashis Sengupta
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Movies reflect society and also hold power to transform opinions. Social biases and stereotypes present in movies can cause extensive damage due to their reach. These biases are not always required by the storyline but can creep in as the author’s bias. Movie production houses would prefer to ascertain that any bias present in a script is demanded by the story. Today, when deep learning models can give human-level accuracy in multiple tasks, having an AI solution to identify the biases present in the script at the writing stage can help them avoid the inconvenience of stalled release, lawsuits, etc. Since AI solutions are data intensive and there exists no domain specific data to address the problem of biases in scripts, we introduce a new dataset of movie scripts that are annotated for identity bias. The dataset contains dialogue turns annotated for (i) bias labels for seven categories, viz., gender, race/ethnicity, religion, age, occupation, LGBTQ, and other, which contains biases like body shaming, personality bias, etc., (ii) labels for sensitivity, stereotype, sentiment, emotion, emotion intensity, (iii) all labels annotated with context awareness, (iv) target groups and reason for bias labels and (v) expert-driven group-validation process for high quality annotations. We also report various baseline performances for bias identification and category detection on our dataset.

Improving Machine Translation with Phrase Pair Injection and Corpus Filtering
Akshay Batheja | Pushpak Bhattacharyya
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

In this paper, we show that the combination of Phrase Pair Injection and Corpus Filtering boosts the performance of Neural Machine Translation (NMT) systems. We extract parallel phrases and sentences from the pseudo-parallel corpus and augment it with the parallel corpus to train the NMT models. With the proposed approach, we observe an improvement in the Machine Translation (MT) system for 3 low-resource language pairs, Hindi-Marathi, English-Marathi, and English-Pashto, and 6 translation directions by up to 2.7 BLEU points, on the FLORES test data. These BLEU score improvements are over the models trained using the whole pseudo-parallel corpus augmented with the parallel corpus.

There is No Big Brother or Small Brother: Knowledge Infusion in Language Models for Link Prediction and Question Answering
Ankush Agarwal | Sakharam Gawade | Sachin Channabasavarajendra | Pushpak Bhattacharya
Proceedings of the 19th International Conference on Natural Language Processing (ICON)

The integration of knowledge graphs with deep learning is thriving in improving the performance of various natural language processing (NLP) tasks. In this paper, we focus on knowledge-infused link prediction and question answering using the language models T5 and BLOOM across three domains: Aviation, Movie, and Web. In this context, we infuse knowledge in large and small language models and study their performance, and find the performance to be similar. For the link prediction task on the Aviation Knowledge Graph, we obtain a 0.2 hits@1 score using T5-small, T5-base, T5-large, and BLOOM. Using template-based scripts, we create a set of 1 million synthetic factoid QA pairs in the aviation domain from National Transportation Safety Board (NTSB) reports. On our curated QA pairs, the three models of T5 achieve a 0.7 hits@1 score. We validate our findings with the paired Student’s t-test and Cohen’s kappa scores. For link prediction on the Aviation Knowledge Graph using T5-small and T5-large, we obtain a Cohen’s kappa score of 0.76, showing substantial agreement between the models. Thus, we infer that small language models perform similarly to large language models with the infusion of knowledge.
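The two measures named in the abstract above, hits@1 and Cohen’s kappa, can be sketched as follows. This is an illustration under stated assumptions, not the authors’ evaluation script: the function names and toy data are hypothetical, and kappa is computed here over per-item correct/incorrect outcomes of two models, which is only one plausible way to set up the comparison.

```python
from collections import Counter


def hits_at_1(ranked_candidates, gold):
    """Fraction of queries whose top-ranked candidate equals the gold answer."""
    hits = sum(1 for cands, g in zip(ranked_candidates, gold) if cands and cands[0] == g)
    return hits / len(gold)


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two label sequences, corrected for chance."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[l] * count_b[l] for l in set(labels_a) | set(labels_b)) / (n * n)
    if expected == 1.0:  # both sequences use one identical label throughout
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy link-prediction output: each query has a ranked list of candidate entities.
ranked = [["engine", "wing"], ["pilot", "crew"], ["runway", "taxiway"], ["fuel", "oil"]]
gold = ["engine", "crew", "runway", "oil"]
print(hits_at_1(ranked, gold))

# Toy per-item outcomes (1 = correct, 0 = incorrect) for two hypothetical models.
small_model = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
large_model = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
print(round(cohens_kappa(small_model, large_model), 2))
```

Kappa values between 0.61 and 0.80 are conventionally read as substantial agreement, which is the interpretation the abstract gives to its score of 0.76.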

Knowledge Enhanced Deep Learning Model for Radiology Text Generation
Kaveri Kale | Pushpak Bhattacharya | Aditya Shetty | Milind Gune | Kush Shrivastava | Rustom Lawyer | Spriha Biswas
Proceedings of the 19th International Conference on Natural Language Processing (ICON)

Manual radiology report generation is a time-consuming task. First, radiologists prepare brief notes while carefully examining the imaging report. Then, radiologists or their secretaries create a full-text report that describes the findings by referring to the notes. Automatic radiology report generation is the primary objective of this research. The central part of automatic radiology report generation is generating the finding section (main body of the report) from the radiologists’ notes. In this research, we suggest a knowledge graph (KG) enhanced radiology text generator that can provide additional domain-specific information. Our approach uses a KG-BART model to generate a description of clinical findings (referred to as pathological description) from radiologists’ brief notes. We have constructed a parallel dataset of radiologists’ notes and corresponding pathological descriptions to train the KG-BART model. Our findings demonstrate that, compared to the BART-large and T5-large models, the BLEU-2 score of the pathological descriptions generated by our approach is raised by 4% and 9%, and the ROUGE-L score by 2% and 2%, respectively. Our analysis shows that the KG-BART model for radiology text generation outperforms the T5-large model. Furthermore, we apply our proposed radiology text generator for whole radiology report generation.

Zero-shot Disfluency Detection for Indian Languages
Rohit Kundu | Preethi Jyothi | Pushpak Bhattacharyya
Proceedings of the 29th International Conference on Computational Linguistics

Disfluencies that appear in the transcriptions from automatic speech recognition systems tend to impair the performance of downstream NLP tasks. Disfluency correction models can help alleviate this problem. However, the unavailability of labeled data in low-resource languages impairs progress. We propose using a pretrained multilingual model, finetuned only on English disfluencies, for zero-shot disfluency detection in Indian languages. We present a detailed pipeline to synthetically generate disfluent text and create evaluation datasets for four Indian languages: Bengali, Hindi, Malayalam, and Marathi. Even in the zero-shot setting, we obtain F1 scores of 75 and higher on five disfluency types across all four languages. We also show the utility of synthetically generated disfluencies by evaluating on real disfluent text in Bengali, Hindi, and Marathi. Finetuning the multilingual model on additional synthetic Hindi disfluent text nearly doubles the number of exact matches and yields a 20-point boost in F1 scores when evaluated on real Hindi disfluent text, compared to training with only English disfluent text.

Verb Phrase Anaphora: Do(ing) so with Heuristics
Sandhya Singh | Kushagra Shree | Sriparna Saha | Pushpak Bhattacharyya | Gladvin Chinnadurai | Manish Vatsa
Proceedings of the 19th International Conference on Natural Language Processing (ICON)

Verb Phrase Anaphora (VPA) is a universal language phenomenon. It can occur in the form of do so phrase, verb phrase ellipsis, etc. Resolving VPA can improve the performance of Dialogue processing systems, Natural Language Generation (NLG), Question Answering (QA) and so on. In this paper, we present a novel computational approach to resolve the specific verb phrase anaphora appearing as do so construct and its lexical variations for the English language. The approach follows a heuristic technique using a combination of parsing from classical NLP, state-of-the-art (SOTA) Generative Pre-trained Transformer (GPT) language model and RoBERTa grammar correction model. The result indicates that our approach can resolve these specific verb phrase anaphora cases with 73.40 F1 score. The data set used for testing the specific verb phrase anaphora cases of do so and doing so is released for research purposes. This module has been used as the last module in a coreference resolution pipeline for a downstream QA task for the electronic home appliances sector.

EmoInHindi: A Multi-label Emotion and Intensity Annotated Dataset in Hindi for Emotion Recognition in Dialogues
Gopendra Vikram Singh | Priyanshu Priya | Mauajama Firdaus | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the Thirteenth Language Resources and Evaluation Conference

The long-standing goal of Artificial Intelligence (AI) has been to create human-like conversational systems. Such systems should have the ability to develop an emotional connection with the users; consequently, emotion recognition in dialogues has gained popularity. Emotion detection in dialogues is a challenging task because humans usually convey multiple emotions with varying degrees of intensities in a single utterance. Moreover, emotion in an utterance of a dialogue may be dependent on previous utterances, making the task more complex. Recently, emotion recognition in low-resource languages like Hindi has been in great demand. However, most of the existing datasets for multi-label emotion and intensity detection in conversations are in English. To this end, we propose a large conversational dataset in Hindi named EmoInHindi for multi-label emotion and intensity recognition in conversations containing 1,814 dialogues with a total of 44,247 utterances. We prepare our dataset in a Wizard-of-Oz manner for mental health and legal counselling of crime victims. Each utterance of dialogue is annotated with one or more emotion categories from 16 emotion labels including neutral and their corresponding intensity. We further propose strong contextual baselines that can detect the emotion(s) and corresponding emotional intensity of an utterance given the conversational context.

A Multimodal Corpus for Emotion Recognition in Sarcasm
Anupama Ray | Shubham Mishra | Apoorva Nunna | Pushpak Bhattacharyya
Proceedings of the Thirteenth Language Resources and Evaluation Conference

While sentiment and emotion analysis have been studied extensively, the relationship between sarcasm and emotion has largely remained unexplored. A sarcastic expression may have a variety of underlying emotions. For example, “I love being ignored” belies sadness, while “my mobile is fabulous with a battery backup of only 15 minutes!” expresses frustration. Detecting the emotion behind a sarcastic expression is non-trivial yet an important task. We undertake the task of detecting the emotion in a sarcastic statement, which to the best of our knowledge, is hitherto unexplored. We start with the recently released multimodal sarcasm detection dataset (MUStARD) pre-annotated with 9 emotions. We identify and correct 343 incorrect emotion labels (out of 690). We double the size of the dataset and label it with emotions along with valence and arousal, which are important indicators of emotional intensity. Finally, we label each sarcastic utterance with one of the four sarcasm types (Propositional, Embedded, Like-prefixed, and Illocutionary), with the goal of advancing sarcasm detection research. Exhaustive experimentation with multimodal (text, audio, and video) fusion models establishes a benchmark for exact emotion recognition in sarcasm and outperforms the state-of-the-art sarcasm detection. We release the dataset enriched with various annotations and the code for research purposes: https://github.com/apoorva-nunna/MUStARD_Plus_Plus

Multiple Pivot Languages and Strategic Decoder Initialization Helps Neural Machine Translation
Shivam Mhaskar | Pushpak Bhattacharyya
Proceedings of the Fifth Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2022)

In machine translation, a pivot language can be used to assist the source to target translation model. In pivot-based transfer learning, the source to pivot and the pivot to target models are used to improve the performance of the source to target model. This technique works best when both source-pivot and pivot-target are high resource language pairs and the source-target is a low resource language pair. But in some cases, such as Indic languages, the pivot to target language pair is not a high resource one. To overcome this limitation, we use multiple related languages as pivot languages to assist the source to target model. We show that using multiple pivot languages gives 2.03 BLEU and 3.05 chrF score improvement over the baseline model. We show that strategic decoder initialization while performing pivot-based transfer learning with multiple pivot languages gives a 3.67 BLEU and 5.94 chrF score improvement over the baseline model.

IIT Bombay’s WMT22 Automatic Post-Editing Shared Task Submission
Sourabh Deoghare | Pushpak Bhattacharyya
Proceedings of the Seventh Conference on Machine Translation (WMT)

This paper describes IIT Bombay’s submission to the WMT22 Automatic Post-Editing (APE) shared task for the English-Marathi (En-Mr) language pair. We follow the curriculum training strategy to train our APE system. First, we train an encoder-decoder model to perform translation from English to Marathi. Next, we add another encoder to the model and train the resulting dual-encoder single-decoder model for the APE task. This involves training the model using the synthetic APE data in multiple training stages and then fine-tuning it using the real APE data. We use the LaBSE technique to ensure the quality of the synthetic APE data. For data augmentation, along with using candidates obtained from an external machine translation (MT) system, we also use the phrase-level APE triplets generated using phrase table injection. As APE systems are prone to the problem of ‘over-correction’, we use a sentence-level quality estimation (QE) system to select the final output between an original translation and the corresponding output generated by the APE model. Our approach improves the TER and BLEU scores on the development set by -3.92 and +4.36 points, respectively. Also, the final results on the test set show that our APE system outperforms the baseline system by -3.49 TER points and +5.37 BLEU points.
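The final selection step described above, choosing between the original translation and the APE output with a sentence-level QE system, can be sketched as follows. This is a minimal illustration, not the submission’s code: the function names, the rule of keeping the APE output only on a strictly higher score, and the lookup-table scorer are all assumptions standing in for a trained QE model.

```python
from typing import Callable


def select_final_output(
    source: str,
    mt_output: str,
    ape_output: str,
    qe_score: Callable[[str, str], float],
) -> str:
    """Return the APE output only when the QE scorer rates it above the original MT output."""
    if qe_score(source, ape_output) > qe_score(source, mt_output):
        return ape_output
    return mt_output


# Hypothetical stand-in for a trained QE model: a fixed lookup of quality scores.
TOY_SCORES = {
    "mt: good translation": 0.90,
    "ape: over-corrected": 0.70,
    "mt: flawed translation": 0.40,
    "ape: repaired translation": 0.80,
}


def toy_qe(source: str, translation: str) -> float:
    return TOY_SCORES[translation]


kept = select_final_output("src", "mt: good translation", "ape: over-corrected", toy_qe)
fixed = select_final_output("src", "mt: flawed translation", "ape: repaired translation", toy_qe)
print(kept)   # the original translation survives an over-correcting edit
print(fixed)  # the APE output is used when it scores higher
```

Falling back to the original translation on ties keeps the gate conservative, which matches the stated goal of limiting over-correction.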

HindiMD: A Multi-domain Corpora for Low-resource Sentiment Analysis
Mamta | Asif Ekbal | Pushpak Bhattacharyya | Tista Saha | Alka Kumar | Shikha Srivastava
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Social media platforms such as Twitter have evolved into a vast information sharing platform, allowing people from a variety of backgrounds and expertise to share their opinions on numerous events such as terrorism, narcotics and many other social issues. People sometimes misuse the power of social media for their agendas, such as illegal trades and negatively influencing others. Because of this, sentiment analysis has won the interest of a lot of researchers to widely analyze public opinion for social media monitoring. Several benchmark datasets for sentiment analysis across a range of domains have been made available, especially for high-resource languages. A few datasets are available for low-resource Indian languages like Hindi, such as movie reviews and product reviews, which do not address the current need for social media monitoring. In this paper, we address the challenges of sentiment analysis in Hindi and socially relevant domains by introducing a balanced corpus annotated with the sentiment classes, viz. positive, negative and neutral. To show the effective usage of the dataset, we build several deep learning based models and establish them as the baselines for further research in this direction.

HiNER: A large Hindi Named Entity Recognition Dataset
Rudra Murthy | Pallab Bhattacharjee | Rahul Sharnagat | Jyotsana Khatri | Diptesh Kanojia | Pushpak Bhattacharyya
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Named Entity Recognition (NER) is a foundational NLP task that aims to provide class labels like Person, Location, Organisation, Time, and Number to words in free text. Named Entities can also be multi-word expressions where the additional I-O-B annotation information helps label them during the NER annotation process. While English and European languages have considerable annotated data for the NER task, Indian languages lack on that front, both in terms of quantity and in following annotation standards. This paper releases a significantly sized standard-abiding Hindi NER dataset containing 109,146 sentences and 2,220,856 tokens, annotated with 11 tags. We discuss the dataset statistics in all their essential detail and provide an in-depth analysis of the NER tag-set used with our data. The statistics of the tag-set in our dataset show a healthy per-tag distribution, especially for prominent classes like Person, Location and Organisation. Since the proof of resource-effectiveness is in building models with the resource and testing the model on benchmark data and against the leader-board entries in shared tasks, we do the same with the aforesaid data. We use different language models to perform the sequence labelling task for NER and show the efficacy of our data by performing a comparative evaluation with models trained on another dataset available for the Hindi NER task. Our dataset helps achieve a weighted F1 score of 88.78 with all the tags and 92.22 when we collapse the tag-set, as discussed in the paper. To the best of our knowledge, no available dataset meets the standards of volume (amount) and variability (diversity), as far as Hindi NER is concerned. We fill this gap through this work, which we hope will significantly help NLP for Hindi. We release this dataset with our code and models for further research at https://github.com/cfiltnlp/HiNER

Team IITP-AINLPML at WASSA 2022: Empathy Detection, Emotion Classification and Personality Detection
Soumitra Ghosh | Dhirendra Maurya | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the 12th Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis

Computational comprehension and identifying emotional components in language have been critical in enhancing human-computer connection in recent years. The WASSA 2022 Shared Task introduced four tracks and released a dataset of news stories: Track-1 for Empathy and Distress Prediction, Track-2 for Emotion classification, Track-3 for Personality prediction, and Track-4 for Interpersonal Reactivity Index prediction at the essay level. This paper describes our participation in the WASSA 2022 shared task on the tasks mentioned above. We developed multi-task deep learning methods to address Tracks 1 and 2 and machine learning models for Tracks 3 and 4. Our developed systems achieved average Pearson scores of 0.483, 0.05, and 0.08 for Tracks 1, 3, and 4, respectively, and a macro F1 score of 0.524 for Track 2 on the test set. We ranked 8th, 11th, 2nd and 2nd for Tracks 1, 2, 3, and 4, respectively.

Many Hands Make Light Work: Using Essay Traits to Automatically Score Essays
Rahul Kumar | Sandeep Mathias | Sriparna Saha | Pushpak Bhattacharyya
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Most research in the area of automatic essay grading (AEG) is geared towards scoring the essay holistically, while little work has been done on scoring individual essay traits. In this paper, we describe a way to score essays using a multi-task learning (MTL) approach, where scoring the essay holistically is the primary task, and scoring the essay traits is the auxiliary task. We compare our results with a single-task learning (STL) approach, using both LSTMs and BiLSTMs. To find out which traits work best for different types of essays, we conduct ablation tests for each of the essay traits. We also report the runtime and number of training parameters for each system. We find that the MTL-based BiLSTM system gives the best results for scoring the essay holistically, as well as performing well on scoring the essay traits. The MTL systems also give a speed-up of 2.30 to 3.70 times over the STL system when it comes to scoring the essay and all the traits.

Detecting Unintended Social Bias in Toxic Language Datasets
Nihar Sahoo | Himanshu Gupta | Pushpak Bhattacharyya
Proceedings of the 26th Conference on Computational Natural Language Learning (CoNLL)

With the rise of online hate speech, automatic detection of hate speech and offensive text as a natural language processing task is gaining popularity. However, very little research has been done to detect unintended social bias in these toxic language datasets. This paper introduces a new dataset, ToxicBias, curated from the existing dataset of the Kaggle competition named “Jigsaw Unintended Bias in Toxicity Classification”. We aim to detect social biases, their categories, and targeted groups. The dataset contains instances annotated for five different bias categories, viz., gender, race/ethnicity, religion, political, and LGBTQ. We train transformer-based models using our curated datasets and report baseline performance for bias identification, target generation, and bias implications. Model biases and their mitigation are also discussed in detail. Our study motivates a systematic extraction of social bias data from toxic language datasets.

PoliSe: Reinforcing Politeness Using User Sentiment for Customer Care Response Generation
Mauajama Firdaus | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the 29th International Conference on Computational Linguistics

The interaction between a consumer and the customer service representative greatly contributes to the overall customer experience. Therefore, to ensure customers’ comfort and retention, it is important that customer service agents and chatbots connect with users on social, cordial, and empathetic planes. In the current work, we automatically identify the sentiment of the user and transform the neutral responses into polite responses conforming to the sentiment and the conversational history. Our technique is a reinforced multi-task network (the primary task being ‘polite response generation’ and the secondary task being ‘sentiment analysis’) that uses a Transformer based encoder-decoder. We use sentiment annotated conversations from Twitter as the training data. The detailed evaluation shows that our proposed approach attains superior performance compared to the baseline models.

Meta-Learning based Deferred Optimisation for Sentiment and Emotion aware Multi-modal Dialogue Act Classification
Tulika Saha | Aditya Prakash Patra | Sriparna Saha | Pushpak Bhattacharyya
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Dialogue Act Classification (DAC), which determines the communicative intention of an utterance, has been investigated widely over the years as a standalone task. But the emotional state of the speaker has a considerable effect on its pragmatic content. Sentiment as a human behavior is also closely related to emotion and one aids in the better understanding of the other. Thus, their role in identification of DAs needs to be explored. As a first step, we extend the newly released multi-modal EMOTyDA dataset to enclose sentiment tags for each utterance. In order to incorporate these multiple aspects, we propose a Dual Attention Mechanism (DAM) based multi-modal, multi-tasking conversational framework. The DAM module encompasses intra-modal and interactive inter-modal attentions with multiple loss optimization at various hierarchies to fuse multiple modalities efficiently and learn generalized features across all the tasks. Additionally, to counter the class-imbalance issue in dialogues, we introduce a 2-step Deferred Optimisation Schedule (DOS) that involves Meta-Net (MN) learning and deferred re-weighting, where the former helps to learn an explicit weighting function from data automatically and the latter deploys a re-weighted multi-task loss with a smaller learning rate. Empirically, we establish that the joint optimisation of multi-modal DAC, SA and ER tasks along with the incorporation of 2-step DOS and MN learning produces better results compared to its different counterparts and outperforms the state-of-the-art model.

A Shoulder to Cry on: Towards A Motivational Virtual Assistant for Assuaging Mental Agony
Tulika Saha | Saichethan Reddy | Anindya Das | Sriparna Saha | Pushpak Bhattacharyya
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Mental Health Disorders continue to plague humans worldwide. Aggravating this situation is the severe shortage of qualified and competent mental health professionals (MHPs), which underlines the need for developing Virtual Assistants (VAs) that can assist MHPs. The data+ML for automation can come from platforms that allow visiting and posting messages in a peer-to-peer anonymous manner for sharing experiences (frequently stigmatized) and seeking support. In this paper, we propose a VA that can act as the first point of contact and comfort for mental health patients. We curate a dataset, Motivational VA: MotiVAte, comprising 7k dyadic conversations collected from a peer-to-peer support platform. The system employs two mechanisms: (i) Mental Illness Classification: an attention based BERT classifier that outputs the mental disorder category out of the 4 categories, viz., Major Depressive Disorder (MDD), Anxiety, Obsessive Compulsive Disorder (OCD) and Post-traumatic Stress Disorder (PTSD), based on the input ongoing dialog between the support seeker and the VA; and (ii) Mental Illness Conditioned Motivational Dialogue Generation (MI-MDG): a sentiment driven Reinforcement Learning (RL) based motivational response generator. The empirical evaluation demonstrates the system's capability by way of outperforming several baselines.

A Deep Learning based Framework for Image Paragraph Generation in Hindi
Santosh Kumar Mishra | Sushant Sinha | Sriparna Saha | Pushpak Bhattacharyya
Proceedings of the 36th Pacific Asia Conference on Language, Information and Computation

Affective Retrofitted Word Embeddings
Sapan Shah | Sreedhar Reddy | Pushpak Bhattacharyya
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Word embeddings learned using the distributional hypothesis (e.g., GloVe, Word2vec) do not capture the affective dimensions of valence, arousal, and dominance, which are present inherently in words. We present a novel retrofitting method for updating embeddings of words for their affective meaning. It learns a non-linear transformation function that maps pre-trained embeddings to an affective vector space, in a representation learning setting. We investigate word embeddings for their capacity to cluster emotion-bearing words. The affective embeddings learned by our method achieve better inter-cluster and intra-cluster distance for words having the same emotions, as evaluated through different cluster quality metrics. For the downstream tasks on sentiment analysis and sarcasm detection, simple classification models, viz. SVM and Attention Net, learned using our affective embeddings perform better than their pre-trained counterparts (more than 1.5% improvement in F1-score) and other benchmarks. Furthermore, the difference in performance is more pronounced in limited data setting.

A Sentiment and Emotion Aware Multimodal Multiparty Humor Recognition in Multilingual Conversational Setting
Dushyant Singh Chauhan | Gopendra Vikram Singh | Aseem Arora | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the 29th International Conference on Computational Linguistics

In this paper, we hypothesize that humor is closely related to sentiment and emotions. Also, due to the tremendous growth in multilingual content, there is a great demand for building models and systems that support multilingual information access. To this end, we first extend the recently released Multimodal Multiparty Hindi Humor (M2H2) dataset by adding parallel English utterances corresponding to Hindi utterances and then annotating each utterance with sentiment and emotion classes. We name it Sentiment, Humor, and Emotion aware Multilingual Multimodal Multiparty Dataset (SHEMuD). We then propose a multitask framework wherein the primary task is humor detection, and the auxiliary tasks are sentiment and emotion identification. Within this framework, we first propose a Context Transformer to capture the deep contextual relationships with the input utterances. We then propose a Sentiment and Emotion aware Embedding (SE-Embedding) to get the overall representation of a particular emotion and sentiment w.r.t. the specific humor situation. Experimental results on SHEMuD show the efficacy of our approach and show that multitask learning offers an improvement over the single-task framework for both monolingual (4.86 points in Hindi and 5.9 points in English in F1-score) and multilingual (5.17 points in F1-score) settings.

Novelty Detection: A Perspective from Natural Language Processing
Tirthankar Ghosal | Tanik Saikh | Tameesh Biswas | Asif Ekbal | Pushpak Bhattacharyya
Computational Linguistics, Volume 48, Issue 1 - March 2022

The quest for new information is an inborn human trait and has always been quintessential for human survival and progress. Novelty drives curiosity, which in turn drives innovation. In Natural Language Processing (NLP), Novelty Detection refers to finding text that has some new information to offer with respect to whatever is earlier seen or known. With the exponential growth of information all across the Web, there is an accompanying menace of redundancy. A considerable portion of the Web contents are duplicates, and we need efficient mechanisms to retain new information and filter out redundant information. However, detecting redundancy at the semantic level and identifying novel text is not straightforward because the text may have less lexical overlap yet convey the same information. On top of that, non-novel/redundant information in a document may have been assimilated from multiple source documents, not just one. The problem is compounded when the subject of the discourse is documents, and numerous prior documents need to be processed to ascertain the novelty/non-novelty of the current one in concern. In this work, we build upon our earlier investigations for document-level novelty detection and present a comprehensive account of our efforts toward the problem. We explore the role of pre-trained Textual Entailment (TE) models to deal with multiple source contexts and present the outcome of our current investigations. We argue that a multipremise entailment task is one close approximation toward identifying semantic-level non-novelty. Our recent approach either performs comparably or achieves significant improvement over the latest reported results on several datasets and across several related tasks (paraphrasing, plagiarism, rewrite). We critically analyze our performance with respect to the existing state of the art and show the superiority and promise of our approach for future investigations. We also present our enhanced dataset TAP-DLND 2.0 and several baselines to the community for further research on document-level novelty detection.

2021

Disfluency Correction using Unsupervised and Semi-supervised Learning
Nikhil Saini | Drumil Trivedi | Shreya Khare | Tejas Dhamecha | Preethi Jyothi | Samarth Bharadwaj | Pushpak Bhattacharyya
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Spoken language is different from the written language in its style and structure. Disfluencies that appear in transcriptions from speech recognition systems generally hamper the performance of downstream NLP tasks. Thus, a disfluency correction system that converts disfluent to fluent text is of great value. This paper introduces a disfluency correction model that translates disfluent to fluent text by drawing inspiration from recent encoder-decoder unsupervised style-transfer models for text. We also show considerable benefits in performance when utilizing a small sample of 500 parallel disfluent-fluent sentences in a semi-supervised way. Our unsupervised approach achieves a BLEU score of 79.39 on the Switchboard corpus test set, with further improvement to a BLEU score of 85.28 with semi-supervision. Both are comparable to two competitive fully-supervised models.

Evaluating the Performance of Back-translation for Low Resource English-Marathi Language Pair: CFILT-IITBombay @ LoResMT 2021
Aditya Jain | Shivam Mhaskar | Pushpak Bhattacharyya
Proceedings of the 4th Workshop on Technologies for MT of Low Resource Languages (LoResMT2021)

In this paper, we discuss the details of the various Machine Translation (MT) systems that we have submitted for the English-Marathi LoResMT task. As a part of this task, we have submitted three different Neural Machine Translation (NMT) systems: a Baseline English-Marathi system, a Baseline Marathi-English system, and an English-Marathi system that is based on the back-translation technique. We explore the performance of these NMT systems between English and Marathi, which form a low resource language pair due to the unavailability of sufficient parallel data. We also explore the performance of the back-translation technique when the back-translated data is obtained from NMT systems that are trained on a very small amount of data. From our experiments, we observe that the back-translation technique can help improve the MT quality over the baseline for the English-Marathi language pair.

Proceedings of the 18th International Conference on Natural Language Processing (ICON)
Sivaji Bandyopadhyay | Sobha Lalitha Devi | Pushpak Bhattacharyya
Proceedings of the 18th International Conference on Natural Language Processing (ICON)

Retrofitting of Pre-trained Emotion Words with VAD-dimensions and the Plutchik Emotions
Manasi Kulkarni | Pushpak Bhattacharyya
Proceedings of the 18th International Conference on Natural Language Processing (ICON)

Word representations are based on the distributional hypothesis, according to which words that occur in similar contexts tend to have similar meanings and appear closer in vector space. For example, the emotionally dissimilar words “joy” and “sadness” have higher cosine similarity. Existing pre-trained embedding models lack an interpretation of emotion words. For creating our VAD-Emotion embeddings, we modify the pre-trained word embeddings with emotion information. This is a lexicon-based approach that uses the Valence, Arousal and Dominance (VAD) values and Plutchik’s emotions to incorporate emotion information into pre-trained word embeddings using post-training processing. This brings emotionally similar words nearer and moves emotionally dissimilar words away from each other in the proposed vector space. We demonstrate the performance of the proposed embeddings on a downstream NLP task, Emotion Recognition.

Cognition-aware Cognate Detection
Diptesh Kanojia | Prashant Sharma | Sayali Ghodekar | Pushpak Bhattacharyya | Gholamreza Haffari | Malhar Kulkarni
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Automatic detection of cognates helps downstream NLP tasks of Machine Translation, Cross-lingual Information Retrieval, Computational Phylogenetics and Cross-lingual Named Entity Recognition. Previous approaches for the task of cognate detection use orthographic, phonetic and semantic similarity based feature sets. In this paper, we propose a novel method for enriching the feature sets with cognitive features extracted from human readers’ gaze behaviour. We collect gaze behaviour data for a small sample of cognates and show that extracted cognitive features help the task of cognate detection. However, gaze data collection and annotation is a costly task. We use the collected gaze behaviour data to predict cognitive features for a larger sample and show that predicted cognitive features also significantly improve the task performance. We report improvements of 10% with the collected gaze features, and 12% using the predicted gaze features, over the previously proposed approaches. Furthermore, we release the collected gaze behaviour data along with our code and cross-lingual models.

EduMT: Developing Machine Translation System for Educational Content in Indian Languages
Ramakrishna Appicharla | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the 18th International Conference on Natural Language Processing (ICON)

In this paper, we explore various approaches to build Hindi to Bengali Neural Machine Translation (NMT) systems for the educational domain. Translation of educational content poses several challenges, such as unavailability of gold standard data for model building, extensive use of domain-specific terms, as well as the presence of noise in the form of spontaneous speech, as the corpus is prepared from subtitle data, and noise due to the process of corpus creation through back-translation. We create an educational parallel corpus by crawling lecture subtitles and translating them into Hindi and Bengali using Google Translate. We also create a clean parallel corpus by post-editing the synthetic corpus via annotation and crowd-sourcing. We build NMT systems on the prepared corpus with domain adaptation objectives. We also explore data augmentation methods by automatically cleaning the synthetic corpus and using it to further train the models. We experiment with combining the domain adaptation objective with multilingual NMT. We report BLEU and TER scores of all the models on a manually created Hindi-Bengali educational test set. Our experiments show that the multilingual domain adaptation model outperforms all the other models by achieving 34.8 BLEU and 0.466 TER scores.

Investigating Active Learning in Interactive Neural Machine Translation
Kamal Gupta | Dhanvanth Boppana | Rejwanul Haque | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of Machine Translation Summit XVIII: Research Track

Interactive-predictive translation is a collaborative iterative process, where human translators produce translations with the help of machine translation (MT) systems interactively. Various sampling techniques in active learning (AL) exist to update the neural MT (NMT) model in the interactive-predictive scenario. In this paper, we explore term based (named entity count (NEC)) and quality based (quality estimation (QE) and sentence similarity (Sim)) sampling techniques, which are used to find the ideal candidates from the incoming data, for human supervision and MT model’s weight updation. We carried out experiments with three language pairs, viz. German-English, Spanish-English and Hindi-English. Our proposed sampling technique yields 1.82, 0.77 and 0.81 BLEU points improvements for German-English, Spanish-English and Hindi-English, respectively, over the random sampling based baseline. It also improves the present state-of-the-art by 0.35 and 0.12 BLEU points for German-English and Spanish-English, respectively. Human editing effort in terms of number-of-words-changed also improves by 5 and 4 points for German-English and Spanish-English, respectively, compared to the state-of-the-art.

CFILT IIT Bombay@LT-EDI-EACL2021: Hope Speech Detection for Equality, Diversity, and Inclusion using Multilingual Representation from Transformers
Pankaj Singh | Prince Kumar | Pushpak Bhattacharyya
Proceedings of the First Workshop on Language Technology for Equality, Diversity and Inclusion

With the internet becoming part and parcel of our lives, engagement in social media has increased a lot. Identifying and eliminating offensive content from social media has become of utmost priority to prevent any kind of violence. However, detecting encouraging, supportive and positive content is equally important to prevent misuse of censorship targeted to attack freedom of speech. This paper presents our system for the shared task Hope Speech Detection for Equality, Diversity, and Inclusion at LT-EDI, EACL 2021. The data for this shared task is provided in English, Tamil, and Malayalam which was collected from YouTube comments. It is a multiclass classification problem where each data instance is categorized into one of the three classes: ‘Hope speech’, ‘Not hope speech’, and ‘Not in intended language’. We propose a system that employs multilingual transformer models to obtain the representation of text and classifies it into one of the three classes. We explored the use of multilingual models trained specifically for Indian languages along with generic multilingual models. Our system was ranked 2nd for English, 2nd for Malayalam, and 7th for the Tamil language in the final leader board published by organizers and obtained a weighted F1-score of 0.92, 0.84, 0.55 respectively on the hidden test dataset used for the competition. We have made our system publicly available at GitHub.

BERT based Adverse Drug Effect Tweet Classification
Tanay Kayastha | Pranjal Gupta | Pushpak Bhattacharyya
Proceedings of the Sixth Social Media Mining for Health (#SMM4H) Workshop and Shared Task

This paper describes models developed for the Social Media Mining for Health (SMM4H) 2021 shared tasks. Our team participated in the first subtask that classifies tweets with Adverse Drug Effect (ADE) mentions. Our best performing model utilizes BERTweet followed by a single layer of BiLSTM. The system achieves an F-score of 0.45 on the test set without the use of any auxiliary resources such as Part-of-Speech tags, dependency tags, or knowledge from medical dictionaries.

A Scaled Encoder Decoder Network for Image Captioning in Hindi
Santosh Kumar Mishra | Sriparna Saha | Pushpak Bhattacharyya
Proceedings of the 18th International Conference on Natural Language Processing (ICON)

Image captioning is a prominent research area in computer vision and natural language processing, which automatically generates natural language descriptions for images. Most of the existing works have focused on developing models for image captioning in the English language. The current paper introduces a novel deep learning architecture based on encoder-decoder with an attention mechanism for image captioning in the Hindi language. For encoder, decoder, and attention, several deep learning-based architectures have been explored. Hindi, the fourth-most spoken language globally, is widely spoken in India and South Asia and is one of India’s official languages. The proposed encoder-decoder architecture utilizes scaling in convolution neural networks to achieve better accuracy than state-of-the-art image captioning methods in Hindi. The proposed method’s performance is compared with state-of-the-art methods in terms of BLEU scores and manual evaluation (in terms of adequacy and fluency). The obtained results demonstrate the efficacy of the proposed method.

Neural Machine Translation in Low-Resource Setting: a Case Study in English-Marathi Pair
Aakash Banerjee | Aditya Jain | Shivam Mhaskar | Sourabh Deoghare | Aman Sehgal | Pushpak Bhattacharyya
Proceedings of Machine Translation Summit XVIII: Research Track

In this paper, we explore different techniques of overcoming the challenges of low-resource in Neural Machine Translation (NMT), specifically focusing on the case of English-Marathi NMT. NMT systems require a large amount of parallel corpora to obtain good quality translations. We try to mitigate the low-resource problem by augmenting parallel corpora or by using transfer learning. Techniques such as Phrase Table Injection (PTI), back-translation and mixing of language corpora are used for enhancing the parallel data; whereas pivoting and multilingual embeddings are used to leverage transfer learning. For pivoting, Hindi comes in as assisting language for English-Marathi translation. Compared to the baseline transformer model, a significant improvement trend in BLEU score is observed across various techniques. We have done extensive manual, automatic and qualitative evaluation of our systems. Since the trend in Machine Translation (MT) today is post-editing and measuring of Human Effort Reduction (HER), we have given our preliminary observations on Translation Edit Rate (TER) vs. BLEU score study, where TER is regarded as a measure of HER.

How low is too low? A monolingual take on lemmatisation in Indian languages
Kumar Saunack | Kumar Saurav | Pushpak Bhattacharyya
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Lemmatization aims to reduce the sparse data problem by relating the inflected forms of a word to its dictionary form. Most prior work on ML based lemmatization has focused on high resource languages, where data sets (word forms) are readily available. For languages which have no linguistic work available, especially on morphology or in languages where the computational realization of linguistic rules is complex and cumbersome, machine learning based lemmatizers are the way to go. In this paper, we devote our attention to lemmatisation for low resource, morphologically rich scheduled Indian languages using neural methods. Here, low resource means only a small number of word forms are available. We perform tests to analyse the variance in monolingual models’ performance on varying the corpus size and contextual morphological tag data for training. We show that monolingual approaches with data augmentation can give competitive accuracy even in the low resource setting, which augurs well for NLP in low resource settings.

FrameNet-assisted Noun Compound Interpretation
Girishkumar Ponkiya | Diptesh Kanojia | Pushpak Bhattacharyya | Girish Palshikar
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

Scrambled Translation Problem: A Problem of Denoising UNMT
Tamali Banerjee | Rudra V Murthy | Pushpak Bhattacharya
Proceedings of Machine Translation Summit XVIII: Research Track

In this paper, we identify an interesting kind of error in the output of Unsupervised Neural Machine Translation (UNMT) systems like Undreamt. We refer to this error type as Scrambled Translation problem. We observe that UNMT models which use word shuffle noise (as in case of Undreamt) can generate correct words, but fail to stitch them together to form phrases. As a result, words of the translated sentence look scrambled, resulting in decreased BLEU. We hypothesise that the reason behind scrambled translation problem is ‘shuffling noise’ which is introduced in every input sentence as a denoising strategy. To test our hypothesis, we experiment by retraining UNMT models with a simple retraining strategy. We stop the training of the Denoising UNMT model after a pre-decided number of iterations and resume the training for the remaining iterations (a number that is also pre-decided) using original sentence as input without adding any noise. Our proposed solution achieves significant performance improvement over UNMT models that train conventionally. We demonstrate these performance gains on four language pairs, viz., English-French, English-German, English-Spanish, Hindi-Punjabi. Our qualitative and quantitative analysis shows that the retraining strategy helps achieve better alignment as observed by attention heatmap and better phrasal translation, leading to statistically significant improvement in BLEU scores.

Pivot Based Transfer Learning for Neural Machine Translation: CFILT IITB @ WMT 2021 Triangular MT
Shivam Mhaskar | Pushpak Bhattacharyya
Proceedings of the Sixth Conference on Machine Translation

In this paper, we discuss the various techniques that we used to implement the Russian-Chinese machine translation system for the Triangular MT task at WMT 2021. Neural Machine translation systems based on transformer architecture have an encoder-decoder architecture, which are trained end-to-end and require a large amount of parallel corpus to produce good quality translations. This is the reason why neural machine translation systems are referred to as data hungry. Such a large amount of parallel corpus is majorly available for language pairs which include English and not for non-English language pairs. This is a major problem in building neural machine translation systems for non-English language pairs. We try to utilize the resources of the English language to improve the translation of non-English language pairs. We use the pivot language, that is English, to leverage transfer learning to improve the quality of Russian-Chinese translation. Compared to the baseline transformer-based neural machine translation system, we observe that the pivot language-based transfer learning technique gives a higher BLEU score.

Language Relatedness and Lexical Closeness can help Improve Multilingual NMT: IITBombay@MultiIndicNMT WAT2021
Jyotsana Khatri | Nikhil Saini | Pushpak Bhattacharyya
Proceedings of the 8th Workshop on Asian Translation (WAT2021)

Multilingual Neural Machine Translation has achieved remarkable performance by training a single translation model for multiple languages. This paper describes our submission (Team ID: CFILT-IITB) for the MultiIndicMT: An Indic Language Multilingual Task at WAT 2021. We train multilingual NMT systems by sharing encoder and decoder parameters with language embedding associated with each token in both encoder and decoder. Furthermore, we demonstrate the use of transliteration (script conversion) for Indic languages in reducing the lexical gap for training a multilingual NMT system. Further, we show improvement in performance by training a multilingual NMT system using languages of the same family, i.e., related languages.

“So You Think You’re Funny?”: Rating the Humour Quotient in Standup Comedy
Anirudh Mittal | Pranav Jeevan P | Prerak Gandhi | Diptesh Kanojia | Pushpak Bhattacharyya
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Computational Humour (CH) has attracted the interest of Natural Language Processing and Computational Linguistics communities. Creating datasets for automatic measurement of humour quotient is difficult due to multiple possible interpretations of the content. In this work, we create a multi-modal humour-annotated dataset (~40 hours) using stand-up comedy clips. We devise a novel scoring mechanism to annotate the training data with a humour quotient score using the audience’s laughter. The normalized duration (laughter duration divided by the clip duration) of laughter in each clip is used to compute this humour coefficient score on a five-point scale (0-4). This method of scoring is validated by comparing with manually annotated scores, wherein a quadratic weighted kappa of 0.6 is obtained. We use this dataset to train a model that provides a ‘funniness’ score, on a five-point scale, given the audio and its corresponding text. We compare various neural language models for the task of humour-rating and achieve an accuracy of 0.813 in terms of Quadratic Weighted Kappa (QWK). Our ‘Open Mic’ dataset is released for further research along with the code.
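
The agreement measure named in this abstract, quadratic weighted kappa (QWK), can be sketched in a few lines. This is a minimal illustration of the standard QWK definition, not the authors' evaluation code; the function name and the `num_labels` parameter are assumptions made for this example. Labels are taken to be integers from 0 to `num_labels - 1`, which matches a five-point 0-4 scale when `num_labels=5`.

```python
from collections import Counter


def quadratic_weighted_kappa(rater_a, rater_b, num_labels):
    """Standard QWK between two equal-length lists of integer labels.

    Labels must lie in range(num_labels). Function and parameter names are
    illustrative and do not come from the paper.
    """
    if num_labels < 2:
        raise ValueError("num_labels must be at least 2")
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty, equal-length label lists")

    n = len(rater_a)
    observed = Counter(zip(rater_a, rater_b))  # joint counts of (a, b) labels
    hist_a = Counter(rater_a)                  # marginal counts for rater A
    hist_b = Counter(rater_b)                  # marginal counts for rater B
    max_dist = (num_labels - 1) ** 2

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(num_labels):
        for j in range(num_labels):
            weight = (i - j) ** 2 / max_dist   # quadratic disagreement penalty
            weighted_observed += weight * observed[(i, j)] / n
            weighted_expected += weight * hist_a[i] * hist_b[j] / (n * n)

    # Expected disagreement is zero only when both raters always give the
    # same single label; treat that case as perfect agreement.
    if weighted_expected == 0.0:
        return 1.0
    return 1.0 - weighted_observed / weighted_expected
```

A value of 1 indicates perfect agreement, 0 indicates chance-level agreement, and negative values indicate systematic disagreement.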

Language Model Pretraining and Transfer Learning for Very Low Resource Languages
Jyotsana Khatri | Rudra Murthy | Pushpak Bhattacharyya
Proceedings of the Sixth Conference on Machine Translation

This paper describes our submission for the shared task on Unsupervised MT and Very Low Resource Supervised MT at WMT 2021. We submitted systems for two language pairs: German ↔ Upper Sorbian (de ↔ hsb) and German ↔ Lower Sorbian (de ↔ dsb). For de ↔ hsb, we pretrain our system using MASS (Masked Sequence to Sequence) objective and then finetune using iterative back-translation. Final finetuning is performed using the parallel data provided for translation objective. For de ↔ dsb, no parallel data is provided in the task, so we use the final de ↔ hsb model as initialization of the de ↔ dsb model and train it further using iterative back-translation, using the same vocabulary as used in the de ↔ hsb model.

Multilingual Machine Translation Systems at WAT 2021: One-to-Many and Many-to-One Transformer based NMT
Shivam Mhaskar | Aditya Jain | Aakash Banerjee | Pushpak Bhattacharyya
Proceedings of the 8th Workshop on Asian Translation (WAT2021)

In this paper, we present the details of the systems that we have submitted for the WAT 2021 MultiIndicMT: An Indic Language Multilingual Task. We have submitted two separate multilingual NMT models: one for English to 10 Indic languages and another for 10 Indic languages to English. We discuss the implementation details of two separate multilingual NMT approaches, namely one-to-many and many-to-one, that make use of a shared decoder and a shared encoder, respectively. From our experiments, we observe that the multilingual NMT systems outperform the bilingual baseline MT systems for each of the language pairs under consideration.

Modelling Context Emotions using Multi-task Learning for Emotion Controlled Dialog Generation
Deeksha Varshney | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

A recent topic of research in natural language generation has been the development of automatic response generation modules that can automatically respond to a user’s utterance in an empathetic manner. Previous research has tackled this task using neural generative methods by augmenting emotion classes with the input sequences. However, the outputs by these models may be inconsistent. We employ multi-task learning to predict the emotion label and to generate a viable response for a given utterance using a common encoder with multiple decoders. Our proposed encoder-decoder model consists of a self-attention based encoder and a decoder with dot product attention mechanism to generate response with a specified emotion. We use the focal loss to handle imbalanced data distribution, and utilize the consistency loss to allow coherent decoding by the decoders. Human evaluation reveals that our model produces more emotionally pertinent responses. In addition, our model outperforms multiple strong baselines on automatic evaluation measures such as F1 and BLEU scores, thus resulting in more fluent and adequate responses.
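
The focal loss mentioned in this abstract can be illustrated for a single example. The sketch below uses the commonly cited form, -(1 - p)^gamma * log(p), where p is the probability the model assigns to the true class. The default `gamma=2.0` and the omission of a class-balancing factor are assumptions made for this example; the abstract does not state the settings used.

```python
import math


def focal_loss(prob_true_class, gamma=2.0):
    """Focal loss for one example, given the probability of its true class.

    With gamma = 0 this reduces to ordinary cross-entropy. Larger gamma
    shrinks the loss of well-classified examples, so rare or hard classes
    contribute relatively more. The default gamma is an assumption.
    """
    if not 0.0 < prob_true_class <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    modulating_factor = (1.0 - prob_true_class) ** gamma
    return -modulating_factor * math.log(prob_true_class)
```

The modulating factor is what distinguishes this from cross-entropy: an example predicted with probability 0.9 is scaled by 0.01 when gamma is 2, while one predicted with probability 0.1 is scaled by 0.81.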

Role of Language Relatedness in Multilingual Fine-tuning of Language Models: A Case Study in Indo-Aryan Languages
Tejas Dhamecha | Rudra Murthy | Samarth Bharadwaj | Karthik Sankaranarayanan | Pushpak Bhattacharyya
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

We explore the impact of leveraging the relatedness of languages that belong to the same family in NLP models using multilingual fine-tuning. We hypothesize and validate that multilingual fine-tuning of pre-trained language models can yield better performance on downstream NLP applications, compared to models fine-tuned on individual languages. A first of its kind detailed study is presented to track performance change as languages are added to a base language in a graded and greedy (in the sense of best boost of performance) manner; which reveals that careful selection of subset of related languages can significantly improve performance than utilizing all related languages. The Indo-Aryan (IA) language family is chosen for the study, the exact languages being Bengali, Gujarati, Hindi, Marathi, Oriya, Punjabi and Urdu. The script barrier is crossed by simple rule-based transliteration of the text of all languages to Devanagari. Experiments are performed on mBERT, IndicBERT, MuRIL and two RoBERTa-based LMs, the last two being pre-trained by us. Low resource languages, such as Oriya and Punjabi, are found to be the largest beneficiaries of multilingual fine-tuning. Textual Entailment, Entity Classification, Section Title Prediction, tasks of IndicGLUE and POS tagging form our test bed. Compared to monolingual fine tuning we get relative performance improvement of up to 150% in the downstream tasks. The surprise take-away is that for any language there is a particular combination of other languages which yields the best performance, and any additional language is in fact detrimental.
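
The graded, greedy addition of languages described in this abstract can be sketched as a forward-selection loop. This is only an illustration under stated assumptions: `evaluate` is a hypothetical callback that fine-tunes on a list of languages and returns a score for the base language, and stopping as soon as no candidate improves the score is a choice made for this example, not a detail taken from the paper.

```python
def greedy_language_selection(base, candidates, evaluate):
    """Greedily add the language that gives the best boost, one at a time.

    `evaluate(languages)` is a hypothetical callback returning a score
    (higher is better) for the base language when fine-tuning on
    `languages`. Selection stops when no remaining candidate improves it.
    """
    selected = [base]
    best_score = evaluate(selected)
    remaining = list(candidates)

    while remaining:
        # Score every one-language extension of the current selection.
        scored = [(evaluate(selected + [lang]), lang) for lang in remaining]
        top_score, top_lang = max(scored)
        if top_score <= best_score:
            break  # every further addition is neutral or detrimental
        selected.append(top_lang)
        remaining.remove(top_lang)
        best_score = top_score

    return selected, best_score
```

Each round costs one evaluation per remaining candidate, so the loop needs at most a quadratic number of fine-tuning runs in the number of candidate languages, rather than one run per subset.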

IITP-MT at CALCS2021: English to Hinglish Neural Machine Translation using Unsupervised Synthetic Code-Mixed Parallel Corpus
Ramakrishna Appicharla | Kamal Kumar Gupta | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the Fifth Workshop on Computational Approaches to Linguistic Code-Switching

This paper describes the system submitted by IITP-MT team to Computational Approaches to Linguistic Code-Switching (CALCS 2021) shared task on MT for English→Hinglish. We submit a neural machine translation (NMT) system which is trained on the synthetic code-mixed (cm) English-Hinglish parallel corpus. We propose an approach to create code-mixed parallel corpus from a clean parallel corpus in an unsupervised manner. It is an alignment based approach and we do not use any linguistic resources for explicitly marking any token for code-switching. We also train NMT model on the gold corpus provided by the workshop organizers augmented with the generated synthetic code-mixed parallel corpus. The model trained over the generated synthetic cm data achieves 10.09 BLEU points over the given test set.

SEPRG: Sentiment aware Emotion controlled Personalized Response Generation
Mauajama Firdaus | Umang Jain | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the 14th International Conference on Natural Language Generation

Social chatbots have gained immense popularity, and their appeal lies not just in their capacity to respond to the diverse requests from users, but also in the ability to develop an emotional connection with users. To further develop and promote social chatbots, we need to concentrate on increasing user interaction and take into account both the intellectual and emotional quotient in the conversational agents. Therefore, in this work, we propose the task of sentiment aware emotion controlled personalized dialogue generation giving the machine the capability to respond emotionally and in accordance with the persona of the user. As sentiment and emotions are highly co-related, we use the sentiment knowledge of the previous utterance to generate the correct emotional response in accordance with the user persona. We design a Transformer based Dialogue Generation framework, that generates responses that are sensitive to the emotion of the user and corresponds to the persona and sentiment as well. Moreover, the persona information is encoded by a different Transformer encoder, along with the dialogue history, is fed to the decoder for generating responses. We annotate the PersonaChat dataset with sentiment information to improve the response quality. Experimental results on the PersonaChat dataset show that the proposed framework significantly outperforms the existing baselines, thereby generating personalized emotional responses in accordance with the sentiment that provides better emotional connection and user satisfaction as desired in a social chatbot.

Towards Sentiment and Emotion aided Multi-modal Speech Act Classification in Twitter
Tulika Saha | Apoorva Upadhyaya | Sriparna Saha | Pushpak Bhattacharyya
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Speech Act Classification determining the communicative intent of an utterance has been investigated widely over the years as a standalone task. This holds true for discussion in any fora including social media platform such as Twitter. But the emotional state of the tweeter which has a considerable effect on the communication has not received the attention it deserves. Closely related to emotion is sentiment, and understanding of one helps understand the other. In this work, we firstly create a new multi-modal, emotion-TA (‘TA’ means tweet act, i.e., speech act in Twitter) dataset called EmoTA collected from open-source Twitter dataset. We propose a Dyadic Attention Mechanism (DAM) based multi-modal, adversarial multi-tasking framework. DAM incorporates intra-modal and inter-modal attention to fuse multiple modalities and learns generalized features across all the tasks. Experimental results indicate that the proposed framework boosts the performance of the primary task, i.e., TA classification (TAC) by benefitting from the two secondary tasks, i.e., Sentiment and Emotion Analysis compared to its uni-modal and single task TAC (tweet act classification) variants.

Invited Presentation
Pushpak Bhattacharyya
Proceedings of the 14th Workshop on Building and Using Comparable Corpora (BUCC 2021)

AI now and in future will have to grapple continuously with the problem of low resource. AI will increasingly be ML intensive. But ML needs data often with annotation. However, annotation is costly. Over the years, through work on multiple problems, we have developed insight into how to do language processing in low resource setting. The following 6 methods, individually and in combination, seem to be the way forward: 1) Artificially augment resource (e.g., subwords) 2) Cooperative NLP (e.g., pivot in MT) 3) Linguistic embellishment (e.g., factor based MT, source reordering) 4) Joint Modeling (e.g., Coref and NER, Sentiment and Emotion: each task helping the other to either boost accuracy or reduce resource requirement) 5) Multimodality (e.g., eye tracking based NLP, also picture+text+speech based Sentiment Analysis) 6) Cross Lingual Embedding (e.g., embedding from multiple languages helping MT, close to 2 above). The present talk will focus on low resource machine translation. We describe the use of techniques from the above list and bring home the seriousness and methodology of doing Machine Translation in low resource settings.

Introduction to ProverbNet: An Online Multilingual Database of Proverbs and Comprehensive Metadata
Shreyas Pimpalgaonkar | Dhanashree Lele | Malhar Kulkarni | Pushpak Bhattacharyya
Proceedings of the 18th International Conference on Natural Language Processing (ICON)

Proverbs are unique linguistic expressions used by humans in the process of communication. They are frozen expressions and have the capacity to convey deep semantic aspects of a given language. This paper describes ProverbNet, a novel online multilingual database of proverbs and comprehensive metadata equipped with a multipurpose search engine to store, explore, understand, classify and analyze proverbs and their metadata. ProverbNet has immense applications including machine translation, cognitive studies and learning tools. We have 2320 Sanskrit Proverbs and 1136 Marathi proverbs and their metadata in ProverbNet and are adding more proverbs in different languages to the network.

IITP-MT at WAT2021: Indic-English Multilingual Neural Machine Translation using Romanized Vocabulary
Ramakrishna Appicharla | Kamal Kumar Gupta | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the 8th Workshop on Asian Translation (WAT2021)

This paper describes the systems submitted to WAT 2021 MultiIndicMT shared task by IITP-MT team. We submit two multilingual Neural Machine Translation (NMT) systems (Indic-to-English and English-to-Indic). We romanize all Indic data and create subword vocabulary which is shared between all Indic languages. We use back-translation approach to generate synthetic data which is appended to parallel corpus and used to train our models. The models are evaluated using BLEU, RIBES and AMFM scores with Indic-to-English model achieving 40.08 BLEU for Hindi-English pair and English-to-Indic model achieving 34.48 BLEU for English-Hindi pair. However, we observe that the shared romanized subword vocabulary is not helping English-to-Indic model at the time of generation, leading it to produce poor quality translations for Tamil, Telugu and Malayalam to English pairs with BLEU score of 8.51, 6.25 and 3.79 respectively.

Crosslingual Embeddings are Essential in UNMT for distant languages: An English to IndoAryan Case Study
Tamali Banerjee | Rudra V Murthy | Pushpak Bhattacharya
Proceedings of Machine Translation Summit XVIII: Research Track

Recent advances in Unsupervised Neural Machine Translation (UNMT) have minimized the gap between supervised and unsupervised machine translation performance for closely related language-pairs. However, the situation is very different for distant language pairs. Lack of overlap in lexicon and low syntactic similarity such as between English and IndoAryan languages leads to poor translation quality in existing UNMT systems. In this paper, we show that initialising the embedding layer of UNMT models with cross-lingual embeddings leads to significant BLEU score improvements over existing UNMT models where the embedding layer weights are randomly initialized. Further, freezing the embedding layer weights leads to better gains compared to updating the embedding layer weights during training. We experimented using Masked Sequence to Sequence (MASS) and Denoising Autoencoder (DAE) UNMT approaches for three distant language pairs. The proposed cross-lingual embedding initialization yields BLEU score improvement of as much as ten times over the baseline for English-Hindi, English-Bengali, and English-Gujarati. Our analysis shows that initialising embedding layer with static cross-lingual embedding mapping is essential for training of UNMT models for distant language-pairs.

Wikipedia Current Events Summarization using Particle Swarm Optimization
Santosh Kumar Mishra | Darsh Kaushik | Sriparna Saha | Pushpak Bhattacharyya
Proceedings of the 18th International Conference on Natural Language Processing (ICON)

This paper proposes a method to summarize news events from multiple sources. We pose event summarization as a clustering-based optimization problem and solve it using particle swarm optimization. The proposed methodology uses the search capability of particle swarm optimization, detecting the number of clusters automatically. Experiments are conducted with the Wikipedia Current Events Portal dataset and evaluated using the well-known ROUGE-1, ROUGE-2, and ROUGE-L scores. The obtained results show the efficacy of the proposed methodology over the state-of-the-art methods. It attained improvement of 33.42%, 81.75%, and 57.58% in terms of ROUGE-1, ROUGE-2, and ROUGE-L, respectively.

2020

Recommendation Chart of Domains for Cross-Domain Sentiment Analysis: Findings of A 20 Domain Study
Akash Sheoran | Diptesh Kanojia | Aditya Joshi | Pushpak Bhattacharyya
Proceedings of the Twelfth Language Resources and Evaluation Conference

Cross-domain sentiment analysis (CDSA) helps to address the problem of data scarcity in scenarios where labelled data for a domain (known as the target domain) is unavailable or insufficient. However, the decision to choose a domain (known as the source domain) to leverage from is, at best, intuitive. In this paper, we investigate text similarity metrics to facilitate source domain selection for CDSA. We report results on 20 domains (all possible pairs) using 11 similarity metrics. Specifically, we compare CDSA performance with these metrics for different domain-pairs to enable the selection of a suitable source domain, given a target domain. These metrics include two novel metrics for evaluating domain adaptability to help source domain selection of labelled data and utilize word and sentence-based embeddings as metrics for unlabelled data. The goal of our experiments is a recommendation chart that gives the K best source domains for CDSA for a given target domain. We show that the best K source domains returned by our similarity metrics have a precision of over 50%, for varying values of K.
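
The precision figure reported for the recommendation chart can be read as a precision-at-K measure. The sketch below assumes one plausible definition: the fraction of the K source domains ranked highest by a similarity metric that also belong to a reference set of good source domains for the target. The function name and this exact definition are assumptions for illustration; the paper may compute precision differently.

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended source domains that are relevant.

    `recommended` is a list ordered from most to least similar to the
    target domain; `relevant` is a set of reference source domains.
    Both names are illustrative and not taken from the paper.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = recommended[:k]
    hits = sum(1 for domain in top_k if domain in relevant)
    return hits / k
```

Dividing by k rather than by the number of items returned means a metric that ranks fewer than k domains is penalised for the missing slots.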

Filtering Back-Translated Data in Unsupervised Neural Machine Translation
Jyotsana Khatri | Pushpak Bhattacharyya
Proceedings of the 28th International Conference on Computational Linguistics

Unsupervised neural machine translation (NMT) utilizes only monolingual data for training. The quality of back-translated data plays an important role in the performance of NMT systems. In back-translation, all generated pseudo parallel sentence pairs are not of the same quality. Taking inspiration from domain adaptation where in-domain sentences are given more weight in training, in this paper we propose an approach to filter back-translated data as part of the training process of unsupervised NMT. Our approach gives more weight to good pseudo parallel sentence pairs in the back-translation phase. We calculate the weight of each pseudo parallel sentence pair using sentence-wise round-trip BLEU score which is normalized batch-wise. We compare our approach with the current state of the art approaches for unsupervised NMT.
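
The weighting step described in this abstract can be sketched as follows. The example assumes that sentence-wise round-trip BLEU scores are already available for each pseudo-parallel pair in a batch, and that "normalized batch-wise" means dividing each score by the batch total. That normalization, the uniform fallback for an all-zero batch, and both function names are assumptions made for this example, since the abstract does not give the formula.

```python
def batch_normalized_weights(round_trip_bleu):
    """Turn per-sentence round-trip BLEU scores into weights summing to 1.

    Dividing by the batch total is an assumed normalization. If every
    score is zero, fall back to uniform weights so the batch still trains.
    """
    n = len(round_trip_bleu)
    if n == 0:
        return []
    total = sum(round_trip_bleu)
    if total == 0:
        return [1.0 / n] * n
    return [score / total for score in round_trip_bleu]


def weighted_batch_loss(losses, weights):
    """Weighted sum of per-sentence losses for one back-translation batch."""
    if len(losses) != len(weights):
        raise ValueError("losses and weights must have the same length")
    return sum(w * loss for w, loss in zip(weights, losses))
```

Under this scheme a pair whose round-trip translation closely reproduces the original sentence contributes more to the gradient than a pair whose round trip drifts.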

A Retrofitting Model for Incorporating Semantic Relations into Word Embeddings
Sapan Shah | Sreedhar Reddy | Pushpak Bhattacharyya
Proceedings of the 28th International Conference on Computational Linguistics

We present a novel retrofitting model that can leverage relational knowledge available in a knowledge resource to improve word embeddings. The knowledge is captured in terms of relation inequality constraints that compare similarity of related and unrelated entities in the context of an anchor entity. These constraints are used as training data to learn a non-linear transformation function that maps original word vectors to a vector space respecting these constraints. The transformation function is learned in a similarity metric learning setting using Triplet network architecture. We applied our model to synonymy, antonymy and hypernymy relations in WordNet and observed large gains in performance over original distributional models as well as other retrofitting approaches on word similarity task and significant overall improvement on lexical entailment detection task.
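
A relation inequality constraint of the kind described here, where an anchor should be more similar to a related entity than to an unrelated one, is commonly trained with a triplet margin loss. The sketch below shows that loss on plain vectors with cosine similarity. The margin value, the choice of cosine similarity, and the function names are assumptions for this example, and the learned non-linear transformation from the paper is deliberately left out.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def triplet_loss(anchor, related, unrelated, margin=0.2):
    """Hinge loss that is zero once the constraint holds with a margin.

    The constraint is sim(anchor, related) >= sim(anchor, unrelated) + margin.
    The margin of 0.2 is an illustrative value, not one from the paper.
    """
    sim_related = cosine_similarity(anchor, related)
    sim_unrelated = cosine_similarity(anchor, unrelated)
    return max(0.0, margin - sim_related + sim_unrelated)
```

In a full system the three vectors would first pass through the shared transformation network, and the loss would be averaged over many triplets drawn from the knowledge resource.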

Can Neural Networks Automatically Score Essay Traits?
Sandeep Mathias | Pushpak Bhattacharyya
Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications

Essay traits are attributes of an essay that can help explain how well written (or badly written) the essay is. Examples of traits include Content, Organization, Language, Sentence Fluency, Word Choice, etc. A lot of research in the last decade has dealt with automatic holistic essay scoring - where a machine rates an essay and gives a score for the essay. However, writers need feedback, especially if they want to improve their writing - which is why trait-scoring is important. In this paper, we show how a deep-learning based system can outperform feature-based machine learning systems, as well as a string kernel system in scoring essay traits.

Knowledge Graph and Deep Neural Network for Extractive Text Summarization by Utilizing Triples
Amit Vhatkar | Pushpak Bhattacharyya | Kavi Arya
Proceedings of the 1st Joint Workshop on Financial Narrative Processing and MultiLing Financial Summarisation

In our research work, we represent the content of the sentence in graphical form after extracting triples from the sentences. In this paper, we will discuss novel methods to generate an extractive summary by scoring the triples. Our work has also touched upon sequence-to-sequence encoding of the content of the sentence, to classify it as a summary or a non-summary sentence. Our findings help to decide the nature of the sentences forming the summary and the length of the system generated summary as compared to the length of the reference summary.

Semantic Extractor-Paraphraser based Abstractive Summarization
Anubhav Jangra | Raghav Jain | Vaibhav Mavi | Sriparna Saha | Pushpak Bhattacharyya
Proceedings of the 17th International Conference on Natural Language Processing (ICON)

The anthology of spoken languages today is inundated with textual information, necessitating the development of automatic summarization models. In this manuscript, we propose an extractor-paraphraser based abstractive summarization system that exploits semantic overlap as opposed to its predecessors that focus more on syntactic information overlap. Our model outperforms the state-of-the-art baselines in terms of ROUGE, METEOR and word mover similarity (WMS), establishing the superiority of the proposed system via extensive ablation experiments. We have also challenged the summarization capabilities of the state of the art Pointer Generator Network (PGN), and through thorough experimentation, shown that PGN is more of a paraphraser, contrary to the prevailing notion of a summarizer; illustrating its incapability to accumulate information across multiple sentences.

Cognitively Aided Zero-Shot Automatic Essay Grading
Sandeep Mathias | Rudra Murthy | Diptesh Kanojia | Pushpak Bhattacharyya
Proceedings of the 17th International Conference on Natural Language Processing (ICON)

Automatic essay grading (AEG) is a process in which machines assign a grade to an essay written in response to a topic, called the prompt. Zero-shot AEG is when we train a system to grade essays written to a new prompt which was not present in our training data. In this paper, we describe a solution to the problem of zero-shot automatic essay grading, using cognitive information, in the form of gaze behaviour. Our experiments show that using gaze behaviour helps in improving the performance of AEG systems, especially when we provide a new essay written in response to a new prompt for scoring, by an average of almost 5 percentage points of QWK.

Happy Are Those Who Grade without Seeing: A Multi-Task Learning Approach to Grade Essays Using Gaze Behaviour
Sandeep Mathias | Rudra Murthy | Diptesh Kanojia | Abhijit Mishra | Pushpak Bhattacharyya
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing

The gaze behaviour of a reader is helpful in solving several NLP tasks such as automatic essay grading. However, collecting gaze behaviour from readers is costly in terms of time and money. In this paper, we propose a way to improve automatic essay grading using gaze behaviour, which is learnt at run time using a multi-task learning framework. To demonstrate the efficacy of this multi-task learning based approach to automatic essay grading, we collect gaze behaviour for 48 essays across 4 essay sets, and learn gaze behaviour for the rest of the essays, numbering over 7000 essays. Using the learnt gaze behaviour, we can achieve a statistically significant improvement in performance over the state-of-the-art system for the essay sets where we have gaze data. We also achieve a statistically significant improvement for 4 other essay sets, numbering about 6000 essays, where we have no gaze behaviour data available. Our approach establishes that learning gaze behaviour improves automatic essay grading.

Incorporating Politeness across Languages in Customer Care Responses: Towards building a Multi-lingual Empathetic Dialogue Agent
Mauajama Firdaus | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the Twelfth Language Resources and Evaluation Conference

Customer satisfaction is an essential aspect of customer care systems. It is imperative for such systems to be polite while handling customer requests/demands. In this paper, we present a large multi-lingual conversational dataset for English and Hindi. We choose data from Twitter having both generic and courteous responses between customer care agents and aggrieved users. We also propose strong baselines that can induce courteous behaviour in generic customer care response in a multi-lingual scenario. We build a deep learning framework that can simultaneously handle different languages and incorporate polite behaviour in the customer care agent’s responses. Our system is competent in generating responses in different languages (here, English and Hindi) depending on the customer’s preference and also is able to converse with humans in an empathetic manner to ensure customer satisfaction and retention. Experimental results show that our proposed models can converse in both the languages and the information shared between the languages helps in improving the performance of the overall system. Qualitative and quantitative analysis shows that the proposed method can converse in an empathetic manner by incorporating courteousness in the responses and hence increasing customer satisfaction.

ScholarlyRead: A New Dataset for Scientific Article Reading Comprehension
Tanik Saikh | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the Twelfth Language Resources and Evaluation Conference

We present ScholarlyRead, a span-of-word-based scholarly articles’ Reading Comprehension (RC) dataset with approximately 10K manually checked passage-question-answer instances. ScholarlyRead was constructed in a semi-automatic way. We consider the articles from two popular journals of a reputed publishing house. Firstly, we generate questions from these articles in an automatic way. Generated questions are then manually checked by the human annotators. We propose a baseline model based on Bi-Directional Attention Flow (BiDAF) network that yields the F1 score of 37.31%. The framework would be useful for building Question-Answering (QA) systems on scientific articles.

MEISD: A Multimodal Multi-Label Emotion, Intensity and Sentiment Dialogue Dataset for Emotion Recognition and Sentiment Analysis in Conversations
Mauajama Firdaus | Hardik Chauhan | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the 28th International Conference on Computational Linguistics

Emotion and sentiment classification in dialogues is a challenging task that has gained popularity in recent times. Humans tend to have multiple emotions with varying intensities while expressing their thoughts and feelings. Emotions in an utterance of dialogue can either be independent or dependent on the previous utterances, thus making the task complex and interesting. Multi-label emotion detection in conversations is a significant task that enables the system to understand the various emotions of the interacting users. Sentiment analysis in dialogue/conversation, on the other hand, helps in understanding the perspective of the user with respect to the ongoing conversation. Along with text, additional information in the form of audio and video assists in identifying the correct emotions with the appropriate intensity and sentiments in an utterance of a dialogue. Lately, quite a few datasets have been made available for dialogue emotion and sentiment classification, but these datasets are imbalanced in representing different emotions and consist of only a single emotion. Hence, we first present a large-scale balanced Multimodal Multi-label Emotion, Intensity, and Sentiment Dialogue dataset (MEISD), collected from different TV series, that has textual, audio and visual features, and then establish a baseline setup for further research.

Harnessing Cross-lingual Features to Improve Cognate Detection for Low-resource Languages
Diptesh Kanojia | Raj Dabre | Shubham Dewangan | Pushpak Bhattacharyya | Gholamreza Haffari | Malhar Kulkarni
Proceedings of the 28th International Conference on Computational Linguistics

Cognates are variants of the same lexical form across different languages; for example “fonema” in Spanish and “phoneme” in English are cognates, both of which mean “a unit of sound”. The task of automatic detection of cognates among any two languages can help downstream NLP tasks such as Cross-lingual Information Retrieval, Computational Phylogenetics, and Machine Translation. In this paper, we demonstrate the use of cross-lingual word embeddings for detecting cognates among fourteen Indian Languages. Our approach introduces the use of context from a knowledge graph to generate improved feature representations for cognate detection. We, then, evaluate the impact of our cognate detection mechanism on neural machine translation (NMT), as a downstream task. We evaluate our methods to detect cognates on a challenging dataset of twelve Indian languages, namely, Sanskrit, Hindi, Assamese, Oriya, Kannada, Gujarati, Tamil, Telugu, Punjabi, Bengali, Marathi, and Malayalam. Additionally, we create evaluation datasets for two more Indian languages, Konkani and Nepali. We observe an improvement of up to 18% points, in terms of F-score, for cognate detection. Furthermore, we observe that cognates extracted using our method help improve NMT quality by up to 2.76 BLEU. We also release our code, newly constructed datasets and cross-lingual models publicly.

IITP-AINLPML at SemEval-2020 Task 12: Offensive Tweet Identification and Target Categorization in a Multitask Environment
Soumitra Ghosh | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the Fourteenth Workshop on Semantic Evaluation

In this paper, we describe the participation of the IITP-AINLPML team in the SemEval-2020 Shared Task 12 on Offensive Language Identification and Target Categorization in English Twitter data. Our proposed model learns to extract textual features using a BiGRU-based deep neural network supported by a Hierarchical Attention architecture to focus on the most relevant areas in the text. We leverage the effectiveness of multitask learning while building our models for sub-tasks A and B. We undersample the over-represented classes in sub-tasks A and C. During training, we consider a threshold of 0.5 as the separation margin between the instances belonging to classes OFF and NOT in sub-task A and UNT and TIN in sub-task B. For sub-task C, the class corresponding to the maximum score among the given confidence scores of the classes (IND, GRP and OTH) is considered as the final label for an instance. Our proposed model obtains macro F1-scores of 90.95%, 55.69% and 63.88% in sub-tasks A, B and C, respectively.
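The decision rules described in this abstract can be sketched as follows. This is only an illustrative sketch, not the team's code: the function names are hypothetical, and the assumption that a score at or above 0.5 maps to the first-named class (OFF in sub-task A, UNT in sub-task B) is ours, since the abstract does not state which side of the threshold maps to which label.

```python
from typing import Dict


def label_binary(score: float, at_or_above: str, below: str,
                 threshold: float = 0.5) -> str:
    """Pick one of two labels by comparing a confidence score to a threshold.

    Assumption: `score` is the model's confidence for the `at_or_above` label.
    """
    return at_or_above if score >= threshold else below


def label_target(scores: Dict[str, float]) -> str:
    """Sub-task C style rule: return the class with the maximum confidence."""
    return max(scores, key=scores.get)


# Hypothetical confidence scores, for illustration only.
sub_a = label_binary(0.73, at_or_above="OFF", below="NOT")
sub_b = label_binary(0.41, at_or_above="UNT", below="TIN")
sub_c = label_target({"IND": 0.2, "GRP": 0.7, "OTH": 0.1})
```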

Reinforced Multi-task Approach for Multi-hop Question Generation
Deepak Gupta | Hardik Chauhan | Ravi Tej Akella | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the 28th International Conference on Computational Linguistics

Question generation (QG) attempts to solve the inverse of the question answering (QA) problem by generating a natural language question given a document and an answer. While sequence-to-sequence neural models surpass rule-based systems for QG, they are limited in their capacity to focus on more than one supporting fact. For QG, we often require multiple supporting facts to generate high-quality questions. Inspired by recent works on multi-hop reasoning in QA, we take up multi-hop question generation, which aims at generating relevant questions based on supporting facts in the context. We employ multitask learning with the auxiliary task of answer-aware supporting fact prediction to guide the question generator. In addition, we propose a question-aware reward function in a Reinforcement Learning (RL) framework to maximize the utilization of the supporting facts. We demonstrate the effectiveness of our approach through experiments on the multi-hop question answering dataset, HotPotQA. Empirical evaluation shows our model to outperform the single-hop neural question generation models on both automatic evaluation metrics such as BLEU, METEOR, and ROUGE and human evaluation metrics for quality and coverage of the generated questions.

Annotated Corpus of Tweets in English from Various Domains for Emotion Detection
Soumitra Ghosh | Asif Ekbal | Pushpak Bhattacharyya | Sriparna Saha | Vipin Tyagi | Alka Kumar | Shikha Srivastava | Nitish Kumar
Proceedings of the 17th International Conference on Natural Language Processing (ICON)

Emotion recognition is a very well-attended problem in Natural Language Processing (NLP). Most of the existing works on emotion recognition focus on the general domain and in some cases to specific domains like fairy tales, blogs, weather, Twitter etc. But emotion analysis systems in the domains of security, social issues, technology, politics, sports, etc. are very rare. In this paper, we create a benchmark setup for emotion recognition in these specialised domains. First, we construct a corpus of 18,921 tweets in English annotated with Paul Ekman’s six basic emotions (Anger, Disgust, Fear, Happiness, Sadness, Surprise) and a non-emotive class Others. Thereafter, we propose a deep neural framework to perform emotion recognition in an end-to-end setting. We build various models based on Convolutional Neural Network (CNN), Bi-directional Long Short Term Memory (Bi-LSTM), Bi-directional Gated Recurrent Unit (Bi-GRU). We propose a Hierarchical Attention-based deep neural network for Emotion Detection (HAtED). We also develop multiple systems by considering different sets of emotion classes for each system and report the detailed comparative analysis of the results. Experiments show the hierarchical attention-based model achieves best results among the considered baselines with accuracy of 69%.

D-Coref: A Fast and Lightweight Coreference Resolution Model using DistilBERT
Chanchal Suman | Jeetu Kumar | Sriparna Saha | Pushpak Bhattacharyya
Proceedings of the 17th International Conference on Natural Language Processing (ICON)

Smart devices are often deployed as edge devices, which require quality solutions within a limited memory budget. Most user-interaction-based smart devices require coreference resolution. Keeping this in view, we have developed a fast and lightweight coreference resolution model which meets the minimum memory requirement and converges faster. To generate the embeddings for the task of coreference resolution, DistilBERT, a lightweight BERT module, is utilized. DistilBERT consumes less memory (only 60% of the memory of a BERT-based heavy model) and is suitable for deployment in edge devices. DistilBERT embeddings help in 60% faster convergence with an accuracy compromise of 2.59% and 6.49% with respect to its base model and the current state-of-the-art, respectively.

Sentiment and Emotion help Sarcasm? A Multi-task Learning Framework for Multi-Modal Sarcasm, Sentiment and Emotion Analysis
Dushyant Singh Chauhan | Dhanush S R | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

In this paper, we hypothesize that sarcasm is closely related to sentiment and emotion, and thereby propose a multi-task deep learning framework to solve all these three problems simultaneously in a multi-modal conversational scenario. We, at first, manually annotate the recently released multi-modal MUStARD sarcasm dataset with sentiment and emotion classes, both implicit and explicit. For multi-tasking, we propose two attention mechanisms, viz. Inter-segment Inter-modal Attention (Ie-Attention) and Intra-segment Inter-modal Attention (Ia-Attention). The main motivation of Ie-Attention is to learn the relationship between the different segments of the sentence across the modalities. In contrast, Ia-Attention focuses within the same segment of the sentence across the modalities. Finally, representations from both the attentions are concatenated and shared across the five classes (i.e., sarcasm, implicit sentiment, explicit sentiment, implicit emotion, explicit emotion) for multi-tasking. Experimental results on the extended version of the MUStARD dataset show the efficacy of our proposed approach for sarcasm detection over the existing state-of-the-art systems. The evaluation also shows that the proposed multi-task framework yields better performance for the primary task, i.e., sarcasm detection, with the help of two secondary tasks, emotion and sentiment analysis.

Generating Fluent Translations from Disfluent Text Without Access to Fluent References: IIT Bombay@IWSLT2020
Nikhil Saini | Jyotsana Khatri | Preethi Jyothi | Pushpak Bhattacharyya
Proceedings of the 17th International Conference on Spoken Language Translation

Machine translation systems perform reasonably well when the input is well-formed speech or text. Conversational speech is spontaneous and inherently consists of many disfluencies. Producing fluent translations of disfluent source text would typically require parallel disfluent to fluent training data. However, fluent translations of spontaneous speech are an additional resource that is tedious to obtain. This work describes the submission of IIT Bombay to the Conversational Speech Translation challenge at IWSLT 2020. We specifically tackle the problem of disfluency removal in disfluent-to-fluent text-to-text translation assuming no access to fluent references during training. Common patterns of disfluency are extracted from disfluent references and a noise induction model is used to simulate them starting from a clean monolingual corpus. This synthetically constructed dataset is then considered as a proxy for labeled data during training. We also make use of additional fluent text in the target language to help generate fluent translations. This work uses no fluent references during training and beats a baseline model by a margin of 4.21 and 3.11 BLEU points where the baseline uses disfluent and fluent references, respectively. Index Terms- disfluency removal, machine translation, noise induction, leveraging monolingual data, denoising for disfluency removal.

A Platform for Event Extraction in Hindi
Sovan Kumar Sahoo | Saumajit Saha | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the Twelfth Language Resources and Evaluation Conference

Event Extraction is an important task in the widespread field of Natural Language Processing (NLP). Though this task is adequately addressed in English with sufficient resources, we are unaware of any benchmark setup in Indian languages. Hindi is one of the most widely spoken languages in the world. In this paper, we present an Event Extraction framework for Hindi language by creating an annotated resource for benchmarking, and then developing deep learning based models to set as the baselines. We crawl more than seventeen hundred disaster related Hindi news articles from the various news sources. We also develop deep learning based models for Event Trigger Detection and Classification, Argument Detection and Classification and Event-Argument Linking.

EL-BERT at SemEval-2020 Task 10: A Multi-Embedding Ensemble Based Approach for Emphasis Selection in Visual Media
Chandresh Kanani | Sriparna Saha | Pushpak Bhattacharyya
Proceedings of the Fourteenth Workshop on Semantic Evaluation

In visual media, text emphasis is the strengthening of words in a text to convey the intent of the author. Text emphasis in visual media is generally done by using different colors, backgrounds, or fonts for the text; it helps in conveying the actual meaning of the message to the readers. Emphasis selection is the task of choosing candidate words for emphasis; it helps in automatically designing posters and other media content with written text. If we consider only the text and do not know the intent, then there can be multiple valid emphasis selections. We propose the use of ensembles for emphasis selection to improve over single emphasis selection models. We show that the use of multi-embedding helps in enhancing the results for base models. To show the efficacy of the proposed approach, we also compare our results with state-of-the-art models.

CEASE, a Corpus of Emotion Annotated Suicide notes in English
Soumitra Ghosh | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the Twelfth Language Resources and Evaluation Conference

A suicide note is usually written shortly before the suicide and it provides a chance to comprehend the self-destructive state of mind of the deceased. From a psychological point of view, suicide notes have been utilized for recognizing the motive behind the suicide. To the best of our knowledge, there is no openly accessible suicide note corpus at present, making it challenging for the researchers and developers to deep dive into the area of mental health assessment and suicide prevention. In this paper, we create a fine-grained emotion annotated corpus (CEASE) of suicide notes in English and develop various deep learning models to perform emotion detection on the curated dataset. The corpus consists of 2393 sentences from around 205 suicide notes collected from various sources. Each sentence is annotated with a particular emotion class from a set of 15 fine-grained emotion labels, namely (forgiveness, happiness_peacefulness, love, pride, hopefulness, thankfulness, blame, anger, fear, abuse, sorrow, hopelessness, guilt, information, instructions). For the evaluation, we develop an ensemble architecture, where the base models correspond to three supervised deep learning models, namely Convolutional Neural Network (CNN), Gated Recurrent Unit (GRU) and Long Short Term Memory (LSTM). We obtain the highest test accuracy of 60.17% and cross-validation accuracy of 60.32%.

Multi-domain Tweet Corpora for Sentiment Analysis: Resource Creation and Evaluation
Mamta | Asif Ekbal | Pushpak Bhattacharyya | Shikha Srivastava | Alka Kumar | Tista Saha
Proceedings of the Twelfth Language Resources and Evaluation Conference

Due to the phenomenal growth of online content in recent times, sentiment analysis has attracted the attention of researchers and developers. A number of benchmark annotated corpora are available for domains like movie reviews, product reviews, hotel reviews, etc. The pervasiveness of social media has also led to a huge amount of content posted by users who are misusing the power of social media to spread false beliefs and to negatively influence others. This type of content comes from domains like terrorism, cybersecurity, technology, social issues, etc. Mining opinions from these domains is important to create a socially intelligent system to provide security to the public and to maintain law and order. To the best of our knowledge, there are no publicly available tweet corpora for such pervasive domains. Hence, we first create multi-domain tweet sentiment corpora and then establish a deep neural network based baseline framework to address the above-mentioned issues. The annotated corpus has a Cohen’s Kappa measurement for annotation quality of 0.770, which shows that the data is of acceptable quality. We are able to achieve 84.65% accuracy for sentiment analysis by using an ensemble of Convolutional Neural Network (CNN), Long Short Term Memory (LSTM), and Gated Recurrent Unit (GRU).
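For readers unfamiliar with the agreement statistic reported above, the following is a minimal sketch of Cohen's Kappa for two annotators. The function name and the toy labels are assumptions made for illustration; they are not the paper's data or code.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's Kappa: (observed - expected agreement) / (1 - expected)."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("annotation lists must be non-empty and equal in length")
    n = len(a)
    # Fraction of items on which the two annotators agree.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    labels = set(count_a) | set(count_b)
    expected = sum(count_a[lab] * count_b[lab] for lab in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Toy annotations (hypothetical), two annotators labelling six tweets.
ann_1 = ["pos", "pos", "neg", "neg", "neu", "pos"]
ann_2 = ["pos", "neg", "neg", "neg", "neu", "pos"]
kappa = cohens_kappa(ann_1, ann_2)
```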

Looking inside Noun Compounds: Unsupervised Prepositional and Free Paraphrasing
Girishkumar Ponkiya | Rudra Murthy | Pushpak Bhattacharyya | Girish Palshikar
Findings of the Association for Computational Linguistics: EMNLP 2020

A noun compound is a sequence of contiguous nouns that acts as a single noun, although the predicate denoting the semantic relation between its components is dropped. Noun Compound Interpretation is the task of uncovering the relation, in the form of a preposition or a free paraphrase. Prepositional paraphrasing refers to the use of preposition to explain the semantic relation, whereas free paraphrasing refers to invoking an appropriate predicate denoting the semantic relation. In this paper, we propose an unsupervised methodology for these two types of paraphrasing. We use pre-trained contextualized language models to uncover the ‘missing’ words (preposition or predicate). These language models are usually trained to uncover the missing word/words in a given input sentence. Our approach uses templates to prepare the input sequence for the language model. The template uses a special token to indicate the missing predicate. As the model has already been pre-trained to uncover a missing word (or a sequence of words), we exploit it to predict missing words for the input sequence. Our experiments using four datasets show that our unsupervised approach (a) performs comparably to supervised approaches for prepositional paraphrasing, and (b) outperforms supervised approaches for free paraphrasing. Paraphrasing (prepositional or free) using our unsupervised approach is potentially helpful for NLP tasks like machine translation and information extraction.
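The template idea described above can be illustrated with a small sketch. The template wording, the function names, and the `[MASK]` token are hypothetical choices made for illustration; the abstract does not give the actual templates, and no language model is called here.

```python
def prepositional_template(modifier: str, head: str,
                           mask_token: str = "[MASK]") -> str:
    """Build an input whose single masked slot stands for the preposition."""
    # Hypothetical template: "<head> [MASK] <modifier>"
    return f"{head} {mask_token} {modifier}"


def free_paraphrase_template(modifier: str, head: str, num_masks: int = 2,
                             mask_token: str = "[MASK]") -> str:
    """Build an input with several masked slots for a free-form predicate."""
    masks = " ".join([mask_token] * num_masks)
    # Hypothetical template: "<head> that <masks> <modifier>"
    return f"{head} that {masks} {modifier}"


prep_input = prepositional_template("cheese", "knife")
free_input = free_paraphrase_template("cheese", "knife")
```

A masked language model would then be asked to fill the masked positions; which model and decoding strategy to use is left open here.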

IIITBH-IITP@CL-SciSumm20, CL-LaySumm20, LongSumm20
Saichethan Reddy | Naveen Saini | Sriparna Saha | Pushpak Bhattacharyya
Proceedings of the First Workshop on Scholarly Document Processing

In this paper, we present the IIIT Bhagalpur and IIT Patna team’s effort to solve three shared tasks, namely CL-SciSumm 2020, CL-LaySumm 2020, and LongSumm 2020, at SDP 2020. The theme of these tasks is to generate medium-scale, lay and long summaries, respectively, for scientific articles. For the first two tasks, unsupervised systems are developed, while for the third one, we develop a supervised system. The performance of all the systems was evaluated on the datasets associated with the shared tasks in terms of the well-known ROUGE metric.

A Unified Framework for Multilingual and Code-Mixed Visual Question Answering
Deepak Gupta | Pabitra Lenka | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing

In this paper, we propose an effective deep learning framework for multilingual and code-mixed visual question answering. The proposed model is capable of predicting answers from the questions in Hindi, English or Code-mixed (Hinglish: Hindi-English) languages. The majority of the existing techniques on Visual Question Answering (VQA) focus on English questions only. However, many applications such as medical imaging, tourism, visual assistants require a multilinguality-enabled module for their widespread usages. As there is no available dataset in English-Hindi VQA, we firstly create Hindi and Code-mixed VQA datasets by exploiting the linguistic properties of these languages. We propose a robust technique capable of handling the multilingual and code-mixed question to provide the answer against the visual information (image). To better encode the multilingual and code-mixed questions, we introduce a hierarchy of shared layers. We control the behaviour of these shared layers by an attention-based soft layer sharing mechanism, which learns how shared layers are applied in different ways for the different languages of the question. Further, our model uses bi-linear attention with a residual connection to fuse the language and image features. We perform extensive evaluation and ablation studies for English, Hindi and Code-mixed VQA. The evaluation shows that the proposed multilingual model achieves state-of-the-art performance in all these settings.

IITP-AI-NLP-ML@ CL-SciSumm 2020, CL-LaySumm 2020, LongSumm 2020
Santosh Kumar Mishra | Harshavardhan Kundarapu | Naveen Saini | Sriparna Saha | Pushpak Bhattacharyya
Proceedings of the First Workshop on Scholarly Document Processing

The publication rate of scientific literature is increasing rapidly, which makes it challenging for researchers to keep themselves updated with the new state of the art. Scientific document summarization addresses this problem by summarizing the essential facts and findings of a document. In the current paper, we present the participation of the IITP-AI-NLP-ML team in three shared tasks, namely CL-SciSumm 2020, LaySumm 2020, and LongSumm 2020, which aim to generate medium, lay, and long summaries of scientific articles, respectively. To solve the CL-SciSumm 2020 and LongSumm 2020 tasks, three well-known clustering techniques are used, and then various sentence scoring functions, including textual entailment, are used to extract the sentences from each cluster for summary generation. For LaySumm 2020, an encoder-decoder based deep learning model has been utilized. The performance of our developed systems is evaluated in terms of ROUGE measures on the datasets associated with the shared tasks.

Modelling Source- and Target- Language Syntactic Information as Conditional Context in Interactive Neural Machine Translation
Kamal Kumar Gupta | Rejwanul Haque | Asif Ekbal | Pushpak Bhattacharyya | Andy Way
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation

In interactive machine translation (MT), human translators correct errors in automatic translations in collaboration with the MT systems, which is seen as an effective way to improve the productivity gain in translation. In this study, we model source-language syntactic constituency parse and target-language syntactic descriptions in the form of supertags as conditional context for interactive prediction in neural MT (NMT). We found that the supertags significantly improve productivity gain in translation in interactive-predictive NMT (INMT), while syntactic parsing is found to be somewhat effective in reducing human effort in translation. Furthermore, when we model this source- and target-language syntactic information together as the conditional context, both types complement each other and our fully syntax-informed INMT model statistically significantly reduces human effort in a French–to–English translation task, achieving 4.30 points absolute (corresponding to 9.18% relative) improvement in terms of word prediction accuracy (WPA) and 4.84 points absolute (corresponding to 9.01% relative) reduction in terms of word stroke ratio (WSR) over the baseline.
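The relationship between the absolute and relative figures quoted above can be checked with a little arithmetic. The sketch below assumes the usual definition, relative change = absolute change / baseline, and back-computes the implied baseline scores; these implied values are derived here for illustration and are not reported in the abstract.

```python
def implied_baseline(absolute_change: float, relative_change_pct: float) -> float:
    """Baseline implied by relative = absolute / baseline (percentages in, points out)."""
    return absolute_change / (relative_change_pct / 100.0)


# Figures taken from the abstract; the baselines are derived, not reported.
wpa_baseline = implied_baseline(4.30, 9.18)  # word prediction accuracy
wsr_baseline = implied_baseline(4.84, 9.01)  # word stroke ratio

# Under the same assumption, the syntax-informed model's implied scores are
# the baseline plus the WPA gain and the baseline minus the WSR reduction.
wpa_model = wpa_baseline + 4.30
wsr_model = wsr_baseline - 4.84
```

Under this assumption the implied baselines come out at roughly 46.8 WPA and 53.7 WSR, subject to rounding in the reported percentages.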

Part-of-Speech Annotation Challenges in Marathi
Gajanan Rane | Nilesh Joshi | Geetanjali Rane | Hanumant Redkar | Malhar Kulkarni | Pushpak Bhattacharyya
Proceedings of the WILDRE5– 5th Workshop on Indian Language Data: Resources and Evaluation

Part of Speech (POS) annotation is a significant challenge in natural language processing. The paper discusses issues and challenges faced in the process of POS annotation of the Marathi data from four domains viz., tourism, health, entertainment and agriculture. During POS annotation, a lot of issues were encountered. Some of the major ones are discussed in detail in this paper. Also, the two approaches viz., the lexical (L approach) and the functional (F approach) of POS tagging have been discussed and presented with examples. Further, some ambiguous cases in POS annotation are presented in the paper.

Extracting Message Sequence Charts from Hindi Narrative Text
Swapnil Hingmire | Nitin Ramrakhiyani | Avinash Kumar Singh | Sangameshwar Patil | Girish Palshikar | Pushpak Bhattacharyya | Vasudeva Varma
Proceedings of the First Joint Workshop on Narrative Understanding, Storylines, and Events

In this paper, we propose the use of Message Sequence Charts (MSC) as a representation for visualizing narrative text in Hindi. An MSC is a formal representation allowing the depiction of actors and interactions among these actors in a scenario, apart from supporting a rich framework for formal inference. We propose an approach to extract MSC actors and interactions from a Hindi narrative. As a part of the approach, we enrich an existing event annotation scheme where we provide guidelines for annotation of the mood of events (realis vs irrealis) and guidelines for annotation of event arguments. We report performance on multiple evaluation criteria by experimenting with Hindi narratives from Indian History. Though Hindi is the fourth most-spoken first language in the world, from the NLP perspective it has comparatively lesser resources than English. Moreover, there is relatively less work in the context of event processing in Hindi. Hence, we believe that this work is among the initial works for Hindi event processing.

Unsupervised Aspect-Level Sentiment Controllable Style Transfer
Mukuntha Narayanan Sundararaman | Zishan Ahmad | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing

Unsupervised style transfer in text has previously been explored through the sentiment transfer task. The task entails inverting the overall sentiment polarity in a given input sentence, while preserving its content. From the Aspect-Based Sentiment Analysis (ABSA) task, we know that multiple sentiment polarities can often be present together in a sentence with multiple aspects. In this paper, the task of aspect-level sentiment controllable style transfer is introduced, where each of the aspect-level sentiments can individually be controlled at the output. To achieve this goal, a BERT-based encoder-decoder architecture with saliency weighted polarity injection is proposed, with unsupervised training strategies, such as ABSA masked-language-modelling. Through both automatic and manual evaluation, we show that the system is successful in controlling aspect-level sentiments.

A Semi-supervised Approach to Generate the Code-Mixed Text using Pre-trained Encoder and Transfer Learning
Deepak Gupta | Asif Ekbal | Pushpak Bhattacharyya
Findings of the Association for Computational Linguistics: EMNLP 2020

Code-mixing, the interleaving of two or more languages within a sentence or discourse, is ubiquitous in multilingual societies. The lack of code-mixed training data is one of the major concerns for the development of end-to-end neural network-based models to be deployed for a variety of natural language processing (NLP) applications. A potential solution is to either manually create or crowd-source the code-mixed labelled data for the task at hand, but that requires much human effort and is often not feasible because of the language-specific diversity in the code-mixed text. To circumvent the data scarcity issue, we propose an effective deep learning approach for automatically generating the code-mixed text from English to multiple languages without any parallel data. In order to train the neural network, we create synthetic code-mixed texts from the available parallel corpus by modelling various linguistic properties of code-mixing. Our code-mixed text generator is built upon the encoder-decoder framework, where the encoder is augmented with the linguistic and task-agnostic features obtained from the transformer-based language model. We also transfer the knowledge from a neural machine translation (NMT) model to warm-start the training of the code-mixed generator. Experimental results and in-depth analysis show the effectiveness of our proposed code-mixed text generation on eight diverse language pairs.

Leveraging Alignment and Phonology for low-resource Indic to English Neural Machine Transliteration
Parth Patel | Manthan Mehta | Pushpak Bhattacharya | Arjun Atreya
Proceedings of the 17th International Conference on Natural Language Processing (ICON)

In this paper, we present a novel transliteration technique based on Orthographic Syllable (OS) segmentation for low-resource Indian languages (ILs). Given that alignment has produced promising results in Statistical Machine Transliteration systems and phonology plays an important role in transliteration, we introduce a new model which uses alignment representation similar to that of IBM Model 3 to pre-process the tokenized input sequence and then use pre-trained source and target OS-embeddings for training. We apply our model for transliteration from ILs to English and report our accuracy based on Top-1 Exact Match. We also compare our accuracy with a previously proposed Phrase-Based model and report improvements.
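The Top-1 Exact Match metric mentioned above can be sketched as below. The function name, the case-sensitive comparison, and the toy word pairs are assumptions for illustration; the paper's exact normalization is not stated in the abstract.

```python
from typing import Sequence


def top1_exact_match(predictions: Sequence[Sequence[str]],
                     references: Sequence[str]) -> float:
    """Fraction of items whose top-ranked candidate equals the reference.

    `predictions[i]` is a ranked candidate list for item i (best first).
    Assumption: comparison is exact and case-sensitive.
    """
    if len(predictions) != len(references) or not references:
        raise ValueError("inputs must be non-empty and equal in length")
    hits = sum(
        1 for cands, ref in zip(predictions, references)
        if cands and cands[0] == ref
    )
    return hits / len(references)


# Hypothetical ranked outputs for three source words.
preds = [["mumbai", "mumbay"], ["dilli", "delhi"], ["pune"]]
refs = ["mumbai", "delhi", "pune"]
accuracy = top1_exact_match(preds, refs)
```

Only the first candidate counts, so the second item is a miss even though the reference appears lower in its ranked list.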

Challenge Dataset of Cognates and False Friend Pairs from Indian Languages
Diptesh Kanojia | Malhar Kulkarni | Pushpak Bhattacharyya | Gholamreza Haffari
Proceedings of the Twelfth Language Resources and Evaluation Conference

Cognates are present in multiple variants of the same text across different languages (e.g., “hund” in German and “hound” in the English language mean “dog”). They pose a challenge to various Natural Language Processing (NLP) applications such as Machine Translation, Cross-lingual Sense Disambiguation, Computational Phylogenetics, and Information Retrieval. A possible solution to address this challenge is to identify cognates across language pairs. In this paper, we describe the creation of two cognate datasets for twelve Indian languages namely Sanskrit, Hindi, Assamese, Oriya, Kannada, Gujarati, Tamil, Telugu, Punjabi, Bengali, Marathi, and Malayalam. We digitize the cognate data from an Indian language cognate dictionary and utilize linked Indian language Wordnets to generate cognate sets. Additionally, we use the Wordnet data to create a False Friends’ dataset for eleven language pairs. We also evaluate the efficacy of our dataset using previously available baseline cognate detection approaches. We also perform a manual evaluation with the help of lexicographers and release the curated gold-standard dataset with this paper.

Analysing cross-lingual transfer in lemmatisation for Indian languages
Kumar Saurav | Kumar Saunack | Pushpak Bhattacharyya
Proceedings of the 28th International Conference on Computational Linguistics

Lemmatization aims to reduce the sparse data problem by relating the inflected forms of a word to its dictionary form. However, most of the prior work on this topic has focused on high resource languages. In this paper, we evaluate cross-lingual approaches for low resource languages, especially in the context of morphologically rich Indian languages. We test our model on six languages from two different families and develop linguistic insights into each model’s performance.

Proceedings of the 7th Workshop on Asian Translation
Toshiaki Nakazawa | Hideki Nakayama | Chenchen Ding | Raj Dabre | Anoop Kunchukuttan | Win Pa Pa | Ondřej Bojar | Shantipriya Parida | Isao Goto | Hidaya Mino | Hiroshi Manabe | Katsuhito Sudoh | Sadao Kurohashi | Pushpak Bhattacharyya
Proceedings of the 7th Workshop on Asian Translation

“A Passage to India”: Pre-trained Word Embeddings for Indian Languages
Kumar Saurav | Kumar Saunack | Diptesh Kanojia | Pushpak Bhattacharyya
Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL)

Dense word vectors or ‘word embeddings’ which encode semantic properties of words, have now become integral to NLP tasks like Machine Translation (MT), Question Answering (QA), Word Sense Disambiguation (WSD), and Information Retrieval (IR). In this paper, we use various existing approaches to create multiple word embeddings for 14 Indian languages. We place these embeddings for all these languages, viz., Assamese, Bengali, Gujarati, Hindi, Kannada, Konkani, Malayalam, Marathi, Nepali, Odiya, Punjabi, Sanskrit, Tamil, and Telugu in a single repository. Relatively newer approaches that emphasize catering to context (BERT, ELMo, etc.) have shown significant improvements, but require a large amount of resources to generate usable models. We release pre-trained embeddings generated using both contextual and non-contextual approaches. We also use MUSE and XLM to train cross-lingual embeddings for all pairs of the aforementioned languages. To show the efficacy of our embeddings, we evaluate our embedding models on XPOS, UPOS and NER tasks for all these languages. We release a total of 436 models using 8 different approaches. We hope they are useful for the resource-constrained Indian language NLP. The title of this paper refers to the famous novel “A Passage to India” by E.M. Forster, published initially in 1924.

Proceedings of the 17th International Conference on Natural Language Processing (ICON)
Pushpak Bhattacharyya | Dipti Misra Sharma | Rajeev Sangal
Proceedings of the 17th International Conference on Natural Language Processing (ICON)

All-in-One: A Deep Attentive Multi-task Learning Framework for Humour, Sarcasm, Offensive, Motivation, and Sentiment on Memes
Dushyant Singh Chauhan | Dhanush S R | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing

In this paper, we aim at learning the relationships and similarities of a variety of tasks, such as humour detection, sarcasm detection, offensive content detection, motivational content detection and sentiment analysis on a somewhat complicated form of information, i.e., memes. We propose a multi-task, multi-modal deep learning framework to solve multiple tasks simultaneously. For multi-tasking, we propose two attention-like mechanisms viz., Inter-task Relationship Module (iTRM) and Inter-class Relationship Module (iCRM). The main motivation of iTRM is to learn the relationship between the tasks to realize how they help each other. In contrast, iCRM develops relations between the different classes of tasks. Finally, representations from both the attentions are concatenated and shared across the five tasks (i.e., humour, sarcasm, offensive, motivational, and sentiment) for multi-tasking. We use the recently released dataset in the Memotion Analysis task @ SemEval 2020, which consists of memes annotated for the classes as mentioned above. Empirical results on Memotion dataset show the efficacy of our proposed approach over the existing state-of-the-art systems (Baseline and SemEval 2020 winner). The evaluation also indicates that the proposed multi-task framework yields better performance over the single-task learning.

A Multi-modal Personality Prediction System
Chanchal Suman | Aditya Gupta | Sriparna Saha | Pushpak Bhattacharyya
Proceedings of the 17th International Conference on Natural Language Processing (ICON)

Automatic prediction of personality traits has many real-life applications, e.g., in forensics, recommender systems, and personalized services. In this work, we propose a framework for predicting the personality traits of a user from videos. Ambient, facial, and audio features are extracted from the video of the user. These features are used for the final output prediction. The visual and audio modalities are combined in two different ways: averaging of predictions obtained from the individual modalities, and concatenation of features in a multi-modal setting. The dataset released in Chalearn-16 is used for evaluating the performance of the system. Experimental results illustrate that it is possible to obtain better performance with a handful of images, rather than using all the images present in the video.
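
The two ways of combining modalities mentioned in this abstract can be illustrated with a minimal sketch. The function names, the plain-list representation, and the equal weighting of the two modalities are assumptions made for illustration; the abstract does not specify how the authors implemented either step.

```python
def average_fusion(visual_pred, audio_pred):
    """Late fusion: average per-trait predictions from two unimodal models.

    Equal weighting of the two modalities is an assumption.
    """
    if len(visual_pred) != len(audio_pred):
        raise ValueError("both models must predict the same number of traits")
    return [(v + a) / 2 for v, a in zip(visual_pred, audio_pred)]


def concat_fusion(visual_feats, audio_feats):
    """Early fusion: concatenate feature vectors before a single joint model."""
    return list(visual_feats) + list(audio_feats)


# Hypothetical scores for two traits from each unimodal model.
fused_prediction = average_fusion([0.5, 0.25], [0.25, 0.75])
# Hypothetical feature vectors of different lengths per modality.
fused_features = concat_fusion([0.1, 0.2, 0.3], [0.9, 0.8])
```

In the first case each modality has its own predictor and only the outputs are merged; in the second, a single predictor would be trained on the concatenated vector.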

Towards Emotion-aided Multi-modal Dialogue Act Classification
Tulika Saha | Aditya Patra | Sriparna Saha | Pushpak Bhattacharyya
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

The task of Dialogue Act Classification (DAC) that purports to capture communicative intent has been studied extensively. But these studies limit themselves to text. Non-verbal features (change of tone, facial expressions etc.) can provide cues to identify DAs, thus stressing the benefit of incorporating multi-modal inputs in the task. Also, the emotional state of the speaker has a substantial effect on the choice of the dialogue act, since conversations are often influenced by emotions. Hence, the effect of emotion on automatic identification of DAs also needs to be studied. In this work, we address the role of both multi-modality and emotion recognition (ER) in DAC. DAC and ER help each other by way of multi-task learning. One of the major contributions of this work is a new multimodal Emotion aware Dialogue Act dataset called EMOTyDA, collected from open-sourced dialogue datasets. To demonstrate the utility of EMOTyDA, we build an attention based (self, inter-modal, inter-task) multi-modal, multi-task Deep Neural Network (DNN) for joint learning of DAs and emotions. We show empirically that multi-modality and multi-tasking achieve better performance of DAC compared to uni-modal and single task DAC variants.

2019

Utilizing Monolingual Data in NMT for Similar Languages: Submission to Similar Language Translation Task
Jyotsana Khatri | Pushpak Bhattacharyya
Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)

This paper describes our submission to the Shared Task on Similar Language Translation at the Fourth Conference on Machine Translation (WMT 2019). We submitted three systems for the Hindi -> Nepali direction, in which we examined the performance of an RNN-based NMT system, a semi-supervised NMT system where monolingual data of both languages is utilized using the architecture by and a system trained with extra synthetic sentences generated using copy of source and target sentences without using any additional monolingual data.

Multi-linguality helps: Event-Argument Extraction for Disaster Domain in Cross-lingual and Multi-lingual setting
Zishan Ahmad | Deeksha Varshney | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the 16th International Conference on Natural Language Processing

Automatic extraction of disaster-related events and their arguments from natural language text is vital for building a decision support system for crisis management. Event extraction from various news sources is a well-explored area for this objective. However, extracting events alone, without any context, provides only partial help for this purpose. Extracting related arguments like Time, Place, Casualties, etc., provides a complete picture of the disaster event. In this paper, we create a disaster domain dataset in Hindi by annotating disaster-related events and arguments. We also obtain equivalent datasets for Bengali and English from a collaboration. We build a multi-lingual deep learning model for argument extraction in all three languages. We also compare our multi-lingual system with a similar baseline mono-lingual system trained for each language separately. It is observed that a single multi-lingual system is able to compensate for the lack of training data by jointly training on datasets from different languages in a shared space, thus giving a better overall result.

DeepSentiPeer: Harnessing Sentiment in Review Texts to Recommend Peer Review Decisions
Tirthankar Ghosal | Rajeev Verma | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Automatically validating a research artefact is one of the frontiers in Artificial Intelligence (AI) that directly brings it close to competing with human intellect and intuition. Although criticised sometimes, the existing peer review system still stands as the benchmark of research validation. The present-day peer review process is not straightforward and demands profound domain knowledge, expertise, and intelligence of human reviewer(s), which is somewhat elusive with the current state of AI. However, the peer review texts, which contain rich sentiment information of the reviewer, reflecting his/her overall attitude towards the research in the paper, could be a valuable entity to predict the acceptance or rejection of the manuscript under consideration. In this work, we investigate the role of reviewer sentiment embedded within peer review texts to predict the peer review outcome. Our proposed deep neural architecture takes into account three channels of information: the paper, the corresponding reviews, and the review’s polarity to predict the overall recommendation score as well as the final decision. We achieve significant performance improvement over the baselines (∼ 29% error reduction) proposed in a recently released dataset of peer reviews. An AI of this kind could assist the editors/program chairs as an additional layer of confidence, especially when non-responding/missing reviewers are frequent in present day peer review.

An Introduction to the Textual History Tool
Diptesh Kanojia | Malhar Kulkarni | Pushpak Bhattacharyya | Eivind Kahrs
Proceedings of the 6th International Sanskrit Computational Linguistics Symposium

Parallel Corpus Filtering Based on Fuzzy String Matching
Sukanta Sen | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)

In this paper, we describe IIT Patna’s submission to the WMT 2019 shared task on parallel corpus filtering. This shared task asks the participants to develop methods for scoring each parallel sentence from a given noisy parallel corpus. The quality of the scoring method is judged based on the quality of SMT and NMT systems trained on a smaller set of high-quality parallel sentences sub-sampled from the original noisy corpus. This task has two language pairs. We submit for both the Nepali-English and Sinhala-English language pairs. We define a fuzzy string matching score between the English side and the source side translated into English, based on Levenshtein distance. Based on the scores, we sub-sample two sets (having 1 million and 5 million English tokens) of parallel sentences from each parallel corpus, and train SMT systems for development purposes only. The organizers publish the official evaluation using both SMT and NMT on the final official test set. A total of 10 teams participated in the shared task and, according to the official evaluation, our scoring method obtains 2nd position in the team ranking for the 1-million Nepali-English NMT and 5-million Sinhala-English NMT categories.
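
A fuzzy matching score of the kind described in this abstract could be sketched as follows. This is a minimal illustration under stated assumptions: the character-level comparison, the normalisation by the longer string, and the greedy token-budget sub-sampling are all hypothetical choices, since the abstract only states that the score is based on Levenshtein distance and that sets of 1 million and 5 million English tokens were sub-sampled.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via dynamic programming (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def fuzzy_score(english: str, translated_source: str) -> float:
    """Assumed normalisation: 1.0 for identical strings, 0.0 for no overlap."""
    longest = max(len(english), len(translated_source))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(english, translated_source) / longest


def subsample(scored_pairs, token_budget):
    """Keep the best-scoring (score, source, english) pairs within a token budget.

    Stops at the first pair that would exceed the budget (an assumption).
    """
    kept, tokens = [], 0
    for score, source, english in sorted(scored_pairs, key=lambda p: p[0], reverse=True):
        n_tokens = len(english.split())
        if tokens + n_tokens > token_budget:
            break
        kept.append((source, english))
        tokens += n_tokens
    return kept
```

Here `translated_source` stands for the Nepali or Sinhala sentence after machine translation into English; how that translation is produced is outside the scope of this sketch.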

Introduction to Sanskrit Shabdamitra: An Educational Application of Sanskrit Wordnet
Malhar Kulkarni | Nilesh Joshi | Sayali Khare | Hanumant Redkar | Pushpak Bhattacharyya
Proceedings of the 6th International Sanskrit Computational Linguistics Symposium

Multilingual Unsupervised NMT using Shared Encoder and Language-Specific Decoders
Sukanta Sen | Kamal Kumar Gupta | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

In this paper, we propose a multilingual unsupervised NMT scheme which jointly trains multiple languages with a shared encoder and multiple decoders. Our approach is based on denoising autoencoding of each language and back-translating between English and multiple non-English languages. This results in a universal encoder which can encode any language participating in training into an inter-lingual representation, and language-specific decoders. Our experiments using only monolingual corpora show that multilingual unsupervised model performs better than the separately trained bilingual models achieving improvement of up to 1.48 BLEU points on WMT test sets. We also observe that even if we do not train the network for all possible translation directions, the network is still able to translate in a many-to-many fashion leveraging encoder’s ability to generate interlingual representation.

A Unified Multi-task Adversarial Learning Framework for Pharmacovigilance Mining
Shweta Yadav | Asif Ekbal | Sriparna Saha | Pushpak Bhattacharyya
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

The mining of adverse drug reactions (ADRs) has a crucial role in pharmacovigilance. The traditional ways of identifying ADRs are reliable but time-consuming, non-scalable and offer a very limited amount of ADR-relevant information. With the unprecedented growth of information sources in the form of social media texts (Twitter, Blogs, Reviews etc.), biomedical literature, and Electronic Medical Records (EMR), it has become crucial to extract the most pertinent ADR-related information from these free-form texts. In this paper, we propose a neural network inspired multi-task learning framework that can simultaneously extract ADRs from various sources. We adopt a novel adversarial learning-based approach to learn features across multiple ADR information sources. Unlike the other existing techniques, our approach is capable of extracting fine-grained information (such as ‘Indications’, ‘Symptoms’, ‘Finding’, ‘Disease’, ‘Drug’) which provides important cues in pharmacovigilance. We evaluate our proposed approach on three publicly available real-world benchmark pharmacovigilance datasets: a Twitter dataset from the PSB 2016 Social Media Shared Task, the CADEC corpus and the Medline ADR corpus. Experiments show that our unified framework achieves state-of-the-art performance on individual tasks associated with the different benchmark datasets. This establishes the fact that our proposed approach is generic, which enables it to achieve high performance on the diverse datasets.

Utilizing Word Embeddings based Features for Phylogenetic Tree Generation of Sanskrit Texts
Diptesh Kanojia | Abhijeet Dubey | Malhar Kulkarni | Pushpak Bhattacharyya | Gholemreza Haffari
Proceedings of the 6th International Sanskrit Computational Linguistics Symposium

A Multi-task Model for Multilingual Trigger Detection and Classification
Sovan Kumar Sahoo | Saumajit Saha | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the 16th International Conference on Natural Language Processing

In this paper we present a deep multi-task learning framework for multilingual event and argument trigger detection and classification. In our current work, we identify detection and classification of both event and argument triggers as related tasks and follow a multi-tasking approach to solve them simultaneously in contrast to the previous works where these tasks were solved separately or learning some of the above mentioned tasks jointly. We evaluate the proposed approach with multiple low-resource Indian languages. As there were no datasets available for the Indian languages, we have annotated disaster related news data crawled from the online news portal for different low-resource Indian languages for our experiments. Our empirical evaluation shows that multi-task model performs better than the single task model, and classification helps in trigger detection and vice-versa.

IITP-MT System for Gujarati-English News Translation Task at WMT 2019
Sukanta Sen | Kamal Kumar Gupta | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)

We describe our submission to the WMT 2019 News translation shared task for the Gujarati-English language pair. We submit constrained systems, i.e., we rely on the data provided for this language pair and do not use any external data. We train a Transformer-based subword-level neural machine translation (NMT) system using the original parallel corpus along with a synthetic parallel corpus obtained through back-translation of monolingual data. Our primary systems achieve BLEU scores of 10.4 and 8.1 for Gujarati→English and English→Gujarati, respectively. We observe that incorporating monolingual data through back-translation improves the BLEU score significantly over baseline NMT and SMT systems for this language pair.

A Deep Ensemble Framework for Fake News Detection and Multi-Class Classification of Short Political Statements
Arjun Roy | Kingshuk Basak | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the 16th International Conference on Natural Language Processing

Fake news, rumor, incorrect information, and misinformation detection are nowadays crucial issues as these might have serious consequences for our social fabric. Such information is increasing rapidly due to the availability of enormous web information sources including social media feeds, news blogs, online newspapers etc. In this paper, we develop various deep learning models for detecting fake news and classifying them into the pre-defined fine-grained categories. At first, we develop individual models based on Convolutional Neural Network (CNN), and Bi-directional Long Short Term Memory (Bi-LSTM) networks. The representations obtained from these two models are fed into a Multi-layer Perceptron Model (MLP) for the final classification. Our experiments on a benchmark dataset show promising results with an overall accuracy of 44.87%, which outperforms the current state of the art.

Proceedings of the 16th International Conference on Natural Language Processing
Dipti Misra Sharma | Pushpak Bhattacharya
Proceedings of the 16th International Conference on Natural Language Processing

Addressing word-order Divergence in Multilingual Neural Machine Translation for extremely Low Resource Languages
Rudra Murthy | Anoop Kunchukuttan | Pushpak Bhattacharyya
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Transfer learning approaches for Neural Machine Translation (NMT) train an NMT model on an assisting language-target language pair (parent model) which is later fine-tuned for the source language-target language pair of interest (child model), with the target language being the same. In many cases, the assisting language has a different word order from the source language. We show that divergent word order adversely limits the benefits from transfer learning when little to no parallel corpus between the source and target language is available. To bridge this divergence, we propose to pre-order the assisting language sentences to match the word order of the source language and train the parent model. Our experiments on many language pairs show that bridging the word order gap leads to significant improvement in the translation quality in extremely low-resource scenarios.

Multi-task Learning for Multi-modal Emotion Recognition and Sentiment Analysis
Md Shad Akhtar | Dushyant Chauhan | Deepanway Ghosal | Soujanya Poria | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Related tasks often have inter-dependence on each other and perform better when solved in a joint framework. In this paper, we present a deep multi-task learning framework that jointly performs sentiment and emotion analysis. The multi-modal inputs (i.e. text, acoustic and visual frames) of a video convey diverse and distinctive information, and usually do not contribute equally to the decision making. We propose a context-level inter-modal attention framework for simultaneously predicting the sentiment and expressed emotions of an utterance. We evaluate our proposed approach on the CMU-MOSEI dataset for multi-modal sentiment and emotion analysis. Evaluation results suggest that the multi-task learning framework offers improvement over the single-task framework. The proposed approach reports new state-of-the-art performance for both sentiment analysis and emotion analysis.

Language-Agnostic Model for Aspect-Based Sentiment Analysis
Md Shad Akhtar | Abhishek Kumar | Asif Ekbal | Chris Biemann | Pushpak Bhattacharyya
Proceedings of the 13th International Conference on Computational Semantics - Long Papers

In this paper, we propose a language-agnostic deep neural network architecture for aspect-based sentiment analysis. The proposed approach is based on Bidirectional Long Short-Term Memory (Bi-LSTM) network, which is further assisted with extra hand-crafted features. We define three different architectures for the successful combination of word embeddings and hand-crafted features. We evaluate the proposed approach for six languages (i.e. English, Spanish, French, Dutch, German and Hindi) and two problems (i.e. aspect term extraction and aspect sentiment classification). Experiments show that the proposed model attains state-of-the-art performance in most of the settings.

“When Numbers Matter!”: Detecting Sarcasm in Numerical Portions of Text
Abhijeet Dubey | Lakshya Kumar | Arpan Somani | Aditya Joshi | Pushpak Bhattacharyya
Proceedings of the Tenth Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

Research in sarcasm detection spans almost a decade. However, a particular form of sarcasm remains unexplored: sarcasm expressed through numbers, which we estimate forms about 11% of the sarcastic tweets in our dataset. The sentence ‘Love waking up at 3 am’ is sarcastic because of the number. In this paper, we focus on detecting sarcasm in tweets arising out of numbers. Initially, to get an insight into the problem, we implement a rule-based and a statistical machine learning-based (ML) classifier. The rule-based classifier conveys the crux of the numerical sarcasm problem, namely, incongruity arising out of numbers. The statistical ML classifier uncovers the indicators, i.e., features of such sarcasm. The actual system in place, however, comprises two deep learning (DL) models, a CNN and an attention network, which obtain F-scores of 0.93 and 0.91, respectively, on our dataset of tweets containing numbers. To the best of our knowledge, this is the first line of research investigating the phenomenon of sarcasm arising out of numbers, culminating in a detector thereof.

Utilizing Wordnets for Cognate Detection among Indian Languages
Diptesh Kanojia | Kevin Patel | Malhar Kulkarni | Pushpak Bhattacharyya | Gholemreza Haffari
Proceedings of the 10th Global Wordnet Conference

Automatic Cognate Detection (ACD) is a challenging task which has been utilized to help NLP applications like Machine Translation, Information Retrieval and Computational Phylogenetics. Unidentified cognate pairs can pose a challenge to these applications and result in a degradation of performance. In this paper, we detect cognate word pairs among ten Indian languages with Hindi and use deep learning methodologies to predict whether a word pair is cognate or not. We identify IndoWordnet as a potential resource to detect cognate word pairs based on orthographic similarity-based methods and train neural network models using the data obtained from it. We identify parallel corpora as another potential resource and perform the same experiments for them. We also validate the contribution of Wordnets through further experimentation and report improved performance of up to 26%. We discuss the nuances of cognate detection among closely related Indian languages and release the lists of detected cognates as a dataset. We also observe the behaviour of, to an extent, unrelated Indian language pairs and release the lists of detected cognates among them as well.

Courteously Yours: Inducing courteous behavior in Customer Care responses using Reinforced Pointer Generator Network
Hitesh Golchha | Mauajama Firdaus | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

In this paper, we propose an effective deep learning framework for inducing courteous behavior in customer care responses. The interaction between a customer and the customer care representative contributes substantially to the overall customer experience. Thus it is imperative for customer care agents and chatbots engaging with humans to be personal, cordial and emphatic to ensure customer satisfaction and retention. Our system aims at automatically transforming neutral customer care responses into courteous replies. Along with stylistic transfer (of courtesy), our system ensures that responses are coherent with the conversation history, and generates courteous expressions consistent with the emotional state of the customer. Our technique is based on a reinforced pointer-generator model for the sequence to sequence task. The model is also conditioned on a hierarchically encoded and emotionally aware conversational context. We use real interactions on Twitter between customer care professionals and aggrieved customers to create a large conversational dataset having both forms of agent responses: ‘generic’ and ‘courteous’. We perform quantitative and qualitative analyses on established and task-specific metrics, both automatic and human evaluation based. Our evaluation shows that the proposed models can generate emotionally-appropriate courteous expressions while preserving the content. Experimental results also prove that our proposed approach performs better than the baseline models.

Extraction of Message Sequence Charts from Narrative History Text
Girish Palshikar | Sachin Pawar | Sangameshwar Patil | Swapnil Hingmire | Nitin Ramrakhiyani | Harsimran Bedi | Pushpak Bhattacharyya | Vasudeva Varma
Proceedings of the First Workshop on Narrative Understanding

In this paper, we advocate the use of Message Sequence Chart (MSC) as a knowledge representation to capture and visualize multi-actor interactions and their temporal ordering. We propose algorithms to automatically extract an MSC from a history narrative. For a given narrative, we first identify verbs which indicate interactions and then use dependency parsing and Semantic Role Labelling based approaches to identify senders (initiating actors) and receivers (other actors involved) for these interaction verbs. As a final step in MSC extraction, we employ a state-of-the-art algorithm to temporally re-order these interactions. Our evaluation on multiple publicly available narratives shows improvements over four baselines.

Ordinal and Attribute Aware Response Generation in a Multimodal Dialogue System
Hardik Chauhan | Mauajama Firdaus | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Multimodal dialogue systems have opened new frontiers in the traditional goal-oriented dialogue systems. The state-of-the-art dialogue systems are primarily based on unimodal sources, predominantly the text, and hence cannot capture the information present in the other sources such as videos, audios, images etc. With the availability of large scale multimodal dialogue dataset (MMD) (Saha et al., 2018) on the fashion domain, the visual appearance of the products is essential for understanding the intention of the user. Without capturing the information from both the text and image, the system will be incapable of generating correct and desirable responses. In this paper, we propose a novel position and attribute aware attention mechanism to learn enhanced image representation conditioned on the user utterance. Our evaluation shows that the proposed model can generate appropriate responses while preserving the position and attribute information. Experimental results also prove that our proposed approach attains superior performance compared to the baseline models, and outperforms the state-of-the-art approaches on text similarity based evaluation metrics.

Context-aware Interactive Attention for Multi-modal Sentiment and Emotion Analysis
Dushyant Singh Chauhan | Md Shad Akhtar | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

In recent times, multi-modal analysis has been an emerging and highly sought-after field at the intersection of natural language processing, computer vision, and speech processing. The prime objective of such studies is to leverage diversified information (e.g., textual, acoustic and visual) for learning a model. The effective interaction among these modalities often leads to a better system in terms of performance. In this paper, we introduce a recurrent neural network based approach for multi-modal sentiment and emotion analysis. The proposed model learns the inter-modal interaction among the participating modalities through an auto-encoder mechanism. We employ a context-aware attention module to exploit the correspondence among the neighboring utterances. We evaluate our proposed approach on five standard multi-modal affect analysis datasets. Experimental results suggest the efficacy of the proposed model for both sentiment and emotion analysis over various existing state-of-the-art systems.

Converting Sentiment Annotated Data to Emotion Annotated Data
Manasi Kulkarni | Pushpak Bhattacharyya
Proceedings of the 16th International Conference on Natural Language Processing

Existing supervised solutions for emotion classification demand a large amount of emotion-annotated data. Such resources may not be available for many languages. However, it is common to have sentiment-annotated data available in these languages. The sentiment information (+1 or -1) is useful to segregate positive emotions from negative emotions. In this paper, we propose an unsupervised approach for emotion recognition by taking advantage of the sentiment information. Given a sentence and its sentiment information, the task is to recognize the best possible emotion for it. For every sentence, the semantic relatedness between the words of the sentence and a set of emotion-specific words is calculated using cosine similarity. An emotion vector representing the emotion score for each emotion category of Ekman’s model is created. It is further improved with the dependency relations, and the best possible emotion is predicted. The results show a significant improvement in f-score values when text with sentiment information is used as input, compared with our baseline that uses text without sentiment information. We report the weighted f-score on three different datasets with Ekman’s emotion model. This supports the claim that, by leveraging the sentiment value, better emotion-annotated data can be created.
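
The cosine-similarity scoring step described in this abstract can be sketched roughly as below. Several details are assumptions for illustration only: the polarity split of Ekman's categories (in particular where "surprise" falls), the aggregation of word-level similarities (mean over sentence words of the best match per emotion), and the toy two-dimensional vectors. The dependency-relation refinement mentioned in the abstract is not modelled here.

```python
import math

# Assumed polarity split of Ekman's six emotions; placing "surprise" is a guess.
POSITIVE = {"joy", "surprise"}
NEGATIVE = {"anger", "disgust", "fear", "sadness"}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def emotion_vector(sentence_vecs, seed_vecs):
    """Score each emotion: mean over words of the best seed-word similarity."""
    scores = {}
    for emotion, seeds in seed_vecs.items():
        best = [max(cosine(word, seed) for seed in seeds) for word in sentence_vecs]
        scores[emotion] = sum(best) / len(best) if best else 0.0
    return scores


def predict_emotion(sentence_vecs, seed_vecs, sentiment):
    """Pick the top-scoring emotion among those matching the sentiment (+1 or -1)."""
    allowed = POSITIVE if sentiment > 0 else NEGATIVE
    scores = emotion_vector(sentence_vecs, seed_vecs)
    candidates = {e: s for e, s in scores.items() if e in allowed}
    return max(candidates, key=candidates.get)


# Toy embeddings (hypothetical): one seed word per emotion, one sentence word.
seeds = {"joy": [[1.0, 0.0]], "sadness": [[0.0, 1.0]], "anger": [[-1.0, 0.0]]}
sentence = [[0.9, 0.1]]
label_pos = predict_emotion(sentence, seeds, +1)
label_neg = predict_emotion(sentence, seeds, -1)
```

The sentiment label acts purely as a filter here: the same sentence vector is mapped to a different emotion once the positive categories are excluded.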

A Deep Learning Approach for Automatic Detection of Fake News
Tanik Saikh | Arkadipta De | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the 16th International Conference on Natural Language Processing

Fake news detection is a very prominent and essential task in the field of journalism. This challenging problem has so far been studied mainly in the field of politics, but it could be even more challenging in a multi-domain setting. In this paper, we propose two effective models based on deep learning for solving the fake news detection problem in online news contents of multiple domains. We evaluate our techniques on two recently released datasets for fake news detection, namely Fake News AMT and Celebrity. The proposed systems yield encouraging performance, outperforming the current hand-crafted feature engineering based state-of-the-art system by significant margins of 3.08% and 9.3% for the two models, respectively. In order to exploit the datasets available for the related tasks, we perform cross-domain analysis (model trained on FakeNews AMT and tested on Celebrity and vice versa) to explore the applicability of our systems across the domains.

Extraction of Message Sequence Charts from Software Use-Case Descriptions
Girish Palshikar | Nitin Ramrakhiyani | Sangameshwar Patil | Sachin Pawar | Swapnil Hingmire | Vasudeva Varma | Pushpak Bhattacharyya
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Industry Papers)

Software Requirement Specification documents provide natural language descriptions of the core functional requirements as a set of use-cases. Essentially, each use-case contains a set of actors and sequences of steps describing the interactions among them. Goals of use-case reviews and analyses include their correctness, completeness, detection of ambiguities, prototyping, verification, test case generation and traceability. Message Sequence Charts (MSCs) have been proposed as an expressive, rigorous yet intuitive visual representation of use-cases. In this paper, we describe a linguistic knowledge-based approach to extract MSCs from use-cases. Compared to existing techniques, we extract richer constructs of the MSC notation such as timers, conditions and alt-boxes. We apply this tool to extract MSCs from several real-life software use-case descriptions and show that it performs better than the existing techniques. We also discuss the benefits and limitations of the extracted MSCs to meet the above goals.

2018

The IIT Bombay English-Hindi Parallel Corpus
Anoop Kunchukuttan | Pratik Mehta | Pushpak Bhattacharyya
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Solving Data Sparsity for Aspect Based Sentiment Analysis Using Cross-Linguality and Multi-Linguality
Md Shad Akhtar | Palaash Sawant | Sukanta Sen | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Efficient word representations play an important role in solving various problems related to Natural Language Processing (NLP), data mining, text mining etc. The issue of data sparsity poses a great challenge in creating an efficient word representation model for solving the underlying problem. The problem is intensified in resource-poor scenarios due to the absence of a sufficiently large corpus. In this work we propose to minimize the effect of data sparsity by leveraging bilingual word embeddings learned through a parallel corpus. We train and evaluate a Long Short Term Memory (LSTM) based architecture for aspect level sentiment classification. The neural network architecture is further assisted by hand-crafted features for the prediction. We show the efficacy of the proposed model against state-of-the-art methods in two experimental setups, i.e. multi-lingual and cross-lingual.

Thank “Goodness”! A Way to Measure Style in Student Essays
Sandeep Mathias | Pushpak Bhattacharyya
Proceedings of the 5th Workshop on Natural Language Processing Techniques for Educational Applications

Essays have two major components for scoring - content and style. In this paper, we describe a property of the essay, called goodness, and use it to predict the score given for the style of student essays. We compare our approach to solve this problem with baseline approaches, like language modeling and also a state-of-the-art deep learning system. We show that, despite being quite intuitive, our approach is very powerful in predicting the style of the essays.

Sentence Level Temporality Detection using an Implicit Time-sensed Resource
Sabyasachi Kamila | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

A Deep Neural Network based Approach for Entity Extraction in Code-Mixed Indian Social Media Text
Deepak Gupta | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Resolving Actor Coreferences in Hindi Narrative Text
Nitin Ramrakhiyani | Swapnil Hingmire | Sachin Pawar | Sangameshwar Patil | Girish K. Palshikar | Pushpak Bhattacharyya | Vasudeva Verma
Proceedings of the 15th International Conference on Natural Language Processing

A Deep Learning Model for Event Extraction and Classification in Hindi for Disaster Domain
Zishan Ahmad | Sahoo Sovan Kumar | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the 15th International Conference on Natural Language Processing

Sarcasm Target Identification: Dataset and An Introductory Approach
Aditya Joshi | Pranav Goel | Pushpak Bhattacharyya | Mark Carman
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Novelty Goes Deep. A Deep Neural Solution To Document Level Novelty Detection
Tirthankar Ghosal | Vignesh Edithal | Asif Ekbal | Pushpak Bhattacharyya | George Tsatsaronis | Srinivasa Satya Sameer Kumar Chivukula
Proceedings of the 27th International Conference on Computational Linguistics

The rapid growth of documents across the web has necessitated finding means of discarding redundant documents and retaining novel ones. Capturing redundancy is challenging as it may involve investigating at a deep semantic level. Techniques for detecting such semantic redundancy at the document level are scarce. In this work we propose a deep Convolutional Neural Network (CNN) based model to classify a document as novel or redundant with respect to a set of relevant documents already seen by the system. The system is simple and does not require any manual feature engineering. Our novel scheme encodes relevant and relative information from both source and target texts to generate an intermediate representation which we coin as the Relative Document Vector (RDV). The proposed method outperforms the existing state-of-the-art on a document-level novelty detection dataset by a margin of ∼5% in terms of accuracy. We further demonstrate the effectiveness of our approach on a standard paraphrase detection dataset where paraphrased passages closely resemble semantically redundant documents.

Identifying Transferable Information Across Domains for Cross-domain Sentiment Classification
Raksha Sharma | Pushpak Bhattacharyya | Sandipan Dandapat | Himanshu Sharad Bhatt
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Getting manually labeled data in each domain is always an expensive and time-consuming task. Cross-domain sentiment analysis has emerged as a demanding concept where a labeled source domain facilitates a sentiment classifier for an unlabeled target domain. However, polarity orientation (positive or negative) and the significance of a word to express an opinion often differ from one domain to another. Owing to these differences, cross-domain sentiment classification is still a challenging task. In this paper, we propose that words that do not change their polarity and significance represent the transferable (usable) information across domains for cross-domain sentiment classification. We present a novel approach based on the χ² test and cosine similarity between context vectors of words to identify polarity-preserving significant words across domains. Furthermore, we show that a weighted ensemble of the classifiers enhances the cross-domain classification performance.

Fine-Grained Temporal Orientation and its Relationship with Psycho-Demographic Correlates
Sabyasachi Kamila | Mohammed Hasanuzzaman | Asif Ekbal | Pushpak Bhattacharyya | Andy Way
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Temporal orientation refers to an individual’s tendency to connect to the psychological concepts of past, present or future, and it affects personality, motivation, emotion, decision making and stress coping processes. The study of social media users’ psycho-demographic attributes from the perspective of human temporal orientation can be of utmost interest and importance to business and administrative decision makers, as it can provide valuable additional information for them to make informed decisions. In this paper, we propose a first study to demonstrate the association between the sentiment view of the temporal orientation of the users and their different psycho-demographic attributes by analyzing their tweets. We first create a temporal orientation classifier in a minimally supervised way which classifies each tweet of the users into one of three temporal categories, namely past, present, and future. A deep Bi-directional Long Short Term Memory (BLSTM) is used for the tweet classification task. Our tweet classifier achieves an accuracy of 78.27% when tested on a manually created test set. We then determine the users’ overall temporal orientation based on their tweets on social media. The sentiment is added to the tweets at the fine-grained level, where each temporal tweet is given a sentiment of positive, negative or neutral. Our experiment reveals that depending upon the sentiment view of temporal orientation, a user’s attributes vary. We finally measure the correlation between the users’ sentiment view of temporal orientation and their different psycho-demographic factors using regression.

The Whole is Greater than the Sum of its Parts: Towards the Effectiveness of Voting Ensemble Classifiers for Complex Word Identification
Nikhil Wani | Sandeep Mathias | Jayashree Aanand Gajjam | Pushpak Bhattacharyya
Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications

In this paper, we present an effective system using voting ensemble classifiers to detect contextually complex words for non-native English speakers. To make the final decision, we channel a set of eight calibrated classifiers based on lexical, size and vocabulary features and train our model with annotated datasets collected from a mixture of native and non-native speakers. Thereafter, we test our system on three datasets, namely News, WikiNews, and Wikipedia, and report competitive results with an F1-Score ranging from 0.777 to 0.855 across the datasets. Our system outperforms multiple other models and falls within 0.042 to 0.026 percent of the best-performing model’s score in the shared task.

Indian Language Wordnets and their Linkages with Princeton WordNet
Diptesh Kanojia | Kevin Patel | Pushpak Bhattacharyya
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Leveraging Orthographic Similarity for Multilingual Neural Transliteration
Anoop Kunchukuttan | Mitesh Khapra | Gurneet Singh | Pushpak Bhattacharyya
Transactions of the Association for Computational Linguistics, Volume 6

We address the task of joint training of transliteration models for multiple language pairs (multilingual transliteration). This is an instance of multitask learning, where individual tasks (language pairs) benefit from sharing knowledge with related tasks. We focus on transliteration involving related tasks, i.e., languages sharing writing systems and phonetic properties (orthographically similar languages). We propose a modified neural encoder-decoder model that maximizes parameter sharing across language pairs in order to effectively leverage orthographic similarity. We show that multilingual transliteration significantly outperforms bilingual transliteration in different scenarios (average increase of 58% across a variety of languages we experimented with). We also show that multilingual transliteration models can generalize well to languages/language pairs not encountered during training and hence perform well on the zero-shot transliteration task. We show that further improvements can be achieved by using phonetic feature input.

Helping each Other: A Framework for Customer-to-Customer Suggestion Mining using a Semi-supervised Deep Neural Network
Hitesh Golchha | Deepak Gupta | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the 15th International Conference on Natural Language Processing

Uncovering Code-Mixed Challenges: A Framework for Linguistically Driven Question Generation and Neural Based Question Answering
Deepak Gupta | Pabitra Lenka | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the 22nd Conference on Computational Natural Language Learning

Existing research on question answering (QA) and reading comprehension (RC) is mainly focused on resource-rich languages like English. In recent times, the rapid growth of multi-lingual web content has posed several challenges to the existing QA systems. Code-mixing is one such challenge that makes the task more complex. In this paper, we propose a linguistically motivated technique for code-mixed question generation (CMQG) and a neural network based architecture for code-mixed question answering (CMQA). For evaluation, we manually create code-mixed questions for the Hindi-English language pair. In order to show the effectiveness of our neural network based CMQA technique, we utilize two benchmark datasets, SQuAD and MMQA. Experiments show that our proposed model achieves encouraging performance on CMQG and CMQA.

pyiwn: A Python based API to access Indian Language WordNets
Ritesh Panjwani | Diptesh Kanojia | Pushpak Bhattacharyya
Proceedings of the 9th Global Wordnet Conference

Indian language WordNets have their individual web-based browsing interfaces along with a common interface for IndoWordNet. These interfaces prove to be useful for language learners and in the educational domain; however, they do not provide the functionality of connecting to them and browsing their data through a lucid application programming interface (API). In this paper, we present our work on creating such an easy-to-use framework, which is bundled with the data for Indian language WordNets and provides core functionalities in Python similar to the NLTK WordNet interface. Additionally, we use a pre-built speech synthesis system for the Hindi language and augment Hindi data with audio for words, glosses, and example sentences. We provide a detailed usage of our API and explain the functions for ease of the user. Also, we package the IndoWordNet data along with the source code and provide it openly for the purpose of research. We aim to provide all our work as an open source framework for further development.

IITP-MT at WAT2018: Transformer-based Multilingual Indic-English Neural Machine Translation System
Sukanta Sen | Kamal Kumar Gupta | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation: 5th Workshop on Asian Translation

Multi-Task Learning Framework for Mining Crowd Intelligence towards Clinical Treatment
Shweta Yadav | Asif Ekbal | Sriparna Saha | Pushpak Bhattacharyya | Amit Sheth
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

In the recent past, social media has emerged as an active platform in the context of healthcare and medicine. In this paper, we present a study where medical users’ opinions on health-related issues are analyzed to capture the medical sentiment at a blog level. The medical sentiments can be studied in various facets such as medical condition, treatment, and medication that characterize the overall health status of the user. Considering these facets, we treat analysis of this information as a multi-task classification problem. In this paper, we adopt a novel adversarial learning approach for our multi-task learning framework to learn the sentiment’s strengths expressed in a medical blog. Our evaluation shows promising results for our target tasks.

Synthesizing Audio for Hindi WordNet
Diptesh Kanojia | Preethi Jyothi | Pushpak Bhattacharyya
Proceedings of the 9th Global Wordnet Conference

In this paper, we describe our work on the creation of a voice model using a speech synthesis system for the Hindi language. We use pre-existing “voices” and also use publicly available speech corpora to create a “voice” using the Festival Speech Synthesis System (Black, 1997). Our contribution is two-fold: (1) We scrutinize multiple speech synthesis systems and provide an extensive report on the currently available state-of-the-art systems. We also develop voices using the existing implementations of the aforementioned systems, and (2) We use these voices to generate sample audio for randomly chosen words, manually evaluate the audio generated, and produce audio for all WordNet words using the winning voice model. We also produce audio for the Hindi WordNet glosses and example sentences. We describe our efforts to use pre-existing implementations of WaveNet, a model to generate raw audio using neural nets (Oord et al., 2016), to generate speech for Hindi. Our lexicographers perform a manual evaluation of the audio generated using multiple voices. A qualitative and quantitative analysis reveals that the voice model generated by us performs the best with an accuracy of 0.44.

Towards a Standardized Dataset for Noun Compound Interpretation
Girishkumar Ponkiya | Kevin Patel | Pushpak Bhattacharyya | Girish K Palshikar
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Eyes are the Windows to the Soul: Predicting the Rating of Text Quality Using Gaze Behaviour
Sandeep Mathias | Diptesh Kanojia | Kevin Patel | Samarth Agrawal | Abhijit Mishra | Pushpak Bhattacharyya
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Predicting a reader’s rating of text quality is a challenging task that involves estimating different subjective aspects of the text, like structure, clarity, etc. Such subjective aspects are better handled using cognitive information. One such source of cognitive information is gaze behaviour. In this paper, we show that gaze behaviour does indeed help in effectively predicting the rating of text quality. To do this, we first model text quality as a function of three properties: organization, coherence and cohesion. Then, we demonstrate how capturing gaze behaviour helps in predicting each of these properties, and hence the overall quality, by reporting improvements obtained by adding gaze features to traditional textual features for score prediction. We also hypothesize that if a reader has fully understood the text, the corresponding gaze behaviour would give a better indication of the assigned rating, as opposed to partial understanding. Our experiments validate this hypothesis by showing greater agreement between the given rating and the predicted rating when the reader has a full understanding of the text.

Does Curriculum Learning help Deep Learning for Natural Language Generation?
Sandhya Singh | Kevin Patel | Pushpak Bhattacharya | Krishnanjan Bhattacharjee | Hemant Darbari | Seema Verma
Proceedings of the 15th International Conference on Natural Language Processing

WupLeBleu: The Word-net Based Evaluation Metric for Machine Translation
Debajyoty Banik | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the 15th International Conference on Natural Language Processing

Multilingual Indian Language Translation System at WAT 2018: Many-to-one Phrase-based SMT
Tamali Banerjee | Anoop Kunchukuttan | Pushpak Bhattacharya
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation: 5th Workshop on Asian Translation

Judicious Selection of Training Data in Assisting Language for Multilingual Neural NER
Rudra Murthy | Anoop Kunchukuttan | Pushpak Bhattacharyya
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Multilingual learning for Neural Named Entity Recognition (NNER) involves jointly training a neural network for multiple languages. Typically, the goal is improving the NER performance of one of the languages (the primary language) using the other assisting languages. We show that the divergence in the tag distributions of the common named entities between the primary and assisting languages can reduce the effectiveness of multilingual learning. To alleviate this problem, we propose a metric based on symmetric KL divergence to filter out the highly divergent training instances in the assisting language. We empirically show that our data selection strategy improves NER performance in many languages, including those with very limited training data.
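
The filtering idea in this abstract can be sketched as follows. This is an illustrative example under stated assumptions, not the paper's exact metric: the per-entity tag distributions, the sentence-level rule (drop a sentence if any shared entity is too divergent), the smoothing constant, and the names `symmetric_kl` and `filter_assisting` are hypothetical.

```python
import math


def symmetric_kl(p, q, eps=1e-12):
    """Symmetric KL divergence KL(p||q) + KL(q||p) between two tag
    distributions given as {tag: probability} dicts. eps avoids log(0)."""
    total = 0.0
    for tag in sorted(set(p) | set(q)):
        pt = p.get(tag, 0.0) + eps
        qt = q.get(tag, 0.0) + eps
        total += pt * math.log(pt / qt) + qt * math.log(qt / pt)
    return total


def filter_assisting(sentences, primary, assisting, threshold):
    """Keep assisting-language sentences whose shared entities all have a
    tag distribution close to the primary language's.

    sentences: list of (sentence, [entity, ...]) pairs (hypothetical format)
    primary, assisting: {entity: {tag: probability}}
    """
    kept = []
    for sentence, entities in sentences:
        too_divergent = any(
            e in primary
            and e in assisting
            and symmetric_kl(primary[e], assisting[e]) > threshold
            for e in entities
        )
        if not too_divergent:
            kept.append((sentence, entities))
    return kept
```

An entity tagged mostly as a person in one language and mostly as a location in the other yields a large divergence, so sentences containing it would be dropped from the assisting-language training data.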

ASAP++: Enriching the ASAP Automated Essay Grading Dataset with Essay Attribute Scores
Sandeep Mathias | Pushpak Bhattacharyya
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Meaningless yet meaningful: Morphology grounded subword-level NMT
Tamali Banerjee | Pushpak Bhattacharyya
Proceedings of the Second Workshop on Subword/Character LEvel Models

We explore the use of two independent subsystems, Byte Pair Encoding (BPE) and Morfessor, as basic units for subword-level neural machine translation (NMT). We show that, for linguistically distant language-pairs, the Morfessor-based segmentation algorithm produces significantly better quality translation than BPE. However, for close language-pairs, BPE-based subword-NMT may translate better than Morfessor-based subword-NMT. We propose a combined approach of these two segmentation algorithms, Morfessor-BPE (M-BPE), which outperforms these two baseline systems in terms of BLEU score. Our results are supported by experiments on three language-pairs: English-Hindi, Bengali-Hindi and English-Bengali.

Semi-automatic WordNet Linking using Word Embeddings
Kevin Patel | Diptesh Kanojia | Pushpak Bhattacharyya
Proceedings of the 9th Global Wordnet Conference

Wordnets are rich lexico-semantic resources. Linked wordnets are extensions of wordnets, which link similar concepts in wordnets of different languages. Such resources are extremely useful in many Natural Language Processing (NLP) applications, primarily those based on knowledge-based approaches. In such approaches, these resources are considered as gold standard/oracle. Thus, it is crucial that these resources hold correct information. Thereby, they are created by human experts. However, manual maintenance of such resources is a tedious and costly affair. Thus techniques that can aid the experts are desirable. In this paper, we propose an approach to link wordnets. Given a synset of the source language, the approach returns a ranked list of potential candidate synsets in the target language from which the human expert can choose the correct one(s). Our technique is able to retrieve a winner synset in the top 10 ranked list for 60% of all synsets and 70% of noun synsets.

Medical Sentiment Analysis using Social Media: Towards building a Patient Assisted System
Shweta Yadav | Asif Ekbal | Sriparna Saha | Pushpak Bhattacharyya
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Hindi Wordnet for Language Teaching: Experiences and Lessons Learnt
Hanumant Redkar | Rajita Shukla | Sandhya Singh | Jaya Saraswati | Laxmi Kashyap | Diptesh Kanojia | Preethi Jyothi | Malhar Kulkarni | Pushpak Bhattacharyya
Proceedings of the 9th Global Wordnet Conference

This paper reports the work related to making Hindi Wordnet available as a digital resource for language learning and teaching, and the experiences and lessons that were learnt during the process. The language data of the Hindi Wordnet has been suitably modified and enhanced to make it into a language learning aid. This aid is based on modern pedagogical axioms and is aligned to the learning objectives of the syllabi of the school education in India. To make it into a comprehensive language tool, grammatical information has also been encoded, as far as this can be marked on the lexical items. The delivery of information is multi-layered, multi-sensory and is available across multiple digital platforms. The front end has been designed to offer an eye-catching user-friendly interface which is suitable for learners starting from age six onward. Preliminary testing of the tool has been done and it has been modified as per the feedback that was received. Above all, the entire exercise has offered gainful insights into learning based on associative networks and how knowledge based on such networks can be made available to modern learners.

Morphology Injection for English-Malayalam Statistical Machine Translation
Sreelekha S | Pushpak Bhattacharyya
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

An Iterative Approach for Unsupervised Most Frequent Sense Detection using WordNet and Word Embeddings
Kevin Patel | Pushpak Bhattacharyya
Proceedings of the 9th Global Wordnet Conference

Given a word, what is the most frequent sense in which it occurs in a given corpus? Most Frequent Sense (MFS) is a strong baseline for unsupervised word sense disambiguation. If we have large amounts of sense-annotated corpora, MFS can be trivially created. However, sense-annotated corpora are a rarity. In this paper, we propose a method which can compute MFS from raw corpora. Our approach iteratively exploits the semantic congruity among related words in the corpus. Our method performs better than another similar work.

Contextual Inter-modal Attention for Multi-modal Sentiment Analysis
Deepanway Ghosal | Md Shad Akhtar | Dushyant Chauhan | Soujanya Poria | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Multi-modal sentiment analysis offers various challenges, one being the effective combination of different input modalities, namely text, visual and acoustic. In this paper, we propose a recurrent neural network based multi-modal attention framework that leverages the contextual information for utterance-level sentiment prediction. The proposed approach applies attention on multi-modal multi-utterance representations and tries to learn the contributing features amongst them. We evaluate our proposed approach on two multi-modal sentiment analysis benchmark datasets, viz. CMU Multi-modal Opinion-level Sentiment Intensity (CMU-MOSI) corpus and the recently released CMU Multi-modal Opinion Sentiment and Emotion Intensity (CMU-MOSEI) corpus. Evaluation results show the effectiveness of our proposed approach with the accuracies of 82.31% and 79.80% for the MOSI and MOSEI datasets, respectively. These are approximately 2 and 1 points performance improvement over the state-of-the-art models for the datasets.

Treat us like the sequences we are: Prepositional Paraphrasing of Noun Compounds using LSTM
Girishkumar Ponkiya | Kevin Patel | Pushpak Bhattacharyya | Girish Palshikar
Proceedings of the 27th International Conference on Computational Linguistics

Interpreting noun compounds is a challenging task. It involves uncovering the underlying predicate which is dropped in the formation of the compound. In most cases, this predicate is of the form VERB+PREP. It has been observed that uncovering the preposition is a significant step towards uncovering the predicate. In this paper, we attempt to paraphrase noun compounds using prepositions. We consider noun compounds and their corresponding prepositional paraphrases as parallelly aligned sequences of words. This enables us to adapt different architectures from cross-lingual embedding literature. We choose the architecture where we create representations of both noun compound (source sequence) and its corresponding prepositional paraphrase (target sequence), such that their similarity is high. We use LSTMs to learn these representations. We use these representations to decide the correct preposition. Our experiments show that this approach performs considerably well on different datasets of noun compounds that are manually annotated with prepositions.

MMQA: A Multi-domain Multi-lingual Question-Answering Framework for English and Hindi
Deepak Gupta | Surabhi Kumari | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Can Taxonomy Help? Improving Semantic Question Matching using Question Taxonomy
Deepak Gupta | Rajkumar Pujari | Asif Ekbal | Pushpak Bhattacharyya | Anutosh Maitra | Tom Jain | Shubhashis Sengupta
Proceedings of the 27th International Conference on Computational Linguistics

In this paper, we propose a hybrid technique for semantic question matching. It uses a proposed two-layered taxonomy for English questions by augmenting state-of-the-art deep learning models with question classes obtained from a deep learning based question classifier. Experiments performed on three open-domain datasets demonstrate the effectiveness of our proposed approach. We achieve state-of-the-art results on partial ordering question ranking (POQR) benchmark dataset. Our empirical analysis shows that coupling standard distributional features (provided by the question encoder) with knowledge from taxonomy is more effective than either deep learning or taxonomy-based knowledge alone.

TAP-DLND 1.0 : A Corpus for Document Level Novelty Detection
Tirthankar Ghosal | Amitra Salam | Swati Tiwari | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Identification of Alias Links among Participants in Narratives
Sangameshwar Patil | Sachin Pawar | Swapnil Hingmire | Girish Palshikar | Vasudeva Varma | Pushpak Bhattacharyya
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Identification of distinct and independent participants (entities of interest) in a narrative is an important task for many NLP applications. This task becomes challenging because these participants are often referred to using multiple aliases. In this paper, we propose an approach based on linguistic knowledge for identification of aliases mentioned using proper nouns, pronouns or noun phrases with a common noun headword. We use a Markov Logic Network (MLN) to encode the linguistic knowledge for identification of aliases. We evaluate on four diverse history narratives of varying complexity. Our approach performs better than the state-of-the-art approach as well as a combination of standard named entity recognition and coreference resolution techniques.

2017

A Multilayer Perceptron based Ensemble Technique for Fine-grained Financial Sentiment Analysis
Md Shad Akhtar | Abhishek Kumar | Deepanway Ghosal | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

In this paper, we propose a novel method for combining deep learning and classical feature based models using a Multi-Layer Perceptron (MLP) network for financial sentiment analysis. We develop various deep learning models based on Convolutional Neural Network (CNN), Long Short Term Memory (LSTM) and Gated Recurrent Unit (GRU). These are trained on top of pre-trained, autoencoder-based, financial word embeddings and lexicon features. An ensemble is constructed by combining these deep learning models and a classical supervised model based on Support Vector Regression (SVR). We evaluate our proposed technique on a benchmark dataset of SemEval-2017 shared task on financial sentiment analysis. The proposed model shows impressive results on two datasets, i.e. microblogs and news headlines datasets. Comparisons show that our proposed model performs better than the existing state-of-the-art systems for the above two datasets by 2.0 and 4.1 cosine points, respectively.

Towards Harnessing Memory Networks for Coreference Resolution
Joe Cheri | Pushpak Bhattacharyya
Proceedings of the 2nd Workshop on Representation Learning for NLP

The coreference resolution task demands comprehending a discourse, especially for anaphoric mentions which require semantic information for resolving antecedents. We investigate how memory networks can be helpful for coreference resolution when posed as a question answering problem. The comprehension capability of memory networks assists coreference resolution, particularly for the mentions that require semantic and context information. We experiment with memory networks for coreference resolution, using 4 synthetic datasets generated for coreference resolution with varying difficulty levels. Our system’s performance is compared with a traditional coreference resolution system to show why memory networks can be promising for coreference resolution.

IITPB at SemEval-2017 Task 5: Sentiment Prediction in Financial Text
Abhishek Kumar | Abhishek Sethi | Md Shad Akhtar | Asif Ekbal | Chris Biemann | Pushpak Bhattacharyya
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

This paper reports team IITPB’s participation in the SemEval 2017 Task 5 on ‘Fine-grained sentiment analysis on financial microblogs and news’. We developed 2 systems for the two tracks. One system was based on an ensemble of Support Vector Classifier and Logistic Regression. This system relied on Distributional Thesaurus (DT), word embeddings and lexicon features to predict a floating sentiment value between -1 and +1. The other system was based on Support Vector Regression using word embeddings, lexicon features, and PMI scores as features. The system was ranked 5th in track 1 and 8th in track 2.

IITP at SemEval-2017 Task 8 : A Supervised Approach for Rumour Evaluation
Vikram Singh | Sunny Narayan | Md Shad Akhtar | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

This paper describes our system participation in the SemEval-2017 Task 8 ‘RumourEval: Determining rumour veracity and support for rumours’. The objective of this task was to predict the stance and veracity of the underlying rumour. We propose a supervised classification approach employing several lexical, content and twitter specific features for learning. Evaluation shows promising results for both the problems.

Entity Extraction in Biomedical Corpora: An Approach to Evaluate Word Embedding Features with PSO based Feature Selection
Shweta Yadav | Asif Ekbal | Sriparna Saha | Pushpak Bhattacharyya
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

Text mining has drawn significant attention in the recent past due to the rapid growth in biomedical and clinical records. Entity extraction is one of the fundamental components for biomedical text mining. In this paper, we propose a novel approach of feature selection for entity extraction that exploits the concept of deep learning and Particle Swarm Optimization (PSO). The system utilizes word embedding features along with several other features extracted by studying the properties of the datasets. We obtain an interesting observation that compact word embedding features as determined by PSO are more effective compared to the entire word embedding feature set for entity extraction. The proposed system is evaluated on three benchmark biomedical datasets, namely GENIA, GENETAG, and AiMed. The effectiveness of the proposed approach is evident with significant performance gains over the baseline models as well as the other existing systems. We observe improvements of 7.86%, 5.27% and 7.25% F-measure points over the baseline models for the GENIA, GENETAG, and AiMed datasets respectively.

Adapting Pre-trained Word Embeddings For Use In Medical Coding
Kevin Patel | Divya Patel | Mansi Golakiya | Pushpak Bhattacharyya | Nilesh Birari
Proceedings of the 16th BioNLP Workshop

Word embeddings are a crucial component in modern NLP. Pre-trained embeddings released by different groups have been a major reason for their popularity. However, they are trained on generic corpora, which limits their direct use for domain specific tasks. In this paper, we propose a method to add task specific information to pre-trained word embeddings. Such information can improve their utility. We add information from medical coding data, as well as the first level from the hierarchy of ICD-10 medical code set to different pre-trained word embeddings. We adapt CBOW algorithm from the word2vec package for our purpose. We evaluated our approach on five different pre-trained word embeddings. Both the original word embeddings, and their modified versions (the ones with added information) were used for automated review of medical coding. The modified word embeddings give an improvement in f-score by 1% on the 5-fold evaluation on a private medical claims dataset. Our results show that adding extra information is possible and beneficial for the task at hand.

Hindi Shabdamitra: A Wordnet based E-Learning Tool for Language Learning and Teaching
Hanumant Redkar | Sandhya Singh | Meenakshi Somasundaram | Dhara Gorasia | Malhar Kulkarni | Pushpak Bhattacharyya
Proceedings of the 4th Workshop on Natural Language Processing Techniques for Educational Applications (NLPTEA 2017)

In today’s technology driven digital era, the education domain is undergoing a transformation from traditional approaches to more learner controlled and flexible methods of learning. This transformation has opened new avenues for interdisciplinary research in the field of educational technology and natural language processing in developing quality digital aids for learning and teaching. The tool presented here, Hindi Shabdamitra, developed using Hindi Wordnet for Hindi language learning, is one such e-learning tool. It has been developed as a teaching and learning aid suitable for formal school based curriculum and informal setup for self learning users. Besides vocabulary, it also provides word based grammar along with images and pronunciation for better learning and retention. This aid demonstrates how a rich lexical resource like wordnet can be systematically remodeled for practical usage in the educational domain.

IIT-UHH at SemEval-2017 Task 3: Exploring Multiple Features for Community Question Answering and Implicit Dialogue Identification
Titas Nandi | Chris Biemann | Seid Muhie Yimam | Deepak Gupta | Sarah Kohail | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

In this paper we present the system for Answer Selection and Ranking in Community Question Answering, which we build as part of our participation in SemEval-2017 Task 3. We develop a Support Vector Machine (SVM) based system that makes use of textual, domain-specific, word-embedding and topic-modeling features. In addition, we propose a novel method for dialogue chain identification in comment threads. Our primary submission won subtask C, outperforming other systems in all the primary evaluation metrics. We performed well in other English subtasks, ranking third in subtask A and eighth in subtask B. We also developed open source toolkits for all the three English subtasks by the name cQARank [https://github.com/TitasNandi/cQARank].

Comparing Recurrent and Convolutional Architectures for English-Hindi Neural Machine Translation
Sandhya Singh | Ritesh Panjwani | Anoop Kunchukuttan | Pushpak Bhattacharyya
Proceedings of the 4th Workshop on Asian Translation (WAT2017)

In this paper, we empirically compare the two encoder-decoder neural machine translation architectures: convolutional sequence to sequence model (ConvS2S) and recurrent sequence to sequence model (RNNS2S) for English-Hindi language pair as part of IIT Bombay’s submission to WAT2017 shared task. We report the results for both English-Hindi and Hindi-English direction of language pair.

Document Level Novelty Detection: Textual Entailment Lends a Helping Hand
Tanik Saikh | Tirthankar Ghosal | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the 14th International Conference on Natural Language Processing (ICON-2017)

Sentiment Intensity Ranking among Adjectives Using Sentiment Bearing Word Embeddings
Raksha Sharma | Arpan Somani | Lakshya Kumar | Pushpak Bhattacharyya
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Identification of intensity ordering among polar (positive or negative) words which have the same semantics can lead to a fine-grained sentiment analysis. For example, ‘master’, ‘seasoned’ and ‘familiar’ point to different intensity levels, though they all convey the same meaning (semantics), i.e., expertise: having a good knowledge of. In this paper, we propose a semi-supervised technique that uses sentiment bearing word embeddings to produce a continuous ranking among adjectives that share common semantics. Our system demonstrates a strong Spearman’s rank correlation of 0.83 with the gold standard ranking. We show that sentiment bearing word embeddings facilitate a more accurate intensity ranking system than other standard word embeddings (word2vec and GloVe). Word2vec is the state-of-the-art for intensity ordering task.

Is your Statement Purposeless? Predicting Computer Science Graduation Admission Acceptance based on Statement Of Purpose
Diptesh Kanojia | Nikhil Wani | Pushpak Bhattacharyya
Proceedings of the 14th International Conference on Natural Language Processing (ICON-2017)

IITP at IJCNLP-2017 Task 4: Auto Analysis of Customer Feedback using CNN and GRU Network
Deepak Gupta | Pabitra Lenka | Harsimran Bedi | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the IJCNLP 2017, Shared Tasks

Analyzing customer feedback is the best way to channelize the data into new marketing strategies that benefit entrepreneurs as well as customers. Therefore an automated system which can analyze the customer behavior is in great demand. Users may write feedback in any language, and hence mining appropriate information often becomes intractable. Especially in a traditional feature-based supervised model, it is difficult to build a generic system as one has to understand the concerned language for finding the relevant features. In order to overcome this, we propose deep Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN) based approaches that do not require handcrafting of features. We evaluate these techniques for analyzing customer feedback sentences on four languages, namely English, French, Japanese and Spanish. Our empirical analysis shows that our models perform well in all the four languages on the setups of IJCNLP Shared Task on Customer Feedback Analysis. Our model achieved the second rank in French, with an accuracy of 71.75%, and the third rank for all the other languages.

End-to-end Relation Extraction using Neural Networks and Markov Logic Networks
Sachin Pawar | Pushpak Bhattacharyya | Girish Palshikar
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

End-to-end relation extraction refers to identifying boundaries of entity mentions, entity types of these mentions and appropriate semantic relation for each pair of mentions. Traditionally, separate predictive models were trained for each of these tasks and were used in a “pipeline” fashion where output of one model is fed as input to another. But it was observed that addressing some of these tasks jointly results in better performance. We propose a single, joint neural network based model to carry out all the three tasks of boundary identification, entity type classification and relation type classification. This model is referred to as “All Word Pairs” model (AWP-NN) as it assigns an appropriate label to each word pair in a given sentence for performing end-to-end relation extraction. We also propose to refine output of the AWP-NN model by using inference in Markov Logic Networks (MLN) so that additional domain knowledge can be effectively incorporated. We demonstrate effectiveness of our approach by achieving better end-to-end relation extraction performance than all 4 previous joint modelling approaches, on the standard dataset of ACE 2004.

Temporality as Seen through Translation: A Case Study on Hindi Texts
Sabyasachi Kamila | Sukanta Sen | Mohammad Hasanuzzaman | Asif Ekbal | Andy Way | Pushpak Bhattacharyya
Proceedings of Machine Translation Summit XVI: Research Track

Learning variable length units for SMT between related languages via Byte Pair Encoding
Anoop Kunchukuttan | Pushpak Bhattacharyya
Proceedings of the First Workshop on Subword and Character Level Models in NLP

We explore the use of segments learnt using Byte Pair Encoding (referred to as BPE units) as basic units for statistical machine translation between related languages and compare it with orthographic syllables, which are currently the best performing basic units for this translation task. BPE identifies the most frequent character sequences as basic units, while orthographic syllables are linguistically motivated pseudo-syllables. We show that BPE units modestly outperform orthographic syllables as units of translation, showing up to 11% increase in BLEU score. While orthographic syllables can be used only for languages whose writing systems use vowel representations, BPE is writing system independent and we show that BPE outperforms other units for non-vowel writing systems too. Our results are supported by extensive experimentation spanning multiple language families and writing systems.
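As a rough illustration of how BPE arrives at "the most frequent character sequences", the following is a minimal sketch of generic BPE merge learning. It is not the authors' pipeline: the function name `learn_bpe`, the toy word-frequency dictionary, the omission of an end-of-word marker, and the lexicographic tie-breaking rule are all assumptions made for this example.

```python
from collections import Counter


def learn_bpe(word_freqs, num_merges):
    """Learn up to num_merges BPE merge operations (hypothetical helper).

    word_freqs maps each word to its corpus frequency. Every word starts as a
    sequence of single characters; each step merges the most frequent
    adjacent symbol pair into one new symbol.
    """
    vocab = {tuple(word): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for left, right in zip(symbols, symbols[1:]):
                pairs[(left, right)] += freq
        if not pairs:
            break
        # Pick the most frequent pair; ties are broken lexicographically
        # (an assumption, chosen only to make the result deterministic).
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        # Rewrite every word with the chosen pair fused into one symbol.
        new_vocab = {}
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            key = tuple(merged)
            new_vocab[key] = new_vocab.get(key, 0) + freq
        vocab = new_vocab
    return merges, vocab


# Toy frequencies, invented for illustration only.
merges, vocab = learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, 2)
print(merges)
```

Because the procedure only counts adjacent symbols, it needs no notion of vowels or syllables, which is consistent with the abstract's point that BPE is writing system independent.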

Learning Cognitive Features from Gaze Data for Sentiment and Sarcasm Classification using Convolutional Neural Network
Abhijit Mishra | Kuntal Dey | Pushpak Bhattacharyya
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Cognitive NLP systems, i.e., NLP systems that make use of behavioral data, augment traditional text-based features with cognitive features extracted from eye-movement patterns, EEG signals, brain-imaging etc. Such extraction of features is typically manual. We contend that manual extraction of features may not be the best way to tackle text subtleties that characteristically prevail in complex classification tasks like Sentiment Analysis and Sarcasm Detection, and that even the extraction and choice of features should be delegated to the learning system. We introduce a framework to automatically extract cognitive features from the eye-movement/gaze data of human readers reading the text and use them as features along with textual features for the tasks of sentiment polarity and sarcasm detection. Our proposed framework is based on Convolutional Neural Network (CNN). The CNN learns features from both gaze and text and uses them to classify the input text. We test our technique on published sentiment and sarcasm labeled datasets, enriched with gaze information, to show that using a combination of automatically learned text and gaze features often yields better classification performance over (i) CNN based systems that rely on text input alone and (ii) existing systems that rely on handcrafted gaze and textual features.

Hindi Shabdamitra: A Wordnet based E-Learning Tool for Language Learning and Teaching
Hanumant Redkar | Sandhya Singh | Dhara Gorasia | Meenakshi Somasundaram | Malhar Kulkarni | Pushpak Bhattacharyya
Proceedings of the 14th International Conference on Natural Language Processing (ICON-2017)

IITP at EmoInt-2017: Measuring Intensity of Emotions using Sentence Embeddings and Optimized Features
Md Shad Akhtar | Palaash Sawant | Asif Ekbal | Jyoti Pawar | Pushpak Bhattacharyya
Proceedings of the 8th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

This paper describes the system that we submitted as part of our participation in the shared task on Emotion Intensity (EmoInt-2017). We propose a Long short term memory (LSTM) based architecture cascaded with Support Vector Regressor (SVR) for intensity prediction. We also employ Particle Swarm Optimization (PSO) based feature selection algorithm for obtaining an optimized feature set for training and evaluation. System evaluation shows interesting results on the four emotion datasets i.e. anger, fear, joy and sadness. In comparison to the other participating teams our system was ranked 5th in the competition.

Towards Lower Bounds on Number of Dimensions for Word Embeddings
Kevin Patel | Pushpak Bhattacharyya
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Word embeddings are a relatively new addition to the modern NLP researcher’s toolkit. However, unlike other tools, word embeddings are used in a black box manner. There are very few studies regarding various hyperparameters. One such hyperparameter is the dimension of word embeddings. It is usually decided based on a rule of thumb: in the range 50 to 300. In this paper, we show that the dimension should instead be chosen based on corpus statistics. More specifically, we show that the number of pairwise equidistant words of the corpus vocabulary (as defined by some distance/similarity metric) gives a lower bound on the number of dimensions, and going below this bound results in degradation of quality of learned word embeddings. Through our evaluations on standard word embedding evaluation tasks, we show that for dimensions higher than or equal to the bound, we get better results as compared to the ones below it.

Utilizing Lexical Similarity between Related, Low-resource Languages for Pivot-based SMT
Anoop Kunchukuttan | Maulik Shah | Pradyot Prakash | Pushpak Bhattacharyya
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

We investigate pivot-based translation between related languages in a low resource, phrase-based SMT setting. We show that a subword-level pivot-based SMT model using a related pivot language is substantially better than word and morpheme-level pivot models. It is also highly competitive with the best direct translation model, which is encouraging as no direct source-target training corpus is used. We also show that combining multiple related language pivot models can rival a direct translation model. Thus, the use of subwords as translation units coupled with multiple related pivot languages can compensate for the lack of a direct parallel corpus.

IITP at SemEval-2017 Task 5: An Ensemble of Deep Learning and Feature Based Models for Financial Sentiment Analysis
Deepanway Ghosal | Shobhit Bhatnagar | Md Shad Akhtar | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

In this paper we propose an ensemble based model which combines state of the art deep learning sentiment analysis algorithms like Convolution Neural Network (CNN) and Long Short Term Memory (LSTM) along with feature based models to identify optimistic or pessimistic sentiments associated with companies and stocks in financial texts. We build our system to participate in a competition organized by Semantic Evaluation 2017 International Workshop. We combined predictions from various models using an artificial neural network to determine the opinion towards an entity in (a) Microblog Messages and (b) News Headlines data. Our models achieved a cosine similarity score of 0.751 and 0.697 for the above two tracks giving us the rank of 2nd and 7th best team respectively.

Computational Sarcasm
Pushpak Bhattacharyya | Aditya Joshi
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts

Sarcasm is a form of verbal irony that is intended to express contempt or ridicule. Motivated by challenges posed by sarcastic text to sentiment analysis, computational approaches to sarcasm have witnessed a growing interest at NLP forums in the past decade. Computational sarcasm refers to automatic approaches pertaining to sarcasm. The tutorial will provide a bird’s-eye view of the research in computational sarcasm for text, while focusing on significant milestones. The tutorial begins with linguistic theories of sarcasm, with a focus on incongruity: a useful notion that underlies sarcasm and other forms of figurative language. Since the most significant work in computational sarcasm is sarcasm detection: predicting whether a given piece of text is sarcastic or not, sarcasm detection forms the focus hereafter. We begin our discussion on sarcasm detection with datasets, touching on strategies, challenges and nature of datasets. Then, we describe algorithms for sarcasm detection: rule-based (where a specific evidence of sarcasm is utilised as a rule), statistical classifier-based (where features are designed for a statistical classifier), a topic model-based technique, and deep learning-based algorithms for sarcasm detection. In case of each of these algorithms, we refer to our work on sarcasm detection and share our learnings. Since information beyond the text to be classified, i.e., contextual information, is useful for sarcasm detection, we then describe approaches that use such information through conversational context or author-specific context. We then follow it with novel areas in computational sarcasm such as sarcasm generation, sarcasm v/s irony classification, etc. We then summarise the tutorial and describe future directions based on errors reported in past work.
The tutorial will end with a demonstration of our work on sarcasm detection. This tutorial will be of interest to researchers investigating computational sarcasm and related areas such as computational humour, figurative language understanding, emotion and sentiment analysis, etc. The tutorial is motivated by our continually evolving survey paper of sarcasm detection, that is available on arXiv at: Joshi, Aditya, Pushpak Bhattacharyya, and Mark James Carman. “Automatic Sarcasm Detection: A Survey.” arXiv preprint arXiv:1602.03426 (2016).

2016

Opinion Mining in a Code-Mixed Environment: A Case Study with Government Portals
Deepak Gupta | Ankit Lamba | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the 13th International Conference on Natural Language Processing

Statistical Machine Translation between Related Languages
Pushpak Bhattacharyya | Mitesh M. Khapra | Anoop Kunchukuttan
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Tutorial Abstracts

IndoWordNet::Similarity- Computing Semantic Similarity and Relatedness using IndoWordNet
Sudha Bhingardive | Hanumant Redkar | Prateek Sappadla | Dhirendra Singh | Pushpak Bhattacharyya
Proceedings of the 8th Global WordNet Conference (GWC)

Semantic similarity and relatedness measures play an important role in natural language processing applications. In this paper, we present the IndoWordNet::Similarity tool and interface, designed for computing the semantic similarity and relatedness between two words in IndoWordNet. A Java based tool and a web interface have been developed to compute this semantic similarity and relatedness. A Java API has also been developed for this purpose. The tool, web interface and API are made available for research purposes.

Are Word Embedding-based Features Useful for Sarcasm Detection?
Aditya Joshi | Vaibhav Tripathi | Kevin Patel | Pushpak Bhattacharyya | Mark Carman
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

Deep Learning Architecture for Patient Data De-identification in Clinical Records
Shweta Yadav | Asif Ekbal | Sriparna Saha | Pushpak Bhattacharyya
Proceedings of the Clinical Natural Language Processing Workshop (ClinicalNLP)

Rapid growth in Electronic Medical Records (EMR) has led to an expansion of data in the clinical domain. The majority of the available health care information is sealed in the form of narrative documents which form a rich source of clinical information. Text mining of such clinical records has gained huge attention in various medical applications like treatment and decision making. However, medical records enclose patient Private Health Information (PHI) which can reveal the identities of the patients. In order to retain the privacy of patients, it is mandatory to remove all the PHI prior to making the records publicly available. The aim is to de-identify or encrypt the PHI from the patient medical records. In this paper, we propose an algorithm based on deep learning architecture to solve this problem. We perform de-identification of seven PHI terms from the clinical records. Experiments on benchmark datasets show that our proposed approach achieves encouraging performance, which is better than the baseline model developed with Conditional Random Field.

IITP English-Hindi Machine Translation System at WAT 2016
Sukanta Sen | Debajyoty Banik | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the 3rd Workshop on Asian Translation (WAT2016)

In this paper we describe the system that we develop as part of our participation in WAT 2016. We develop a system based on hierarchical phrase-based SMT for English to Hindi language pair. We perform re-ordering and augment bilingual dictionary to improve the performance. As a baseline we use a phrase-based SMT model. The MT models are fine-tuned on the development set, and the best configurations are used to report the evaluation on the test set. Experiments show the BLEU of 13.71 on the benchmark test data. This is better compared to the official baseline BLEU score of 10.79.

Improving Document Ranking using Query Expansion and Classification Techniques for Mixed Script Information Retrieval
Subham Kumar | Anwesh Sinha Ray | Sabyasachi Kamila | Asif Ekbal | Sriparna Saha | Pushpak Bhattacharyya
Proceedings of the 13th International Conference on Natural Language Processing

A Hybrid Deep Learning Architecture for Sentiment Analysis
Md Shad Akhtar | Ayush Kumar | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

In this paper, we propose a novel hybrid deep learning architecture which is highly efficient for sentiment analysis in resource-poor languages. We learn sentiment embedded vectors from the Convolutional Neural Network (CNN). These are augmented to a set of optimized features selected through a multi-objective optimization (MOO) framework. The sentiment augmented optimized vector obtained at the end is used for the training of SVM for sentiment classification. We evaluate our proposed approach for coarse-grained (i.e. sentence level) as well as fine-grained (i.e. aspect level) sentiment analysis on four Hindi datasets covering varying domains. In order to show that our proposed method is generic in nature we also evaluate it on two benchmark English datasets. Evaluation shows that the results of the proposed method are consistent across all the datasets and often outperform the state-of-art systems. To the best of our knowledge, this is the very first attempt where such a deep learning model is used for less-resourced languages such as Hindi.

High, Medium or Low? Detecting Intensity Variation Among polar synonyms in WordNet
Raksha Sharma | Pushpak Bhattacharyya
Proceedings of the 8th Global WordNet Conference (GWC)

For fine-grained sentiment analysis, we need to go beyond zero-one polarity and find a way to compare adjectives (synonyms) that share the same sense. Choice of a word from a set of synonyms provides a way to select the exact polarity-intensity. For example, choosing to describe a person as benevolent rather than kind changes the intensity of the expression. In this paper, we present a sense based lexical resource, where synonyms are assigned intensity levels, viz., high, medium and low. We show that the measure P(s|w) (probability of a sense s given the word w) can derive the intensity of a word within the sense. We observe a statistically significant positive correlation between P(s|w) and intensity of synonyms for three languages, viz., English, Marathi and Hindi. The average correlation scores are 0.47 for English, 0.56 for Marathi and 0.58 for Hindi.
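The abstract does not say how P(s|w) is estimated. One common choice, assumed here purely for illustration, is a relative-frequency estimate from a sense-tagged corpus: the count of word w tagged with sense s divided by the total count of w. The function name `sense_given_word`, the sense labels, and the toy data below are hypothetical.

```python
from collections import Counter


def sense_given_word(tagged_tokens):
    """Relative-frequency estimate of P(s|w) (hypothetical helper).

    tagged_tokens is a list of (word, sense) pairs from a sense-tagged corpus.
    Returns a dict mapping (word, sense) to count(word, sense) / count(word).
    """
    pair_counts = Counter(tagged_tokens)
    word_counts = Counter(word for word, _ in tagged_tokens)
    return {
        (word, sense): count / word_counts[word]
        for (word, sense), count in pair_counts.items()
    }


# Toy sense-tagged tokens, invented for illustration only.
corpus = [
    ("kind", "generous"),
    ("kind", "generous"),
    ("kind", "type"),
    ("benevolent", "generous"),
]
probs = sense_given_word(corpus)
print(probs[("kind", "generous")], probs[("benevolent", "generous")])
```

Under this estimate, a word used almost exclusively in one sense gets a high P(s|w) for that sense, which is the quantity the abstract correlates with intensity.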

Leveraging Annotators’ Gaze Behaviour for Coreference Resolution
Joe Cheri | Abhijit Mishra | Pushpak Bhattacharyya
Proceedings of the 7th Workshop on Cognitive Aspects of Computational Language Learning

Verbframator:Semi-Automatic Verb Frame Annotator Tool with Special Reference to Marathi
Hanumant Redkar | Sandhya Singh | Nandini Ghag | Jai Paranjape | Nilesh Joshi | Malhar Kulkarni | Pushpak Bhattacharyya
Proceedings of the 13th International Conference on Natural Language Processing

Proceedings of the 6th Workshop on South and Southeast Asian Natural Language Processing (WSSANLP2016)
Dekai Wu | Pushpak Bhattacharyya
Proceedings of the 6th Workshop on South and Southeast Asian Natural Language Processing (WSSANLP2016)

Multiword Expressions Dataset for Indian Languages
Dhirendra Singh | Sudha Bhingardive | Pushpak Bhattacharyya
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Multiword Expressions (MWEs) are used frequently in natural languages, but understanding the diversity in MWEs is one of the open problems in the area of Natural Language Processing. In the context of Indian languages, MWEs play an important role. In this paper, we present an MWE annotation dataset created for two Indian languages, viz., Hindi and Marathi. We extract possible MWE candidates using two repositories: 1) the POS-tagged corpus and 2) the IndoWordNet synsets. Annotation is done for two types of MWEs: compound nouns and light verb constructions. In the process of annotation, human annotators tag valid MWEs from these candidates based on the standard guidelines provided to them. We obtained 3178 compound nouns and 2556 light verb constructions in Hindi and 1003 compound nouns and 2416 light verb constructions in Marathi using the two repositories mentioned before. This created resource is made available publicly and can be used as a gold standard for Hindi and Marathi MWE systems.

A Recurrent Neural Network Architecture for De-identifying Clinical Records
Shweta | Ankit Kumar | Asif Ekbal | Sriparna Saha | Pushpak Bhattacharyya
Proceedings of the 13th International Conference on Natural Language Processing

Synset Ranking of Hindi WordNet
Sudha Bhingardive | Rajita Shukla | Jaya Saraswati | Laxmi Kashyap | Dhirendra Singh | Pushpak Bhattacharyya
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Word Sense Disambiguation (WSD) is one of the open problems in the area of natural language processing. Various supervised, unsupervised and knowledge based approaches have been proposed for automatically determining the sense of a word in a particular context. It has been observed that such approaches often find it difficult to beat the WordNet First Sense (WFS) baseline which assigns the sense irrespective of context. In this paper, we present our work on creating the WFS baseline for Hindi language by manually ranking the synsets of Hindi WordNet. A ranking tool is developed where human experts can see the frequency of the word senses in the sense-tagged corpora and are asked to rank the senses of a word using this information together with their own intuition. The accuracy of the WFS baseline is tested on several standard datasets. F-score is found to be 60%, 65% and 55% on Health, Tourism and News datasets respectively. The created rankings can also be used in other NLP applications viz., Machine Translation, Information Retrieval, Text Summarization, etc.

SlangNet: A WordNet like resource for English Slang
Shehzaad Dhuliawala | Diptesh Kanojia | Pushpak Bhattacharyya
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We present a WordNet like structured resource for slang words and neologisms on the internet. The dynamism of language is often an indication that current language technology tools trained on today’s data may not be able to process the language in the future. Our resource could be (1) used to augment the WordNet, (2) used in several Natural Language Processing (NLP) applications which make use of noisy data on the internet like Information Retrieval and Web Mining. Such a resource can also be used to distinguish slang word senses from conventional word senses. To stimulate similar innovations widely in the NLP community, we test the efficacy of our resource for detecting slang using standard bag of words Word Sense Disambiguation (WSD) algorithms (Lesk and Extended Lesk) for English data on the internet.

How Challenging is Sarcasm versus Irony Classification?: A Study With a Dataset from English Literature
Aditya Joshi | Vaibhav Tripathi | Pushpak Bhattacharyya | Mark Carman | Meghna Singh | Jaya Saraswati | Rajita Shukla
Proceedings of the Australasian Language Technology Association Workshop 2016

Meaning Matters: Senses of Words are More Informative than Words for Cross-domain Sentiment Analysis
Raksha Sharma | Sudha Bhingardive | Pushpak Bhattacharyya
Proceedings of the 13th International Conference on Natural Language Processing

Harnessing Sequence Labeling for Sarcasm Detection in Dialogue from TV Series ‘Friends’
Aditya Joshi | Vaibhav Tripathi | Pushpak Bhattacharyya | Mark J. Carman
Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning

Can SMT and RBMT Improve each other’s Performance?- An Experiment with English-Hindi Translation
Debajyoty Banik | Sukanta Sen | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the 13th International Conference on Natural Language Processing

IndoWordNet Conversion to Web Ontology Language (OWL)
Apurva Nagvenkar | Jyoti Pawar | Pushpak Bhattacharyya
Proceedings of the 8th Global WordNet Conference (GWC)

WordNet plays a significant role in the Linked Open Data (LOD) cloud. It has numerous applications ranging from ontology annotation to ontology mapping. IndoWordNet is a linked WordNet connecting 18 Indian language WordNets with Hindi as a source WordNet. The Hindi WordNet was initially developed by linking it to English WordNet. In this paper, we present a data representation of IndoWordNet in Web Ontology Language (OWL). The schema of Princeton WordNet has been enhanced to support the representation of IndoWordNet. This IndoWordNet representation in OWL format is now available to link other web resources. This representation is implemented for eight Indian languages.

Substring-based unsupervised transliteration with phonetic and contextual knowledge
Anoop Kunchukuttan | Pushpak Bhattacharyya | Mitesh M. Khapra
Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning

That’ll Do Fine!: A Coarse Lexical Resource for English-Hindi MT, Using Polylingual Topic Models
Diptesh Kanojia | Aditya Joshi | Pushpak Bhattacharyya | Mark James Carman
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Parallel corpora are often injected with bilingual lexical resources for improved Indian language machine translation (MT). In the absence of such lexical resources, multilingual topic models have been used to create coarse lexical resources in the past, using a Cartesian product approach. Our results show that for morphologically rich languages like Hindi, the Cartesian product approach is detrimental for MT. We then present a novel ‘sentential’ approach to use this coarse lexical resource from a multilingual topic model. Our coarse lexical resource, when injected with a parallel corpus, outperforms a system trained using a parallel corpus and a good quality lexical resource. As demonstrated by the quality of our coarse lexical resource and its benefit to MT, we believe that our sentential approach to create such a resource will help MT for resource-constrained languages.

Detecting Most Frequent Sense using Word Embeddings and BabelNet
Harpreet Singh Arora | Sudha Bhingardive | Pushpak Bhattacharyya
Proceedings of the 8th Global WordNet Conference (GWC)

Since the inception of the SENSEVAL evaluation exercises there has been a great deal of recent research into Word Sense Disambiguation (WSD). Over the years, various supervised, unsupervised and knowledge based WSD systems have been proposed. Beating the first sense heuristics is a challenging task for these systems. In this paper, we present our work on Most Frequent Sense (MFS) detection using Word Embeddings and BabelNet features. The semantic features from BabelNet viz., synsets, gloss, relations, etc. are used for generating sense embeddings. We compare word embedding of a word with its sense embeddings to obtain the MFS with the highest similarity. The MFS is detected for six languages viz., English, Spanish, Russian, German, French and Italian. However, this approach can be applied to any language provided that word embeddings are available for that language.
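
The comparison step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each sense embedding is the unweighted mean of the embeddings of that sense's feature words (for example gloss or synset words), which is one simple choice the abstract does not confirm. The toy vectors and sense identifiers are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def sense_embedding(feature_words, embeddings):
    """Assumed construction: average the embeddings of a sense's feature words."""
    vectors = [embeddings[w] for w in feature_words if w in embeddings]
    return mean_vector(vectors) if vectors else None


def most_frequent_sense(word, senses, embeddings):
    """Pick the sense whose embedding is most similar to the word embedding."""
    word_vec = embeddings[word]
    best_sense, best_score = None, float("-inf")
    for sense_id, feature_words in senses.items():
        vec = sense_embedding(feature_words, embeddings)
        if vec is None:
            continue  # no feature word has an embedding
        score = cosine(word_vec, vec)
        if score > best_score:
            best_sense, best_score = sense_id, score
    return best_sense


# Hypothetical two-dimensional embeddings and sense features.
TOY_EMBEDDINGS = {
    "bank": [0.9, 0.1],
    "money": [1.0, 0.0],
    "deposit": [0.8, 0.2],
    "river": [0.0, 1.0],
    "shore": [0.1, 0.9],
}
TOY_SENSES = {
    "bank_finance": ["money", "deposit"],
    "bank_river": ["river", "shore"],
}

if __name__ == "__main__":
    print(most_frequent_sense("bank", TOY_SENSES, TOY_EMBEDDINGS))
```

In this toy setup the word vector for "bank" lies close to the finance-related feature words, so the finance sense is selected. With real embeddings the outcome depends on the corpus the embeddings were trained on.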

Faster Decoding for Subword Level Phrase-based SMT between Related Languages
Anoop Kunchukuttan | Pushpak Bhattacharyya
Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial3)

A common and effective way to train translation systems between related languages is to consider sub-word level basic units. However, this increases the length of the sentences, resulting in increased decoding time. The increase in length is also impacted by the specific choice of data format for representing the sentences as subwords. In a phrase-based SMT framework, we investigate different choices of decoder parameters as well as data format and their impact on decoding time and translation accuracy. We suggest the best options for these settings that significantly improve decoding time with little impact on the translation accuracy.

Mapping it differently: A solution to the linking challenges
Meghna Singh | Rajita Shukla | Jaya Saraswati | Laxmi Kashyap | Diptesh Kanojia | Pushpak Bhattacharyya
Proceedings of the 8th Global WordNet Conference (GWC)

This paper reports the work of creating bilingual mappings in English for certain synsets of Hindi wordnet, the need for doing this, the methods adopted and the tools created for the task. Hindi wordnet, which forms the foundation for other Indian language wordnets, has been linked to the English WordNet. To maximize linkages, an important strategy of using direct and hypernymy linkages has been followed. However, the hypernymy linkages were found to be inadequate in certain cases and posed a challenge due to sense granularity of language. Thus, the idea of creating bilingual mappings was adopted as a solution. A bilingual mapping means a linkage between a concept in two different languages, with the help of translation and/or transliteration. Such mappings retain meaningful representations, while capturing semantic similarity at the same time. This has also proven to be a great enhancement of Hindi wordnet and can be a crucial resource for multilingual applications in natural language processing, including machine translation and cross language information retrieval.

How Do Cultural Differences Impact the Quality of Sarcasm Annotation?: A Case Study of Indian Annotators and American Text
Aditya Joshi | Pushpak Bhattacharyya | Mark Carman | Jaya Saraswati | Rajita Shukla
Proceedings of the 10th SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities

Orthographic Syllable as basic unit for SMT between Related Languages
Anoop Kunchukuttan | Pushpak Bhattacharyya
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

A picture is worth a thousand words: Using OpenClipArt library for enriching IndoWordNet
Diptesh Kanojia | Shehzaad Dhuliawala | Pushpak Bhattacharyya
Proceedings of the 8th Global WordNet Conference (GWC)

WordNet has proved to be immensely useful for Word Sense Disambiguation, and thence Machine Translation, Information Retrieval and Question Answering. It can also be used as a dictionary for educational purposes. The semantic nature of concepts in a WordNet motivates one to try to express this meaning in a more visual way. In this paper, we describe our work of enriching IndoWordNet with image acquisitions from the OpenClipArt library. We describe an approach used to enrich WordNets for eighteen Indian languages. Our contribution is threefold: (1) We develop a system, which, given a synset in English, finds an appropriate image for the synset. The system uses the OpenClipArt library (OCAL) to retrieve images and ranks them. (2) After retrieving the images, we map the results along with the linkages between Princeton WordNet and Hindi WordNet, to link several synsets to corresponding images. We choose and sort the top three images per synset based on our ranking heuristic. (3) We develop a tool that allows a lexicographer to manually evaluate these images. The top images are shown to a lexicographer by the evaluation tool for the task of choosing the best image representation. The lexicographer also selects the number of relevant images. Using our system, we obtain an Average Precision (P@3) score of 0.30.

Leveraging Cognitive Features for Sentiment Analysis
Abhijit Mishra | Diptesh Kanojia | Seema Nagar | Kuntal Dey | Pushpak Bhattacharyya
Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning

‘Who would have thought of that!’: A Hierarchical Topic Model for Extraction of Sarcasm-prevalent Topics and Sarcasm Detection
Aditya Joshi | Prayas Jain | Pushpak Bhattacharyya | Mark Carman
Proceedings of the Workshop on Extra-Propositional Aspects of Meaning in Computational Linguistics (ExProM)

Topic Models have been reported to be beneficial for aspect-based sentiment analysis. This paper reports the first topic model for sarcasm detection, to the best of our knowledge. Designed on the basis of the intuition that sarcastic tweets are likely to have a mixture of words of both sentiments as against tweets with literal sentiment (either positive or negative), our hierarchical topic model discovers sarcasm-prevalent topics and topic-level sentiment. Using a dataset of tweets labeled using hashtags, the model estimates topic-level and sentiment-level distributions. Our evaluation shows that topics such as ‘work’, ‘gun laws’, ‘weather’ are sarcasm-prevalent topics. Our model is also able to discover the mixture of sentiment-bearing words that exist in a text of a given sentiment-related label. Finally, we apply our model to predict sarcasm in tweets. We outperform two prior works based on statistical classifiers with specific features, by around 25%.

Aspect based Sentiment Analysis in Hindi: Resource Creation and Evaluation
Md Shad Akhtar | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Due to the phenomenal growth of online product reviews, sentiment analysis (SA) has gained huge attention, for example, by online service providers. A number of benchmark datasets for a wide range of domains have been made available for sentiment analysis, especially in resource-rich languages. In this paper we assess the challenges of SA in Hindi by providing a benchmark setup, where we create an annotated dataset of high quality, build machine learning models for sentiment analysis in order to show the effective usage of the dataset, and finally make the resource available to the community for further advancement of research. The dataset comprises Hindi product reviews crawled from various online sources. Each sentence of the review is annotated with aspect term and its associated sentiment. As classification algorithms we use Conditional Random Field (CRF) and Support Vector Machine (SVM) for aspect term extraction and sentiment analysis, respectively. Evaluation results show the average F-measure of 41.07% for aspect term extraction and accuracy of 54.05% for sentiment classification.

Harnessing Cognitive Features for Sarcasm Detection
Abhijit Mishra | Diptesh Kanojia | Seema Nagar | Kuntal Dey | Pushpak Bhattacharyya
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Proceedings of the 3rd Workshop on Asian Translation (WAT2016)
Toshiaki Nakazawa | Hideya Mino | Chenchen Ding | Isao Goto | Graham Neubig | Sadao Kurohashi | Ir. Hammam Riza | Pushpak Bhattacharyya
Proceedings of the 3rd Workshop on Asian Translation (WAT2016)

Borrow a Little from your Rich Cousin: Using Embeddings and Polarities of English Words for Multilingual Sentiment Classification
Prerana Singhal | Pushpak Bhattacharyya
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

In this paper, we provide a solution to multilingual sentiment classification using deep learning. Given input text in a language, we use word translation into English and then the embeddings of these English words to train a classifier. This projection into the English space plus word embeddings gives a simple and uniform framework for multilingual sentiment analysis. A novel idea is augmentation of the training data with polar words, appearing in these sentences, along with their polarities. This approach leads to a performance gain of 7-10% over traditional classifiers on many languages, irrespective of text genre, despite the scarcity of resources in most languages.

On Why Coarse Class Classification is Bottleneck in Noun Compound Interpretation
Girishkumar Ponkiya | Pushpak Bhattacharyya | Girish K. Palshikar
Proceedings of the 13th International Conference on Natural Language Processing

Samāsa-Kartā: An Online Tool for Producing Compound Words using IndoWordNet
Hanumant Redkar | Nilesh Joshi | Sandhya Singh | Irawati Kulkarni | Malhar Kulkarni | Pushpak Bhattacharyya
Proceedings of the 8th Global WordNet Conference (GWC)

Samāsa or compounds are a regular feature of Indian Languages. They are also found in other languages like German, Italian, French, Russian, Spanish, etc. A compound word is constructed from two or more words to form a single word. The meaning of this word is derived from each of the individual words of the compound. Developing a system to generate, identify and interpret compounds is an important task in Natural Language Processing. This paper introduces a web based tool - Samāsa-Kartā for producing compound words. Here, the focus is on the Sanskrit language due to its richness in usage of compounds; however, this approach can be applied to any Indian language as well as other languages. IndoWordNet is used as a resource for words to be compounded. The motivation behind creating compound words is to create, to improve the vocabulary, to reduce sense ambiguity, etc. in order to enrich the WordNet. The Samāsa-Kartā can be used for various applications viz., compound categorization, sandhi creation, morphological analysis, paraphrasing, synset creation, etc.

Lexical Resources to Enrich English Malayalam Machine Translation
Sreelekha S | Pushpak Bhattacharyya
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

In this paper we present our work on the usage of lexical resources for machine translation between English and Malayalam. We compare the performance of different Statistical Machine Translation (SMT) systems built on top of a phrase based SMT baseline. We explore different ways of utilizing lexical resources to improve the quality of English Malayalam statistical machine translation. In order to enrich the training corpus we have augmented the lexical resources in two ways: (a) additional vocabulary and (b) inflected verbal forms. The lexical resources include the IndoWordnet semantic relation set, lexical words, verb phrases, etc. We have described case studies, evaluations and have given detailed error analysis for both Malayalam to English and English to Malayalam machine translation systems. We observed significant improvement in evaluations of translation quality. Lexical resources do help uplift performance when parallel corpora are scanty.

Political Issue Extraction Model: A Novel Hierarchical Topic Model That Uses Tweets By Political And Non-Political Authors
Aditya Joshi | Pushpak Bhattacharyya | Mark Carman
Proceedings of the 7th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

IIT Bombay’s English-Indonesian submission at WAT: Integrating Neural Language Models with SMT
Sandhya Singh | Anoop Kunchukuttan | Pushpak Bhattacharyya
Proceedings of the 3rd Workshop on Asian Translation (WAT2016)

This paper describes IIT Bombay’s submission as a part of the shared task in WAT 2016 for the English–Indonesian language pair. The results reported here are for both directions of the language pair. Among the various approaches experimented with, Operation Sequence Model (OSM) and Neural Language Model have been submitted for WAT. The OSM approach integrates the translation and reordering processes, resulting in relatively improved translation. Similarly, the neural experiment integrates a Neural Language Model with Statistical Machine Translation (SMT) as a feature for translation. The Neural Probabilistic Language Model (NPLM) gave relatively high BLEU points for the Indonesian to English translation system while the Neural Network Joint Model (NNJM) performed better for the English to Indonesian direction. The results indicate improvement over the baseline Phrase-based SMT by 0.61 BLEU points for the English-Indonesian system and 0.55 BLEU points for the Indonesian-English translation system.

Sophisticated Lexical Databases - Simplified Usage: Mobile Applications and Browser Plugins For Wordnets
Diptesh Kanojia | Raj Dabre | Pushpak Bhattacharyya
Proceedings of the 8th Global WordNet Conference (GWC)

India is a country with 22 officially recognized languages and 17 of these have WordNets, a crucial resource. Web browser based interfaces are available for these WordNets, but are not suited for mobile devices, which deters people from effectively using this resource. We present our initial work on developing mobile applications and browser extensions to access WordNets for Indian Languages. Our contribution is twofold: (1) We develop mobile applications for the Android, iOS and Windows Phone OS platforms for Hindi, Marathi and Sanskrit WordNets which allow users to search for words and obtain more information along with their translations in English and other Indian languages. (2) We also develop browser extensions for English, Hindi, Marathi, and Sanskrit WordNets, for both Mozilla Firefox and Google Chrome. We believe that such applications can be quite helpful in a classroom scenario, where students would be able to access the WordNets as dictionaries as well as lexical knowledge bases. This can help in overcoming the language barrier along with furthering language understanding.

2015

Noun Phrase Chunking for Marathi using Distant Supervision
Sachin Pawar | Nitin Ramrakhiyani | Girish K. Palshikar | Pushpak Bhattacharyya | Swapnil Hingmire
Proceedings of the 12th International Conference on Natural Language Processing

Logistic Regression for Automatic Lexical Level Morphological Paradigm Selection for Konkani Nouns
Shilpa Desai | Jyoti Pawar | Pushpak Bhattacharyya
Proceedings of the 12th International Conference on Natural Language Processing

Investigating the potential of post-ordering SMT output to improve translation quality
Pratik Mehta | Anoop Kunchukuttan | Pushpak Bhattacharyya
Proceedings of the 12th International Conference on Natural Language Processing

Automated Analysis of Bangla Poetry for Classification and Poet Identification
Geetanjali Rakshit | Anupam Ghosh | Pushpak Bhattacharyya | Gholamreza Haffari
Proceedings of the 12th International Conference on Natural Language Processing

A temporal expression recognition system for medical documents by
Naman Gupta | Aditya Joshi | Pushpak Bhattacharyya
Proceedings of the 12th International Conference on Natural Language Processing

Post-editing a chapter of a specialized textbook into 7 languages: importance of terminological proximity with English for productivity
Ritesh Shah | Christian Boitet | Pushpak Bhattacharyya | Mithun Padmakumar | Leonardo Zilio | Ruslan Kalitvianski | Mohammad Nasiruddin | Mutsuko Tomokiyo | Sandra Castellanos Páez
Proceedings of the 12th International Conference on Natural Language Processing

IndoWordNet Dictionary: An Online Multilingual Dictionary using IndoWordNet
Hanumant Redkar | Sandhya Singh | Nilesh Joshi | Anupam Ghosh | Pushpak Bhattacharyya
Proceedings of the 12th International Conference on Natural Language Processing

Using Multilingual Topic Models for Improved Alignment in English-Hindi MT
Diptesh Kanojia | Aditya Joshi | Pushpak Bhattacharyya | Mark James Carman
Proceedings of the 12th International Conference on Natural Language Processing

Monotone Submodularity in Opinion Summaries
Jayanth Jayanth | Jayaprakash Sundararaj | Pushpak Bhattacharyya
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

Solving Data Sparsity by Morphology Injection in Factored SMT
Sreelekha S | Piyush Dungarwal | Pushpak Bhattacharyya | Malathi D
Proceedings of the 12th International Conference on Natural Language Processing

Let Sense Bags Do Talking: Cross Lingual Word Semantic Similarity for English and Hindi
Apurva Nagvenkar | Jyoti Pawar | Pushpak Bhattacharyya
Proceedings of the 12th International Conference on Natural Language Processing

Your Sentiment Precedes You: Using an author’s historical tweets to predict sarcasm
Anupam Khattri | Aditya Joshi | Pushpak Bhattacharyya | Mark Carman
Proceedings of the 6th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

Brahmi-Net: A transliteration and script conversion system for languages of the Indian subcontinent
Anoop Kunchukuttan | Ratish Puduppully | Pushpak Bhattacharyya
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations

Using Word Embeddings for Bilingual Unsupervised WSD
Sudha Bhingardive | Dhirendra Singh | Rudramurthy V | Pushpak Bhattacharyya
Proceedings of the 12th International Conference on Natural Language Processing

Detection of Multiword Expressions for Hindi Language using Word Embeddings and WordNet-based Features
Dhirendra Singh | Sudha Bhingardive | Kevin Patel | Pushpak Bhattacharyya
Proceedings of the 12th International Conference on Natural Language Processing

Leveraging Small Multilingual Corpora for SMT Using Many Pivot Languages
Raj Dabre | Fabien Cromieres | Sadao Kurohashi | Pushpak Bhattacharyya
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Unsupervised Most Frequent Sense Detection using Word Embeddings
Sudha Bhingardive | Dhirendra Singh | Rudramurthy V | Hanumant Redkar | Pushpak Bhattacharyya
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Judge a Book by its Cover: Conservative Focused Crawling under Resource Constraints
Shehzaad Dhuliawala | Arjun Atreya V | Ravi Kumar Yadav | Pushpak Bhattacharyya
Proceedings of the 12th International Conference on Natural Language Processing

Adjective Intensity and Sentiment Analysis
Raksha Sharma | Mohit Gupta | Astha Agarwal | Pushpak Bhattacharyya
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

Domain Sentiment Matters: A Two Stage Sentiment Analyzer
Raksha Sharma | Pushpak Bhattacharyya
Proceedings of the 12th International Conference on Natural Language Processing

Harnessing Context Incongruity for Sarcasm Detection
Aditya Joshi | Vinita Sharma | Pushpak Bhattacharyya
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Coreference Resolution to Support IE from Indian Classical Music Forums
Joe Cheri | Pushpak Bhattacharyya
Proceedings of the International Conference Recent Advances in Natural Language Processing

Triangulation of Reordering Tables: An Advancement Over Phrase Table Triangulation in Pivot-Based SMT
Deepak Patil | Harshad Chavan | Pushpak Bhattacharyya
Proceedings of the 12th International Conference on Natural Language Processing

TransChat: Cross-Lingual Instant Messaging for Indian Languages
Diptesh Kanojia | Shehzaad Dhuliawala | Abhijit Mishra | Naman Gupta | Pushpak Bhattacharyya
Proceedings of the 12th International Conference on Natural Language Processing

A Computational Approach to Automatic Prediction of Drunk-Texting
Aditya Joshi | Abhijit Mishra | Balamurali AR | Pushpak Bhattacharyya | Mark J. Carman
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Data representation methods and use of mined corpora for Indian language transliteration
Anoop Kunchukuttan | Pushpak Bhattacharyya
Proceedings of the Fifth Named Entity Workshop

Addressing Class Imbalance in Grammatical Error Detection with Evaluation Metric Optimization
Anoop Kunchukuttan | Pushpak Bhattacharyya
Proceedings of the 12th International Conference on Natural Language Processing

Augmenting Pivot based SMT with word segmentation
Rohit More | Anoop Kunchukuttan | Pushpak Bhattacharyya | Raj Dabre
Proceedings of the 12th International Conference on Natural Language Processing

2014

Supertag Based Pre-ordering in Machine Translation
Rajen Chatterjee | Anoop Kunchukuttan | Pushpak Bhattacharyya
Proceedings of the 11th International Conference on Natural Language Processing

Graph Based Algorithm for Automatic Domain Segmentation of WordNet
Brijesh Bhatt | Subhash Kunnath | Pushpak Bhattacharyya
Proceedings of the Seventh Global Wordnet Conference

AutoParSe: An Automatic Paradigm Selector For Nouns in Konkani
Shilpa Desai | Neenad Desai | Jyoti Pawar | Pushpak Bhattacharyya
Proceedings of the 11th International Conference on Natural Language Processing

Dive deeper: Deep Semantics for Sentiment Analysis
Nikhilkumar Jadhav | Pushpak Bhattacharyya
Proceedings of the 5th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

Facilitating Multi-Lingual Sense Annotation: Human Mediated Lemmatizer
Pushpak Bhattacharyya | Ankit Bahuguna | Lavita Talukdar | Bornali Phukan
Proceedings of the Seventh Global Wordnet Conference

Anou Tradir: Experiences In Building Statistical Machine Translation Systems For Mauritian Languages – Creole, English, French
Raj Dabre | Aneerav Sukhoo | Pushpak Bhattacharyya
Proceedings of the 11th International Conference on Natural Language Processing

HinMA: Distributed Morphology based Hindi Morphological Analyzer
Ankit Bahuguna | Lavita Talukdar | Pushpak Bhattacharyya | Smriti Singh
Proceedings of the 11th International Conference on Natural Language Processing

Shata-Anuvadak: Tackling Multiway Translation of Indian Languages
Anoop Kunchukuttan | Abhijit Mishra | Rajen Chatterjee | Ritesh Shah | Pushpak Bhattacharyya
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We present a compendium of 110 Statistical Machine Translation systems built from parallel corpora of 11 Indian languages belonging to both Indo-Aryan and Dravidian families. We analyze the relationship between translation accuracy and the language families involved. We feel that insights obtained from this analysis will provide guidelines for creating machine translation systems of specific Indian language pairs. We build phrase based systems and some extensions. Across multiple languages, we show improvements on the baseline phrase based systems using these extensions: (1) source side reordering for English-Indian language translation, and (2) transliteration of untranslated words for Indian language-Indian language translation. These enhancements harness shared characteristics of Indian languages. To stimulate similar innovation widely in the NLP community, we have made the trained models for these language pairs publicly available.

Measuring Sentiment Annotation Complexity of Text
Aditya Joshi | Abhijit Mishra | Nivvedan Senthamilselvan | Pushpak Bhattacharyya
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

A Framework for Learning Morphology using Suffix Association Matrix
Shilpa Desai | Jyoti Pawar | Pushpak Bhattacharyya
Proceedings of the Fifth Workshop on South and Southeast Asian Natural Language Processing

IndoWordnet Visualizer: A Graphical User Interface for Browsing and Exploring Wordnets of Indian Languages
Devendra Singh Chaplot | Sudha Bhingardive | Pushpak Bhattacharyya
Proceedings of the Seventh Global Wordnet Conference

PaCMan : Parallel Corpus Management Workbench
Diptesh Kanojia | Manish Shrivastava | Raj Dabre | Pushpak Bhattacharyya
Proceedings of the 11th International Conference on Natural Language Processing

Merging Verb Senses of Hindi WordNet using Word Embeddings
Sudha Bhingardive | Ratish Puduppully | Dhirendra Singh | Pushpak Bhattacharyya
Proceedings of the 11th International Conference on Natural Language Processing

Semi-Automatic Extension of Sanskrit Wordnet using Bilingual Dictionary
Sudha Bhingardive | Tanuja Ajotikar | Irawati Kulkarni | Malhar Kulkarni | Pushpak Bhattacharyya
Proceedings of the Seventh Global Wordnet Conference

A cognitive study of subjectivity extraction in sentiment annotation
Abhijit Mishra | Aditya Joshi | Pushpak Bhattacharyya
Proceedings of the 5th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

Introduction to Synskarta: An Online Interface for Synset Creation with Special Reference to Sanskrit
Hanumant Redkar | Jai Paranjape | Nilesh Joshi | Irawati Kulkarni | Malhar Kulkarni | Pushpak Bhattacharyya
Proceedings of the 11th International Conference on Natural Language Processing

LAYERED: Metric for Machine Translation Evaluation
Shubham Gautam | Pushpak Bhattacharyya
Proceedings of the Ninth Workshop on Statistical Machine Translation

Tackling Close Cousins: Experiences In Developing Statistical Machine Translation Systems For Marathi And Hindi
Raj Dabre | Jyotesh Choudhari | Pushpak Bhattacharyya
Proceedings of the 11th International Conference on Natural Language Processing

Proceedings of Workshop on Lexical and Grammatical Resources for Language Processing
Jorge Baptista | Pushpak Bhattacharyya | Christiane Fellbaum | Mikel Forcada | Chu-Ren Huang | Svetla Koeva | Cvetana Krstev | Eric Laporte
Proceedings of Workshop on Lexical and Grammatical Resources for Language Processing

Do not do processing, when you can look up: Towards a Discrimination Net for WSD
Diptesh Kanojia | Pushpak Bhattacharyya | Raj Dabre | Siddhartha Gunti | Manish Shrivastava
Proceedings of the Seventh Global Wordnet Conference

A Sentiment Analyzer for Hindi Using Hindi Senti Lexicon
Raksha Sharma | Pushpak Bhattacharyya
Proceedings of the 11th International Conference on Natural Language Processing

When Transliteration Met Crowdsourcing : An Empirical Study of Transliteration via Crowdsourcing using Efficient, Non-redundant and Fair Quality Control
Mitesh M. Khapra | Ananthakrishnan Ramanathan | Anoop Kunchukuttan | Karthik Visweswariah | Pushpak Bhattacharyya
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Sufficient parallel transliteration pairs are needed for training state-of-the-art transliteration engines. Given the cost involved, it is often infeasible to collect such data using experts. Crowdsourcing could be a cheaper alternative, provided that a good quality control (QC) mechanism can be devised for this task. Most QC mechanisms employed in crowdsourcing are aggressive (unfair to workers) and expensive (unfair to requesters). In contrast, we propose a low-cost QC mechanism which is fair to both workers and requesters. At the heart of our approach lies a rule-based Transliteration Equivalence approach which takes as input a list of vowels in the two languages and a mapping of the consonants in the two languages. We empirically show that our approach outperforms other popular QC mechanisms (viz., consensus and sampling) on two vital parameters: (i) fairness to requesters (lower cost per correct transliteration) and (ii) fairness to workers (lower rate of rejecting correct answers). Further, as an extrinsic evaluation we use the standard NEWS 2010 test set and show that such quality-controlled crowdsourced data compares well to expert data when used for training a transliteration engine.

The IIT Bombay Hindi-English Translation System at WMT 2014
Piyush Dungarwal | Rajen Chatterjee | Abhijit Mishra | Anoop Kunchukuttan | Ritesh Shah | Pushpak Bhattacharyya
Proceedings of the Ninth Workshop on Statistical Machine Translation

Tuning a Grammar Correction System for Increased Precision
Anoop Kunchukuttan | Sriram Chaudhury | Pushpak Bhattacharyya
Proceedings of the Eighteenth Conference on Computational Natural Language Learning: Shared Task

2013

Automated Grammar Correction Using Hierarchical Phrase-Based Statistical Machine Translation
Bibek Behera | Pushpak Bhattacharyya
Proceedings of the Sixth International Joint Conference on Natural Language Processing

Detecting Turnarounds in Sentiment Analysis: Thwarting
Ankit Ramteke | Akshat Malu | Pushpak Bhattacharyya | J. Saketha Nath
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Little by Little: Semi Supervised Stemming through Stem Set Minimization
Vasudevan N | Pushpak Bhattacharyya
Proceedings of the Sixth International Joint Conference on Natural Language Processing

CFILT-CORE: Semantic Textual Similarity using Universal Networking Language
Avishek Dan | Pushpak Bhattacharyya
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity

Detecting Domain Dedicated Polar Words
Raksha Sharma | Pushpak Bhattacharyya
Proceedings of the Sixth International Joint Conference on Natural Language Processing

Neighbors Help: Bilingual Unsupervised WSD Using Context
Sudha Bhingardive | Samiulla Shaikh | Pushpak Bhattacharyya
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

IITB-Sentiment-Analysts: Participation in Sentiment Analysis in Twitter SemEval 2013 Task
Karan Chawla | Ankit Ramteke | Pushpak Bhattacharyya
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)

Automatically Predicting Sentence Translation Difficulty
Abhijit Mishra | Pushpak Bhattacharyya | Michael Carl
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

IndoNet: A Multilingual Lexical Knowledge Network for Indian Languages
Brijesh Bhatt | Lahari Poddar | Pushpak Bhattacharyya
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

The Haves and the Have-Nots: Leveraging Unlabelled Corpora for Sentiment Analysis
Kashyap Popat | Balamurali A.R | Pushpak Bhattacharyya | Gholamreza Haffari
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Proceedings of the 11th Workshop on Asian Language Resources
Pushpak Bhattacharyya | Key-Sun Choi
Proceedings of the 11th Workshop on Asian Language Resources

Making Headlines in Hindi: Automatic English to Hindi News Headline Translation
Aditya Joshi | Kashyap Popat | Shubham Gautam | Pushpak Bhattacharyya
The Companion Volume of the Proceedings of IJCNLP 2013: System Demonstrations

Proceedings of the 4th Workshop on South and Southeast Asian Natural Language Processing
Pushpak Bhattacharyya | M. G. Abbas Malik
Proceedings of the 4th Workshop on South and Southeast Asian Natural Language Processing

Structure Cognizant Pseudo Relevance Feedback
Arjun Atreya V | Yogesh Kakde | Pushpak Bhattacharyya | Ganesh Ramakrishnan
Proceedings of the Sixth International Joint Conference on Natural Language Processing

IITB System for CoNLL 2013 Shared Task: A Hybrid Approach to Grammatical Error Correction
Anoop Kunchukuttan | Ritesh Shah | Pushpak Bhattacharyya
Proceedings of the Seventeenth Conference on Computational Natural Language Learning: Shared Task

More than meets the eye: Study of Human Cognition in Sense Annotation
Salil Joshi | Diptesh Kanojia | Pushpak Bhattacharyya
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Urdu Hindi Machine Transliteration using SMT
M. G. Abbas Malik | Christian Boitet | Laurent Besacier | Pushpak Bhattacharyya
Proceedings of the 4th Workshop on South and Southeast Asian Natural Language Processing

TransDoop: A Map-Reduce based Crowdsourced Translation for Complex Domain
Anoop Kunchukuttan | Rajen Chatterjee | Shourya Roy | Abhijit Mishra | Pushpak Bhattacharyya
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: System Demonstrations

2012

Textbook Construction from Lecture Transcripts
Aliabbas Petiwala | Kannan Moudgalya | Pushpak Bhattacharyya
Proceedings of the Workshop on Speech and Language Processing Tools in Education

I Can Sense It: a Comprehensive Online System for WSD
Salil Joshi | Mitesh M Khapra | Pushpak Bhattacharyya
Proceedings of COLING 2012: Demonstration Papers

Experiences in Resource Generation for Machine Translation through Crowdsourcing
Anoop Kunchukuttan | Shourya Roy | Pratik Patel | Kushal Ladha | Somya Gupta | Mitesh M. Khapra | Pushpak Bhattacharyya
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The logistics of collecting resources for Machine Translation (MT) has always been a cause of concern for some of the resource-deprived languages of the world. The recent advent of crowdsourcing platforms provides an opportunity to explore the large-scale generation of resources for MT. However, before venturing into this mode of resource collection, it is important to understand the various factors, such as task design, crowd motivation, and quality control, which can influence the success of such a crowdsourcing venture. In this paper, we present our experiences based on a series of experiments. This is an attempt to provide a holistic view of the different facets of translation crowdsourcing and to identify key challenges which need to be addressed for building a practical crowdsourcing solution for MT.

Automated Paradigm Selection for FSA based Konkani Verb Morphological Analyzer
Shilpa Desai | Jyoti Pawar | Pushpak Bhattacharyya
Proceedings of COLING 2012: Demonstration Papers

Cross-Lingual Sentiment Analysis for Indian Languages using Linked WordNets
Balamurali A.R. | Aditya Joshi | Pushpak Bhattacharyya
Proceedings of COLING 2012: Posters

Cost and Benefit of Using WordNet Senses for Sentiment Analysis
Balamurali AR | Aditya Joshi | Pushpak Bhattacharyya
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Typically, accuracy is used to represent the performance of an NLP system. However, accuracy attainment is a function of investment in annotation. Typically, the greater the amount and sophistication of annotation, the higher the accuracy. However, a moot question is "is the accuracy improvement commensurate with the cost incurred in annotation?" We present an economic model to assess the marginal benefit accruing from increase in cost of annotation. In particular, as a case in point we have chosen the sentiment analysis (SA) problem. In SA, documents normally are polarity classified by running them through classifiers trained on document vectors constructed from lexeme features, i.e., words. If, however, instead of words, one uses word senses (synset ids in wordnets) as features, the accuracy improves dramatically. But is this improvement significant enough to justify the cost of annotation? This question, to the best of our knowledge, has not been investigated with the seriousness it deserves. We perform a cost-benefit study based on a vendor-machine model. By setting up a cost price, selling price and profit scenario, we show that although extra cost is incurred in sense annotation, the profit margin is high, justifying the cost.

Domain Specific Ontology Extractor For Indian Languages
Brijesh Bhatt | Pushpak Bhattacharyya
Proceedings of the 10th Workshop on Asian Language Resources

Morphological Analyzer for Affix Stacking Languages: A Case Study of Marathi
Raj Dabre | Archana Amberkar | Pushpak Bhattacharyya
Proceedings of COLING 2012: Posters

Towards Efficient Named-Entity Rule Induction for Customizability
Ajay Nagesh | Ganesh Ramakrishnan | Laura Chiticariu | Rajasekar Krishnamurthy | Ankush Dharkar | Pushpak Bhattacharyya
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

janardhan: Semantic Textual Similarity using Universal Networking Language graph matching
Janardhan Singh | Arindam Bhattacharya | Pushpak Bhattacharyya
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

YouCat: Weakly Supervised Youtube Video Categorization System from Meta Data & User Comments using WordNet & Wikipedia
Subhabrata Mukherjee | Pushpak Bhattacharyya
Proceedings of COLING 2012

Error tracking in search engine development
Swapnil Chaudhari | Arjun Atreya V | Pushpak Bhattacharyya | Ganesh Ramakrishnan
Proceedings of the 3rd Workshop on South and Southeast Asian Natural Language Processing

Sentiment Analysis in Twitter with Lightweight Discourse Analysis
Subhabrata Mukherjee | Pushpak Bhattacharyya
Proceedings of COLING 2012

Proceedings of the First Workshop on Eye-tracking and Natural Language Processing
Michael Carl | Pushpak Bhattacharyya | Kamal Kumar Choudhary
Proceedings of the First Workshop on Eye-tracking and Natural Language Processing

Discrimination-Net for Hindi
Diptesh Kanojia | Arindam Chatterjee | Salil Joshi | Pushpak Bhattacharyya
Proceedings of COLING 2012: Demonstration Papers

A heuristic-based approach for systematic error correction of gaze data for reading
Abhijit Mishra | Michael Carl | Pushpak Bhattacharyya
Proceedings of the First Workshop on Eye-tracking and Natural Language Processing

Building Multilingual Search Index using open source framework
Arjun Atreya | Swapnil Chaudhari | Pushpak Bhattacharyya | Ganesh Ramakrishnan
Proceedings of the 3rd Workshop on South and Southeast Asian Natural Language Processing

Partially modelling word reordering as a sequence labelling problem
Anoop Kunchukuttan | Pushpak Bhattacharyya
Proceedings of the Workshop on Reordering for Statistical Machine Translation

Eating Your Own Cooking: Automatically Linking Wordnet Synsets of Two Languages
Salil Joshi | Arindam Chatterjee | Arun Karthikeyan Karra | Pushpak Bhattacharyya
Proceedings of COLING 2012: Demonstration Papers

Proceedings of the First International Workshop on Optimization Techniques for Human Language Technology
Pushpak Bhattacharyya | Asif Ekbal | Sriparna Saha | Mark Johnson | Diego Molla-Aliod | Mark Dras
Proceedings of the First International Workshop on Optimization Techniques for Human Language Technology

2011

Robust Sense-based Sentiment Classification
Balamurali AR | Aditya Joshi | Pushpak Bhattacharyya
Proceedings of the 2nd Workshop on Computational Approaches to Subjectivity and Sentiment Analysis (WASSA 2.011)

Clause-Based Reordering Constraints to Improve Statistical Machine Translation
Ananthakrishnan Ramanathan | Pushpak Bhattacharyya | Karthik Visweswariah | Kushal Ladha | Ankur Gandhe
Proceedings of 5th International Joint Conference on Natural Language Processing

C-Feel-It: A Sentiment Analyzer for Micro-blogs
Aditya Joshi | Balamurali AR | Pushpak Bhattacharyya | Rajat Mohanty
Proceedings of the ACL-HLT 2011 System Demonstrations

Hybrid Inflectional Stemmer and Rule-based Derivational Stemmer for Gujarati
Kartik Suba | Dipti Jiandani | Pushpak Bhattacharyya
Proceedings of the 2nd Workshop on South Southeast Asian Natural Language Processing (WSSANLP)

It Takes Two to Tango: A Bilingual Unsupervised Approach for Estimating Sense Distributions using Expectation Maximization
Mitesh M. Khapra | Salil Joshi | Pushpak Bhattacharyya
Proceedings of 5th International Joint Conference on Natural Language Processing

Together We Can: Bilingual Bootstrapping for WSD
Mitesh M. Khapra | Salil Joshi | Arindam Chatterjee | Pushpak Bhattacharyya
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

Harnessing WordNet Senses for Supervised Sentiment Classification
Balamurali AR | Aditya Joshi | Pushpak Bhattacharyya
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

2010

Verbs are where all the action lies: Experiences of Shallow Parsing of a Morphologically Rich Language
Harshada Gune | Mugdha Bapat | Mitesh M. Khapra | Pushpak Bhattacharyya
Coling 2010: Posters

A Paradigm-Based Finite State Morphological Analyzer for Marathi
Mugdha Bapat | Harshada Gune | Pushpak Bhattacharyya
Proceedings of the 1st Workshop on South and Southeast Asian Natural Language Processing

Multilingual Pseudo-Relevance Feedback: Performance Study of Assisting Languages
Manoj Kumar Chinnakotla | Karthik Raman | Pushpak Bhattacharyya
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

Think Globally, Apply Locally: Using Distributional Characteristics for Hindi Named Entity Identification
Shalini Gupta | Pushpak Bhattacharyya
Proceedings of the 2010 Named Entities Workshop

More Languages, More MAP?: A Study of Multiple Assisting Languages in Multilingual PRF
Vishal Vachhani | Manoj Chinnakotla | Mitesh Khapra | Pushpak Bhattacharyya
Proceedings of the 4th Workshop on Cross Lingual Information Access

Value for Money: Balancing Annotation Effort, Lexicon Building and Accuracy for Multilingual WSD
Mitesh Khapra | Saurabh Sohoney | Anup Kulkarni | Pushpak Bhattacharyya
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

IndoWordNet
Pushpak Bhattacharyya
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

India is a multilingual country where machine translation and cross-lingual search are highly relevant problems. These problems require large resources, like wordnets and lexicons, of high quality and coverage. Wordnets are lexical structures composed of synsets and semantic relations. Synsets are sets of synonyms. They are linked by semantic relations like hypernymy (is-a), meronymy (part-of), troponymy (manner-of) etc. IndoWordnet is a linked structure of wordnets of major Indian languages from Indo-Aryan, Dravidian and Sino-Tibetan families. These wordnets have been created by following the expansion approach from Hindi wordnet which was made available free for research in 2006. Since then a number of Indian languages have been creating their wordnets. In this paper we discuss the methodology, coverage, important considerations and multifarious benefits of IndoWordnet. Case studies are provided for Marathi, Sanskrit, Bodo and Telugu, to bring out the basic methodology of and challenges involved in the expansion approach. The guidelines the lexicographers follow for wordnet construction are enumerated. The difference between IndoWordnet and EuroWordnet is also discussed.

CFILT: Resource Conscious Approaches for All-Words Domain Specific WSD
Anup Kulkarni | Mitesh Khapra | Saurabh Sohoney | Pushpak Bhattacharyya
Proceedings of the 5th International Workshop on Semantic Evaluation

Word Sense Disambiguation and IR
Pushpak Bhattacharyya
Proceedings of the 4th Workshop on Cross Lingual Information Access

Finite-state Scriptural Translation
M. G. Abbas Malik | Christian Boitet | Pushpak Bhattacharyya
Coling 2010: Posters

Weak Translation Problems – a case study of Scriptural Translation
Muhammad Ghulam Abbas Malik | Christian Boitet | Pushpak Bhattacharyya | Laurent Besacier
Actes de la 17e conférence sur le Traitement Automatique des Langues Naturelles. Articles courts

General purpose, high quality and fully automatic MT is believed to be impossible. We are interested in scriptural translation problems, which are weak sub-problems of the general problem of translation. We introduce the characteristics of the weak problems of translation and of the scriptural translation problems, describe different computational approaches (finite-state, statistical and hybrid) to solve these problems, and report our results on several combinations of Indo-Pak languages and writing systems.

Everybody loves a rich cousin: An empirical study of transliteration through bridge languages
Mitesh M. Khapra | A Kumaran | Pushpak Bhattacharyya
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

All Words Domain Adapted WSD: Finding a Middle Ground between Supervision and Unsupervision
Mitesh Khapra | Anup Kulkarni | Saurabh Sohoney | Pushpak Bhattacharyya
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

Hybrid Stemmer for Gujarati
Pratikkumar Patel | Kashyap Popat | Pushpak Bhattacharyya
Proceedings of the 1st Workshop on South and Southeast Asian Natural Language Processing

OWNS: Cross-lingual Word Sense Disambiguation Using Weighted Overlap Counts and Wordnet Based Similarity Measures
Lipta Mahapatra | Meera Mohan | Mitesh Khapra | Pushpak Bhattacharyya
Proceedings of the 5th International Workshop on Semantic Evaluation

2009

Improving Transliteration Accuracy Using Word-Origin Detection and Lexicon Lookup
Mitesh Khapra | Pushpak Bhattacharyya
Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration (NEWS 2009)

A Hybrid Model for Urdu Hindi Transliteration
Abbas Malik | Laurent Besacier | Christian Boitet | Pushpak Bhattacharyya
Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration (NEWS 2009)

Projecting Parameters for Multilingual Word Sense Disambiguation
Mitesh M. Khapra | Sapan Shah | Piyush Kedia | Pushpak Bhattacharyya
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

Proceedings of the Third International Workshop on Cross Lingual Information Access: Addressing the Information Need of Multilingual Societies (CLIAWS3)
Sivaji Bandyopadhyay | Pushpak Bhattacharyya | Vasudeva Varma | Sudeshna Sarkar | A Kumaran | Raghavendra Udupa
Proceedings of the Third International Workshop on Cross Lingual Information Access: Addressing the Information Need of Multilingual Societies (CLIAWS3)

Case markers and Morphology: Addressing the crux of the fluency problem in English-Hindi SMT
Ananthakrishnan Ramanathan | Hansraj Choudhary | Avishek Ghosh | Pushpak Bhattacharyya
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

2008

Hindi and Marathi to English Cross Language Information Retrieval
Manoj Kumar Chinnakotla | Sagar Ranadive | Om P. Damani | Pushpak Bhattacharyya
Proceedings of the 2nd workshop on Cross Lingual Information Access (CLIA) Addressing the Information Need of Multilingual Societies

A Common Parts-of-Speech Tagset Framework for Indian Languages
Baskaran Sankaran | Kalika Bali | Monojit Choudhury | Tanmoy Bhattacharya | Pushpak Bhattacharyya | Girish Nath Jha | S. Rajendran | K. Saravanan | L. Sobha | K.V. Subbarao
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

We present a universal Parts-of-Speech (POS) tagset framework covering most of the Indian languages (ILs) following the hierarchical and decomposable tagset schema. In spite of a significant number of speakers, there is no workable POS tagset and tagger for most ILs, which serve as fundamental building blocks for NLP research. Existing IL POS tagsets are often designed for a specific language; the few that have been designed for multiple languages cover only shallow linguistic features, ignoring linguistic richness and idiosyncrasies. The new framework that is proposed here addresses these deficiencies in an efficient and principled manner. We follow a hierarchical schema similar to that of EAGLES and this enables the framework to be flexible enough to capture rich features of a language/language family, even while capturing the shared linguistic structures in a methodical way. The proposed common framework further facilitates the sharing and reusability of scarce resources in these languages and ensures cross-linguistic compatibility.

Hindi Compound Verbs and their Automatic Extraction
Debasri Chakrabarti | Hemang Mandalia | Ritwik Priya | Vaijayanthi Sarma | Pushpak Bhattacharyya
Coling 2008: Companion volume: Posters

Simple Syntactic and Morphological Processing Can Help English-Hindi Statistical Machine Translation
Ananthakrishnan Ramanathan | Jayprasad Hegde | Ritesh M. Shah | Pushpak Bhattacharyya | Sasikumar M.
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-I

Lexical Resources for Semantics Extraction
Rajat Mohanty | Pushpak Bhattacharyya
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

In this paper, we report our work on the creation of a number of lexical resources that are crucial for an interlingua-based MT from English to other languages. These lexical resources are in the form of sub-categorization frames, verb knowledge bases and rule templates for establishing semantic relations and speech-act-like attributes. We have created these resources over a long period of time from Oxford Advanced Learners Dictionary (OALD) [1], VerbNet [2], Princeton WordNet 2.1 [3], LCS database [4], Penn Tree Bank [5], and XTAG lexicon [6]. On the challenging problem of generating interlingua from domain- and structure-unrestricted English sentences, we are able to demonstrate that the use of these lexical resources makes a difference in terms of accuracy figures.

Hindi Urdu Machine Transliteration using Finite-State Transducers
M G Abbas Malik | Christian Boitet | Pushpak Bhattacharyya
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

Designing a Common POS-Tagset Framework for Indian Languages
Sankaran Baskaran | Kalika Bali | Tanmoy Bhattacharya | Pushpak Bhattacharyya | Girish Nath Jha | Rajendran S | Saravanan K | Sobha L | Subbarao K V.
Proceedings of the 6th Workshop on Asian Language Resources

2007

Hindi generation from interlingua
Smriti Singh | Mrugank Dalal | Vishal Vachhani | Pushpak Bhattacharyya | Om P. Damani
Proceedings of Machine Translation Summit XI: Papers

2006

Morphological Richness Offsets Resource Demand – Experiences in Constructing a POS Tagger for Hindi
Smriti Singh | Kuhoo Gupta | Manish Shrivastava | Pushpak Bhattacharyya
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions

2005

Semantically Relatable Sets: Building Blocks for Representing Semantics
Rajat Kumar Mohanty | Anupama Dutta | Pushpak Bhattacharyya
Proceedings of Machine Translation Summit X: Papers

2004

Generic Text Summarization Using WordNet
Kedar Bellare | Anish Das Sarma | Atish Das Sarma | Navneet Loiwal | Vaibhav Mehta | Ganesh Ramakrishnan | Pushpak Bhattacharyya
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

A gloss-centered algorithm for disambiguation
Ganesh Ramakrishnan | B. Prithviraj | Pushpak Bhattacharyya
Proceedings of SENSEVAL-3, the Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text

2003

Question Answering via Bayesian Inference on Lexical Relations
Ganesh Ramakrishnan | Apurva Jadhav | Ashutosh Joshi | Soumen Chakrabarti | Pushpak Bhattacharyya
Proceedings of the ACL 2003 Workshop on Multilingual Summarization and Question Answering

Co-authors

Abhijit Mishra 19

Malhar Kulkarni 18

Mitesh M. Khapra 17

Rudra Murthy 13

Md. Shad Akhtar 12

Sudha Bhingardive 12

Preethi Jyothi 12

Girish Palshikar 12

Deepak Gupta 11

Hanumant Redkar 11

Sandhya Singh 11

Sourabh Deoghare 10

Mauajama Firdaus 8

Sandeep Mathias 8

Raksha Sharma 8

Balamurali AR 7

Rajen Chatterjee 7

Jyotsana Khatri 7

Ganesh Ramakrishnan 7

Dhirendra Singh 7

Tamali Banerjee 6

Christian Boitet 6

Tirthankar Ghosal 6

Soumitra Ghosh 6

Kamal Kumar Gupta 6

Gholamreza Haffari 6

Swapnil Hingmire 6

Suman Banerjee 5

Biplab Banerjee 5

Dushyant Singh Chauhan 5

Muthusamy Chelliah 5

Nikesh Garera 5

Harshvivek Kashid 5

Kishan Maharaj 5

Shivam Mhaskar 5

Sangameshwar Patil 5

Girishkumar Ponkiya 5

Nitin Ramrakhiyani 5

Jaya Saraswati 5

Rajita Shukla 5

Arjun Atreya V 5

Vasudeva Varma 5

Ramakrishna Appicharla 4

Shehzaad Dhuliawala 4

Deepanway Ghosal 4

Sabyasachi Kamila 4

Sadao Kurohashi 4

M. G. Abbas Malik 4

Santosh Kumar Mishra 4

Ananthakrishnan Ramanathan 4

Sreedhar Reddy 4

Ashita Saxena 4

Tejpalsingh Siledar 4

Gopendra Vikram Singh 4

Ankush Agarwal 3

Spandan Anaokar 3

Debajyoty Banik 3

Laurent Besacier 3

Brijesh Bhatt 3

Saprativa Bhattacharjee 3

Swapnil Bhattacharyya 3

Chris Biemann 3

Tanmoy Chakraborty 3

Arindam Chatterjee 3

Hardik Chauhan 3

Manoj Chinnakotla 3

Chenchen Ding 3

Himanshu Dutta 3

Markus Freitag 3

Shrey Ganatra 3

Rahul Hemrajani 3

Laxmi Kashyap 3

Anup Kulkarni 3

Irawati Kulkarni 3

Abhishek Kumar 3

Rustom Lawyer 3

Pabitra Lenka 3

Anutosh Maitra 3

Krishanu Maity 3

Siddharth Manohar 3

Toshiaki Nakazawa 3

Swaprava Nath 3

Sameer Pimparkhede 3

Kashyap Popat 3

Abisek Rajakumar Kalarani 3

Nihar Ranjan Sahoo 3

Sakriani Sakti 3

Kumar Saunack 3

Reshma Sekhar 3

Shubhashis Sengupta 3

Sumit Shekhar 3

Aditya Shetty 3

Manish Shrivastava 3

Sudhanshu Singh 3

Dhirendra Pratap Singh 3

Saurabh Sohoney 3

Settaluri Lakshmi Sravanthi 3

Shikha Srivastava 3

Srikanth G. Tamilselvam 3

Vaibhav Tripathi 3

Derek F. Wong (黄辉) 3

Pulkit Agarwal 2

Arif A. Ahmad 2

Amar Prakash Azad 2

Naveen Badathala 2

Ankit Bahuguna 2

Sivaji Bandyopadhyay 2

Aakash Banerjee 2

Akshay Batheja 2

Harsimran Bedi 2

Samarth Bharadwaj 2

Tanmoy Bhattacharya 2

Spriha Biswas 2

Frédéric Blain 2

Ondřej Bojar 2

Swapnil Chaudhari 2

Dushyant Chauhan 2

Niyati Chhaya 2

Gladvin Chinnadurai 2

Tejas Dhamecha 2

Abhijeet Dubey 2

Piyush Dungarwal 2

Vignesh Edithal 2

Prerak Gandhi 2

Shubham Gautam 2

Sakharam Gawade 2

Poulami Ghosh 2

Hitesh Golchha 2

Dhara Gorasia 2

Harshada Gune 2

Naman K. Gupta 2

Himanshu Gupta 2

Gholemreza Haffari 2

Rejwanul Haque 2

Kshitij Sharad Jadhav 2

Bhakti Jadhav 2

Girish Nath Jha 2

Mehant Kammakomati 2

Harshad Khadilkar 2

Manasi Kulkarni 2

Sanjeev Kumar 2

Lakshya Kumar 2

Niteesh Mallela 2

Hiroshi Manabe 2

Rajat Mohanty 2

Debjyoti Mondal 2

Sankara Muddu 2

Subhabrata Mukherjee 2

Apurva Nagvenkar 2

Shruthi N Nair 2

Hideki Nakayama 2

Apoorva Nunna 2

Constantin Orasan 2

Subhadarshi Panda 2

Ritesh Panjwani 2

Jai Paranjape 2

Shantipriya Parida 2

Pratikkumar Patel 2

Soujanya Poria 2

Kiran Pradeep 2

Ratish Puduppully 2

Roshni Ramnani 2

Ankit Ramteke 2

Tharindu Ranasinghe 2

Rupasai Rangaraju 2

Saichethan Reddy 2

Saumajit Saha 2

Sovan Kumar Sahoo 2

Baskaran Sankaran 2

Palaash Sawant 2

Bhavani Shankar 2

Ashutosh Sharma 2

Ujjwal Sharma 2

Dipti Misra Sharma 2

Kaustubh Shivshankar Shejole 2

Kush Shrivastava 2

Rituraj Singh 2

Meenakshi Somasundaram 2

Settaluri Sravanthi 2

Katsuhito Sudoh 2

Chanchal Suman 2

Lavita Talukdar 2

Pavan Tankala 2

Abhisek Tiwari 2

Subbarao K. V 2

Rudramurthy V 2

Rudra V Murthy 2

Vishal Vachhani 2

Deeksha Varshney 2

Karthik Visweswariah 2

Aakash Kumar Agarwal 1

Astha Agarwal 1

Samarth Agrawal 1

Tanuja Ajotikar 1

Ravi Tej Akella 1

Md. Tousin Akhter 1

Archana Amberkar 1

Harpreet Singh Arora 1

Giuseppe Attanasio 1

Prakhar Bapat 1

Jorge Baptista 1

Kingshuk Basak 1

Kedar Bellare 1

Shreyangshu Bera 1

Shobhit Bhatnagar 1

Himanshu Sharad Bhatt 1

Krishnanjan Bhattacharjee 1

Pallab Bhattacharjee 1

Arindam Bhattacharya 1

Nilesh Birari 1

Tameesh Biswas 1

Dhanvanth Boppana 1

José G. C. de Souza 1

Debasri Chakrabarti 1

Soumen Chakrabarti 1

Sachin Channabasavarajendra 1

Devendra Singh Chaplot 1

Rhugved Pankaj Chaudhari 1

Sriram Chaudhury 1

Harshad Chavan 1

Laura Chiticariu 1

Srinivasa Satya Sameer Kumar Chivukula 1

Jyotesh Choudhari 1

Kamal Kumar Choudhary 1

Paramveer Choudhary 1

Hansraj Choudhary 1

Monojit Choudhury 1

Fabien Cromieres 1

Mrugank Dalal 1

Sandipan Dandapat 1

Hemant Darbari 1

Kirushikesh Db 1

Shubham Dewangan 1

Minakshi Dhar 1

Ankush Dharkar 1

Anupama Dutta 1

Akiko Eriguchi 1

Christiane Fellbaum 1

Mikel L. Forcada 1

Pranav Gaikwad 1

Jayashree Aanand Gajjam 1

Aparna Garimella 1

Sayali Ghodekar 1

Avishek Ghosh 1

Muhammad Ghulam Abbas Malik 1

Mansi Golakiya 1

Nuno M. Guerreiro 1

Sravani Gunnu 1

Siddhartha Gunti 1

Pranjal Gupta 1

Shalini Gupta 1

Meva Ram Gurjar 1

Mohammed Hasanuzzaman 1

Mohammad Hasanuzzaman 1

Jayprasad Hegde 1

Shohei Higashiyama 1

Chu-Ren Huang 1

Jay J. Gorakhiya 1

Jemima S. Jacob 1

Nikhilkumar Jadhav 1

Apurva Jadhav 1

Anubhav Jangra 1

B JayaPrakash 1

Jayanth Jayanth 1

Manas Jhalani 1

Dipti Jiandani 1

Ashutosh Joshi 1

Ruslan Kalitvianski 1

Anuradha Kanamarlapudi 1

Chandresh Kanani 1

Shruti Kanitkar 1

Satyanarayan Kar 1

Durgaprasad Karnam 1

Arun Karthikeyan Karra 1

Darsh Kaushik 1

Tanay Kayastha 1

Anas Anwarul Haq Khan 1

Anupam Khattri 1

Palani Kodeswaran 1

Rajasekar Krishnamurthy 1

Cvetana Krstev 1

Pranamya Kulkarni 1

Chintalapalli Raja Kullayappa 1

Ashok Pon Kumar 1

Rajat Kumar Mohanty 1

Ponnurangam Kumaraguru 1

Surabhi Kumari 1

Harshavardhan Kundarapu 1

Manishit Kundu 1

Subhash Kunnath 1

Shreya Laddha 1

Sobha Lalitha Devi 1

Devansh Lalwani 1

Dhanashree Lele 1

Navneet Loiwal 1

Nishtha Madaan 1

Lipta Mahapatra 1

Sunny Manchanda 1

Konika Mandal 1

Hemang Mandalia 1

Jaithra Varma Manthena 1

André F. T. Martins 1

Dhirendra Kumar Maurya 1

Dhirendra Maurya 1

Sanket Vaibhav Mehta 1

Manthan Mehta 1

Sandra Milena Castellanos Páez 1

Ritwik Mishra 1

Shubham Mishra 1

Kshitij Mishra 1

Anirudh Mittal 1

Khyathi Gayathri Mothika 1

Kannan Moudgalya 1

Vitobha Munigala 1

Sharath Naganna 1

Sunny Narayan 1

Mukuntha Narayanan Sundararaman 1

Mohammad Nasiruddin 1

J. Saketha Nath 1

Graham Neubig 1

Pranav Jeevan P 1

Mithun Padmakumar 1

Aditya Prakash Patra 1

Siddhesh Pawar 1

Aliabbas Petiwala 1

Bornali Phukan 1

Shreyas Pimpalgaonkar 1

Lahari Poddar 1

A.S. Poornash 1

Pradyot Prakash 1

Vishal Pramanik 1

B. Prithviraj 1

Priyanshu Priya 1

Rajkumar Pujari 1

Geetanjali Rakshit 1

Karthik Raman 1

Sagar Ranadive 1

Geetanjali Rane 1

Godawari Sudhakar Rao 1

Mauli Rastogi 1

Anwesh Sinha Ray 1

Mandala Jagadeesh Reddy 1

Jake Renzella 1

Rajeev Sangal 1

Karthik Sankaranarayanan 1

Prateek Sappadla 1

Sudeshna Sarkar 1

Vaijayanthi M. Sarma 1

Anish Das Sarma 1

Atish Das Sarma 1

Sanand Sasidharan 1

Milind Savagaonkar 1

Sayandeep Sen 1

Nivvedan Senthamilselvan 1

Abhishek Sethi 1

Samiulla Shaikh 1

Prashant Sharma 1

Vinita Sharma 1

Rahul Sharnagat 1

Akash Sheoran 1

Amit P. Sheth 1

Kushagra Shree 1

Satyam Shukla 1

Gurneet Singh 1

Janardhan Singh 1

Simranjeet Singh 1

Avinash Kumar Singh 1

Virendra Singh 1

Prerana Singhal 1

Sushant Sinha 1

Sahoo Sovan Kumar 1

Ankit Srivastava 1

Medchalimi Sruthi 1

Aneerav Sukhoo 1

Jayaprakash Sundararaj 1

Partha Talukdar 1

Prabhjit Thind 1

Divyank Tiwari 1

Sarbajeet Tiwari 1

Mutsuko Tomokiyo 1

Drumil Trivedi 1

George Tsatsaronis 1

Raghavendra Udupa 1

Apoorva Upadhyaya 1

Akshit Varmora 1

Shikhar Vashishth 1

Vinay Reddy Venumuddala 1

Vasudeva Verma 1

Archana Yadav 1

Ravi Kumar Yadav 1

Seid Muhie Yimam 1

Waisullah Yousofi 1

Chrysoula Zerva 1

Xiangyu Zhang 1

Leonardo Zilio 1

Venues

WS 14

JEP/TALN/RECITAL 1