Vivek Dahiya


2024

Chitranuvad: Adapting Multi-lingual LLMs for Multimodal Translation
Shaharukh Khan | Ayush Tarun | Ali Faraz | Palash Kamble | Vivek Dahiya | Praveen Pokala | Ashish Kulkarni | Chandra Khatri | Abhinav Ravi | Shubham Agarwal
Proceedings of the Ninth Conference on Machine Translation

In this work, we provide the system description of our submission as part of the English-to-Lowres Multimodal Translation Task at the Workshop on Asian Translation (WAT2024). We introduce Chitranuvad, a multimodal model that effectively integrates a Multilingual LLM and a vision module for Multimodal Translation. Our method uses a ViT image encoder to extract visual representations as visual token embeddings, which are projected to the LLM space by an adapter layer, and generates translation in an autoregressive fashion. We participated in all three tracks (Image Captioning, Text-only and Multimodal translation tasks) for Indic languages (i.e., English translation to Hindi, Bengali and Malayalam) and achieved SOTA results for Hindi in all of them on the Challenge set while remaining competitive for the other languages in the shared task.
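The abstract describes a standard adapter-based fusion: ViT patch features are linearly projected into the LLM's token-embedding space and prepended to the text sequence. The sketch below illustrates that projection step only; the dimensions, adapter design, and module names are illustrative assumptions, not the paper's actual values.

```python
import torch
import torch.nn as nn

class VisualAdapter(nn.Module):
    """Projects ViT patch features into the LLM's token-embedding space.
    Dimensions are placeholders, not the values used in Chitranuvad."""
    def __init__(self, vit_dim: int = 768, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Linear(vit_dim, llm_dim)

    def forward(self, vit_features: torch.Tensor) -> torch.Tensor:
        # (batch, num_patches, vit_dim) -> (batch, num_patches, llm_dim)
        return self.proj(vit_features)

# Toy forward pass: projected visual tokens are prepended to the source-text
# embeddings, and the combined sequence would then be decoded
# autoregressively by the multilingual LLM.
adapter = VisualAdapter()
vit_features = torch.randn(1, 196, 768)   # stand-in for ViT patch embeddings
text_embeds = torch.randn(1, 32, 4096)    # stand-in for source-sentence embeddings
visual_tokens = adapter(vit_features)
llm_input = torch.cat([visual_tokens, text_embeds], dim=1)  # (1, 228, 4096)
```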

2022

IAEmp: Intent-aware Empathetic Response Generation
Mrigank Tiwari | Vivek Dahiya | Om Mohanty | Girija Saride
Proceedings of the 19th International Conference on Natural Language Processing (ICON)

In the domain of virtual assistants or conversational systems, it is important to empathise with the user. Being empathetic involves understanding the emotion of the ongoing dialogue and responding to the situation with empathy. We propose a novel approach for empathetic response generation, which leverages predicted intents for the future response and prompts the encoder-decoder model to improve empathy in generated responses. Our model exploits the combination of dialogues and their respective emotions to generate empathetic responses. As response intent plays an important part in our generation, we also employ one or more intents to generate responses with relevant empathy. We achieve improved human and automated metrics compared to the baselines.
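One common way to condition an encoder-decoder model on predicted response intents is to prepend them to the encoder input as a plain-text control prefix. The sketch below shows that pattern under assumed names: the T5 backbone, the intent labels, and the prompt format are illustrative assumptions, not the paper's actual setup.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Hypothetical intent labels predicted for the upcoming response; the
# paper's intent taxonomy and intent predictor are not reproduced here.
predicted_intents = ["consoling", "encouraging"]
dialogue = "I failed my driving test again today."

# Condition generation by prepending the predicted intents to the
# encoder input as a textual control prefix.
source = f"intents: {', '.join(predicted_intents)} | dialogue: {dialogue}"

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

inputs = tokenizer(source, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```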