Nidhir Bhavsar


pdf bib
DeepCon: An End-to-End Multilingual Toolkit for Automatic Minuting of Multi-Party Dialogues
Aakash Bhatnagar | Nidhir Bhavsar | Muskaan Singh
Proceedings of the 23rd Annual Meeting of the Special Interest Group on Discourse and Dialogue

In this paper, we present our minuting tool DeepCon, an end-to-end toolkit for minuting the multiparty dialogues of meetings. It provides technological support for (multilingual) communication and collaboration, with a specific focus on Natural Language Processing (NLP) technologies: Automatic Speech Recognition (ASR), Machine Translation (MT), Automatic Minuting (AM), Topic Modelling (TM) and Named Entity Recognition (NER). To the best of our knowledge, there is no such tool available. Further, this tool follows a microservice architecture, and we release the tool as open-source, deployed on Amazon Web Services (AWS). We release our tool open-source here

pdf bib
Innovators @ SMM4H’22: An Ensembles Approach for self-reporting of COVID-19 Vaccination Status Tweets
Mohammad Zohair | Nidhir Bhavsar | Aakash Bhatnagar | Muskaan Singh
Proceedings of The Seventh Workshop on Social Media Mining for Health Applications, Workshop & Shared Task

With the Surge in COVID-19, the number of social media postings related to the vaccine has grown, specifically tracing the confirmed reports by the users regarding the COVID-19 vaccine dose termed “Vaccine Surveillance.” To mitigate this research problem, we present our novel ensembled approach for self-reporting COVID-19 vaccination status tweets into two labels, namely “Vaccine Chatter” and “Self Report.” We utilize state-of-the-art models, namely BERT, RoBERTa, and XLNet. Our model provides promising results with 0.77, 0.93, and 0.66 as precision, recall, and F1-score (respectively), comparable to the corresponding median scores of 0.77, 0.9, and 0.68 (respec- tively). The model gave an overall accuracy of 93.43. We also present an empirical analysis of the results to present how well the tweet was able to classify and report. We release our code base here

pdf bib
Innovators@SMM4H’22: An Ensembles Approach for Stance and Premise Classification of COVID-19 Health Mandates Tweets
Vatsal Savaliya | Aakash Bhatnagar | Nidhir Bhavsar | Muskaan Singh
Proceedings of The Seventh Workshop on Social Media Mining for Health Applications, Workshop & Shared Task

This paper presents our submission for the Shared Task-2 of classification of stance and premise in tweets about health mandates related to COVID-19 at the Social Media Mining for Health 2022. There have been a plethora of tweets about people expressing their opinions on the COVID-19 epidemic since it first emerged. The shared task emphasizes finding the level of cooperation within the mandates for their stance towards the health orders of the pandemic. Overall the shared subjects the participants to propose system’s that can efficiently perform 1) Stance Detection, which focuses on determining the author’s point of view in the text. 2) Premise Classification, which indicates whether or not the text has arguments. Through this paper we propose an orchestration of multiple transformer based encoders to derive the output for stance and premise classification. Our best model achieves a F1 score of 0.771 for Premise Classification and an aggregate macro-F1 score of 0.661 for Stance Detection. We have made our code public here

pdf bib
Team Innovators at SemEval-2022 for Task 8: Multi-Task Training with Hyperpartisan and Semantic Relation for Multi-Lingual News Article Similarity
Nidhir Bhavsar | Rishikesh Devanathan | Aakash Bhatnagar | Muskaan Singh | Petr Motlicek | Tirthankar Ghosal
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

This work represents the system proposed by team Innovators for SemEval 2022 Task 8: Multilingual News Article Similarity. Similar multilingual news articles should match irrespective of the style of writing, the language of conveyance, and subjective decisions and biases induced by medium/outlet. The proposed architecture includes a machine translation system that translates multilingual news articles into English and presents a multitask learning model trained simultaneously on three distinct datasets. The system leverages the PageRank algorithm for Long-form text alignment. Multitask learning approach allows simultaneous training of multiple tasks while sharing the same encoder during training, facilitating knowledge transfer between tasks. Our best model is ranked 16 with a Pearson score of 0.733.

pdf bib
Hierarchical Multi-task learning framework for Isometric-Speech Language Translation
Aakash Bhatnagar | Nidhir Bhavsar | Muskaan Singh | Petr Motlicek
Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022)

This paper presents our submission for the shared task on isometric neural machine translation in International Conference on Spoken Language Translation (IWSLT). There are numerous state-of-art models for translation problems. However, these models lack any length constraint to produce short or long outputs from the source text. In this paper, we propose a hierarchical approach to generate isometric translation on MUST-C dataset, we achieve a BERTscore of 0.85, a length ratio of 1.087, a BLEU score of 42.3, and a length range of 51.03%. On the blind dataset provided by the task organizers, we obtain a BERTscore of 0.80, a length ratio of 1.10 and a length range of 47.5%. We have made our code public here

pdf bib
Automatic Summarization for Creative Writing: BART based Pipeline Method for Generating Summary of Movie Scripts
Aditya Upadhyay | Nidhir Bhavsar | Aakash Bhatnagar | Muskaan Singh | Petr Motlicek
Proceedings of The Workshop on Automatic Summarization for Creative Writing

This paper documents our approach for the Creative-Summ 2022 shared task for Automatic Summarization of Creative Writing. For this purpose, we develop an automatic summarization pipeline where we leverage a denoising autoencoder for pretraining sequence-to-sequence models and fine-tune it on a large-scale abstractive screenplay summarization dataset to summarize TV transcripts from primetime shows. Our pipeline divides the input transcript into smaller conversational blocks, removes redundant text, summarises the conversational blocks, obtains the block-wise summaries, cleans, structures, and then integrates the summaries to create the meeting minutes. Our proposed system achieves some of the best scores across multiple metrics(lexical, semantical) in the Creative-Summ shared task.