Arpan Phukan


2024

ECIS-VQG: Generation of Entity-centric Information-seeking Questions from Videos
Arpan Phukan | Manish Gupta | Asif Ekbal
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Previous studies on question generation from videos have mostly focused on generating questions about common objects and attributes, and hence are not entity-centric. In this work, we focus on the generation of entity-centric information-seeking questions from videos. Such a system could be useful for video-based learning, recommending “People Also Ask” questions, video-based chatbots, and fact-checking. Our work addresses three key challenges: identifying question-worthy information, linking it to entities, and effectively utilizing multimodal signals. Further, to the best of our knowledge, no large-scale dataset exists for this task. Most video question generation datasets cover TV shows, movies, or human activities, or lack entity-centric information-seeking questions. Hence, we contribute a diverse dataset of YouTube videos, VideoQuestions, consisting of 411 videos with 2265 manually annotated questions. We further propose a model architecture combining Transformers, rich context signals (titles, transcripts, captions, embeddings), and a combination of cross-entropy and contrastive loss functions to encourage entity-centric question generation. Our best method yields BLEU, ROUGE, CIDEr, and METEOR scores of 71.3, 78.6, 7.31, and 81.9, respectively, demonstrating practical usability. We make the code and dataset publicly available.
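
To make the loss combination described in the abstract concrete, below is a minimal PyTorch sketch, assuming a standard encoder-decoder setup with pooled question and entity embeddings; the function name, the single sampled negative, the temperature, and the weighting factor alpha are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch (assumption, not the ECIS-VQG release) of combining a
# token-level cross-entropy generation loss with a contrastive term that
# pulls question representations toward their linked entity embeddings.
import torch
import torch.nn.functional as F

def combined_loss(logits, target_ids, question_emb, entity_emb, negative_emb,
                  temperature=0.1, alpha=0.5):
    """logits: (batch, seq_len, vocab); target_ids: (batch, seq_len);
    question_emb / entity_emb / negative_emb: (batch, dim) pooled vectors."""
    # Standard generation loss over the question tokens.
    ce = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                         target_ids.reshape(-1), ignore_index=-100)

    # InfoNCE-style contrastive loss: the matching entity is the positive,
    # a sampled non-entity span is the negative.
    pos = F.cosine_similarity(question_emb, entity_emb) / temperature
    neg = F.cosine_similarity(question_emb, negative_emb) / temperature
    contrastive = -torch.log(torch.exp(pos) /
                             (torch.exp(pos) + torch.exp(neg))).mean()

    return ce + alpha * contrastive
```

The contrastive term rewards question representations that sit closer to their linked entity than to a non-entity span, which is one way such a loss can encourage entity-centric generation.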

Hope ‘The Paragraph Guy’ explains the rest : Introducing MeSum, the Meme Summarizer
Anas Anwarul Haq Khan | Tanik Saikh | Arpan Phukan | Asif Ekbal
Findings of the Association for Computational Linguistics: EMNLP 2024

2023

QeMMA: Quantum-Enhanced Multi-Modal Sentiment Analysis
Arpan Phukan | Asif Ekbal
Proceedings of the 20th International Conference on Natural Language Processing (ICON)

Multi-modal data analysis presents formidable challenges, as developing effective methods to capture correlations among different modalities remains an ongoing pursuit. In this study, we address multi-modal sentiment analysis from a novel quantum perspective. We propose that quantum principles, such as superposition, entanglement, and interference, offer a more comprehensive framework for capturing not only the cross-modal interactions between text, acoustics, and visuals but also the intricate relations within each modality. To empirically evaluate our approach, we employ the CMU-MOSEI dataset as our testbed and utilize IBM's Qiskit to run our experiments on a quantum computer. Our proposed Quantum-Enhanced Multi-Modal Analysis Framework (QeMMA) showcases its significant potential by surpassing the baseline by 3.52% and 10.14% in terms of accuracy and F1 score, respectively, highlighting the promise of quantum-inspired methodologies.
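
As an illustration of the quantum-inspired fusion idea sketched in the abstract, here is a minimal Qiskit example, assuming one pooled, normalized feature per modality is angle-encoded onto its own qubit; the three-qubit layout, the CNOT entangling pattern, and the use of statevector probabilities as fused features are assumptions for illustration, not the QeMMA implementation.

```python
# Minimal sketch (an assumed recipe, not the QeMMA code): each modality's
# pooled feature is scaled into a rotation angle and loaded onto one qubit
# (superposition), CNOTs entangle the modalities so their amplitudes can
# interfere, and the basis-state probabilities serve as fused features.
import numpy as np
from qiskit import QuantumCircuit
from qiskit.quantum_info import Statevector

def fuse_modalities(text_feat, audio_feat, visual_feat):
    """Each argument is a scalar in [0, 1], e.g. a pooled, normalized feature."""
    qc = QuantumCircuit(3)
    # Angle-encode one feature per qubit.
    for qubit, feat in enumerate((text_feat, audio_feat, visual_feat)):
        qc.ry(np.pi * feat, qubit)
    # Entangle the modalities.
    qc.cx(0, 1)
    qc.cx(1, 2)
    # Read out the 8 basis-state probabilities as a fused feature vector
    # for a downstream sentiment classifier.
    return Statevector.from_instruction(qc).probabilities()

print(fuse_modalities(0.8, 0.3, 0.5))
```

In practice such a circuit would be run on a simulator or IBM hardware and the resulting distribution fed to a classical classification head; the sketch above uses an exact statevector purely to keep the example self-contained.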