2023
pdf
bib
abs
IMAGINATOR: Pre-Trained Image+Text Joint Embeddings using Word-Level Grounding of Images
Varuna Krishna Kolla
|
Suryavardan Suresh
|
Shreyash Mishra
|
Sathyanarayanan Ramamoorthy
|
Parth Patwa
|
Megha Chakraborty
|
Aman Chadha
|
Amitava Das
|
Amit Sheth
Proceedings of the 20th International Conference on Natural Language Processing (ICON)
Word embeddings, i.e., semantically meaningful vector representation of words, are largely influenced by the distributional hypothesis You shall know a word by the company it keeps (Harris, 1954), whereas modern prediction- based neural network embeddings rely on de- sign choices and hyperparameter optimization. Word embeddings like Word2Vec, GloVe etc. well capture the contextuality and real-world analogies but contemporary convolution-based image embeddings such as VGGNet, AlexNet, etc. do not capture contextual knowledge. The popular king-queen analogy does not hold true for most commonly used vision embeddings. In this paper, we introduce a pre-trained joint embedding (JE), named IMAGINATOR, trained on 21K distinct image objects. JE is a way to encode multimodal data into a vec- tor space where the text modality serves as the grounding key, which the complementary modality (in this case, the image) is anchored with. IMAGINATOR encapsulates three in- dividual representations: (i) object-object co- location, (ii) word-object co-location, and (iii) word-object correlation. These three ways cap- ture complementary aspects of the two modal- ities which are further combined to obtain the final object-word JEs. Generated JEs are intrinsically evaluated to assess how well they capture the contextual- ity and real-world analogies. We also evalu- ate pre-trained IMAGINATOR JEs on three downstream tasks: (i) image captioning, (ii) Im- age2Tweet, and (iii) text-based image retrieval. IMAGINATOR establishes a new standard on the aforementioned downstream tasks by out- performing the current SoTA on all the selected tasks. The code is available at https:// github.com/varunakk/IMAGINATOR.
pdf
bib
abs
FACTIFY-5WQA: 5W Aspect-based Fact Verification through Question Answering
Anku Rani
|
S.M Towhidul Islam Tonmoy
|
Dwip Dalal
|
Shreya Gautam
|
Megha Chakraborty
|
Aman Chadha
|
Amit Sheth
|
Amitava Das
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Automatic fact verification has received significant attention recently. Contemporary automatic fact-checking systems focus on estimating truthfulness using numerical scores which are not human-interpretable. A human fact-checker generally follows several logical steps to verify a verisimilitude claim and conclude whether it’s truthful or a mere masquerade. Popular fact-checking websites follow a common structure for fact categorization such as half true, half false, false, pants on fire, etc. Therefore, it is necessary to have an aspect-based (delineating which part(s) are true and which are false) explainable system that can assist human fact-checkers in asking relevant questions related to a fact, which can then be validated separately to reach a final verdict. In this paper, we propose a 5W framework (who, what, when, where, and why) for question-answer-based fact explainability. To that end, we present a semi-automatically generated dataset called FACTIFY-5WQA, which consists of 391, 041 facts along with relevant 5W QAs – underscoring our major contribution to this paper. A semantic role labeling system has been utilized to locate 5Ws, which generates QA pairs for claims using a masked language model. Finally, we report a baseline QA system to automatically locate those answers from evidence documents, which can serve as a baseline for future research in the field. Lastly, we propose a robust fact verification system that takes paraphrased claims and automatically validates them. The dataset and the baseline model are available at https: //github.com/ankuranii/acl-5W-QA
pdf
bib
abs
Counter Turing Test (CT2): AI-Generated Text Detection is Not as Easy as You May Think - Introducing AI Detectability Index (ADI)
Megha Chakraborty
|
S.M Towhidul Islam Tonmoy
|
S M Mehedi Zaman
|
Shreya Gautam
|
Tanay Kumar
|
Krish Sharma
|
Niyar Barman
|
Chandan Gupta
|
Vinija Jain
|
Aman Chadha
|
Amit Sheth
|
Amitava Das
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
With the rise of prolific ChatGPT, the risk and consequences of AI-generated text has increased alarmingly. This triggered a series of events, including an open letter, signed by thousands of researchers and tech leaders in March 2023, demanding a six-month moratorium on the training of AI systems more sophisticated than GPT-4. To address the inevitable question of ownership attribution for AI-generated artifacts, the US Copyright Office released a statement stating that “if the content is traditional elements of authorship produced by a machine, the work lacks human authorship and the office will not register it for copyright”. Furthermore, both the US and the EU governments have recently drafted their initial proposals regarding the regulatory framework for AI. Given this cynosural spotlight on generative AI, AI-generated text detection (AGTD) has emerged as a topic that has already received immediate attention in research, with some initial methods having been proposed, soon followed by the emergence of techniques to bypass detection. This paper introduces the Counter Turing Test (CT2), a benchmark consisting of techniques aiming to offer a comprehensive evaluation of the robustness of existing AGTD techniques. Our empirical findings unequivocally highlight the fragility of the proposed AGTD methods under scrutiny. Amidst the extensive deliberations on policy-making for regulating AI development, it is of utmost importance to assess the detectability of content generated by LLMs. Thus, to establish a quantifiable spectrum facilitating the evaluation and ranking of LLMs according to their detectability levels, we propose the AI Detectability Index (ADI). We conduct a thorough examination of 15 contemporary LLMs, empirically demonstrating that larger LLMs tend to have a lower ADI, indicating they are less detectable compared to smaller LLMs. We firmly believe that ADI holds significant value as a tool for the wider NLP community, with the potential to serve as a rubric in AI-related policy-making.
pdf
bib
abs
FACTIFY3M: A benchmark for multimodal fact verification with explainability through 5W Question-Answering
Megha Chakraborty
|
Khushbu Pahwa
|
Anku Rani
|
Shreyas Chatterjee
|
Dwip Dalal
|
Harshit Dave
|
Ritvik G
|
Preethi Gurumurthy
|
Adarsh Mahor
|
Samahriti Mukherjee
|
Aditya Pakala
|
Ishan Paul
|
Janvita Reddy
|
Arghya Sarkar
|
Kinjal Sensharma
|
Aman Chadha
|
Amit Sheth
|
Amitava Das
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Combating disinformation is one of the burning societal crises - about 67% of the American population believes that disinformation produces a lot of uncertainty, and 10% of them knowingly propagate disinformation. Evidence shows that disinformation can manipulate democratic processes and public opinion, causing disruption in the share market, panic and anxiety in society, and even death during crises. Therefore, disinformation should be identified promptly and, if possible, mitigated. With approximately 3.2 billion images and 720,000 hours of video shared online daily on social media platforms, scalable detection of multimodal disinformation requires efficient fact verification. Despite progress in automatic text-based fact verification (e.g., FEVER, LIAR), the research community lacks substantial effort in multimodal fact verification. To address this gap, we introduce FACTIFY 3M, a dataset of 3 million samples that pushes the boundaries of the domain of fact verification via a multimodal fake news dataset, in addition to offering explainability through the concept of 5W question-answering. Salient features of the dataset include: (i) textual claims, (ii) ChatGPT-generated paraphrased claims, (iii) associated images, (iv) stable diffusion-generated additional images (i.e., visual paraphrases), (v) pixel-level image heatmap to foster image-text explainability of the claim, (vi) 5W QA pairs, and (vii) adversarial fake news stories.