Archita Pathak


2024

pdf bib
E-Commerce Product Categorization with LLM-based Dual-Expert Classification Paradigm
Zhu Cheng | Wen Zhang | Chih-Chi Chou | You-Yi Jau | Archita Pathak | Peng Gao | Umit Batur
Proceedings of the 1st Workshop on Customizable NLP: Progress and Challenges in Customizing NLP for a Domain, Application, Group, or Individual (CustomNLP4U)

Accurate product categorization in e-commerce is critical for delivering a satisfactory online shopping experience to customers. With the vast number of available products and the numerous potential categories, it becomes crucial to develop a classification system capable of assigning products to their correct categories with high accuracy. We present a dual-expert classification system that utilizes the power of large language models (LLMs). This framework integrates domain-specific knowledge and pre-trained LLM’s general knowledge through effective model fine-tuning and prompting techniques. First, the fine-tuned domain-specific expert recommends top K candidate categories for a given input product. Then, the more general LLM-based expert, through prompting techniques, analyzes the nuanced differences between candidate categories and selects the most suitable target category. We introduce a new in-context learning approach that utilizes LLM self-generated summarization to provide clearer instructions and enhance its performance. Experiments on e-commerce datasets demonstrate the effectiveness of our LLM-based Dual-Expert classification system.

2020

pdf bib
Self-Supervised Claim Identification for Automated Fact Checking
Archita Pathak | Mohammad Abuzar Shaikh | Rohini Srihari
Proceedings of the 17th International Conference on Natural Language Processing (ICON)

We propose a novel, attention-based self-supervised approach to identify “claim-worthy” sentences in a fake news article, an important first step in automated fact-checking. We leverage aboutness of headline and content using attention mechanism for this task. The identified claims can be used for downstream task of claim verification for which we are releasing a benchmark dataset of manually selected compelling articles with veracity labels and associated evidence. This work goes beyond stylistic analysis to identifying content that influences reader belief. Experiments with three datasets show the strength of our model.

2019

pdf bib
BREAKING! Presenting Fake News Corpus for Automated Fact Checking
Archita Pathak | Rohini Srihari
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop

Popular fake news articles spread faster than mainstream articles on the same topic which renders manual fact checking inefficient. At the same time, creating tools for automatic detection is as challenging due to lack of dataset containing articles which present fake or manipulated stories as compelling facts. In this paper, we introduce manually verified corpus of compelling fake and questionable news articles on the USA politics, containing around 700 articles from Aug-Nov, 2016. We present various analyses on this corpus and finally implement classification model based on linguistic features. This work is still in progress as we plan to extend the dataset in the future and use it for our approach towards automated fake news detection.