Prakash Mandayam Comar
2025
In-Context Reinforcement Learning with Retrieval-Augmented Generation for Text-to-SQL
Rishit Toteja | Arindam Sarkar | Prakash Mandayam Comar
Proceedings of the 31st International Conference on Computational Linguistics
Text-to-SQL simplifies database interactions by enabling non-experts to convert their natural language (NL) questions to Structured Query Language (SQL) queries. With advancements in Large Language Models (LLMs), in-context learning (ICL) has emerged as a popular choice for building Text-to-SQL systems. Real-world, industry-scale databases often comprise thousands of tables and hundreds of columns, which makes passing the entire schema as context to an LLM infeasibly expensive. This requires access to the correct database and set of tables. Recently, Retrieval Augmented Generation (RAG) based methods have been proposed for retrieving the relevant subset of databases and tables for a given query. However, we observe that existing methods of synthetic query generation produce predominantly simple queries, which might not be sufficiently representative of complex, real-world queries, thus negatively affecting the quality of the generated SQL. To address this, we propose an innovative in-context reinforcement learning (ICRL) based framework which refines the question generation process by enhancing the model’s ability to produce intricate queries that practitioners may pose during inference. In contrast to existing approaches, our framework ensures the generation of synthetic SQL queries which are diverse and complex. We demonstrate the effectiveness of our approach via multiple experiments comparing against representative state-of-the-art models on public benchmark datasets and observe substantial improvements in performance and scalability. Our method achieves 15-20% higher recall in the database/table retrieval task compared to existing state-of-the-art models for schema identification, and up to 2% higher execution accuracy for SQL generation.
RxLens: Multi-Agent LLM-powered Scan and Order for Pharmacy
Akshay Jagatap | Srujana Merugu | Prakash Mandayam Comar
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: Industry Track)
Automated construction of shopping cart from medical prescriptions is a vital prerequisite for scaling up online pharmaceutical services in emerging markets due to the high prevalence of paper prescriptions that are challenging for customers to interpret. We present RxLens, a multi-step end-end Large Language Model (LLM)-based deployed solution for automated pharmacy cart construction comprising multiple steps: redaction of Personal Identifiable Information (PII), Optical Character Recognition (OCR), medication extraction, matching against the catalog, and bounding box detection for lineage. Our multi-step design leverages the synergy between retrieval and LLM-based generation to mitigate the vocabulary gaps in LLMs and fuzzy matching errors during retrieval. Empirical evaluation demonstrates that RxLens can yield up to 19% - 40% and 11% - 26% increase in Recall@3 relative to SOTA methods such as Medical Comprehend and vanilla retrieval augmentation of LLMs on handwritten and printed prescriptions respectively. We also explore LLM-based auto-evaluation as an alternative to costly manual annotations and observe a 76% - 100% match relative to human judgements on various tasks.
2024
DiAL : Diversity Aware Listwise Ranking for Query Auto-Complete
Sonali Singh | Sachin Sudhakar Farfade | Prakash Mandayam Comar
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track
Query Auto-Complete (QAC) is an essential search feature that presents users with a list of potential search keyword completions as they type, enabling them to complete their queries faster. While the QAC systems in eCommerce stores generally use the Learning to Rank (LTR) approach optimized based on customer feedback, this approach struggles to provide diverse suggestions, leading to repetitive queries and limited navigational suggestions related to product categories, attributes, and brands. This paper proposes a novel DiAL framework that explicitly optimizes for diversity alongside customer feedback signals. It achieves this by leveraging a smooth approximation of the diversity-based metric (𝛼NDCG) as a listwise loss function and modifying it to balance relevance and diversity. The proposed approach yielded an improvement of 8.5% in mean reciprocal rank (MRR) and 22.8% in 𝛼NDCG compared to the pairwise ranking approach on an eCommerce dataset, while meeting the ultra-low latency constraints of QAC systems. In an online experiment, the diversity-aware listwise QAC model resulted in a 0.48% lift in revenue. Furthermore, we replicated the proposed approach on a publicly available search log, demonstrating improvements in both diversity and relevance of the suggested queries.
Co-authors
- Sachin Sudhakar Farfade 1
- Akshay Jagatap 1
- Srujana Merugu 1
- Arindam Sarkar 1
- Sonali Singh 1