Jyotika Singh
2026
Barriers to Discrete Reasoning with Transformers: A Survey Across Depth, Exactness, and Bandwidth
Michelle Yuan | Weiyi Sun | Amir H. Rezaeian | Jyotika Singh | Sandip Ghoshal | Yao-Ting Wang | Miguel Ballesteros | Yassine Benajiba
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Transformers have become the foundational architecture for a broad spectrum of sequence modeling applications, underpinning state-of-the-art systems in natural language processing, vision, and beyond. However, their theoretical limitations in discrete reasoning tasks (such as arithmetic, logical inference, and algorithmic composition) remain a critical open problem. In this survey, we synthesize recent advances from three theoretical perspectives (circuit complexity, approximation theory, and communication complexity) to clarify the structural and computational barriers that transformers face when performing symbolic computations. By connecting these established theoretical frameworks, we provide an accessible and unified account of why current transformer architectures struggle to implement exact discrete algorithms, even as they excel at pattern matching and interpolation. We review key definitions, seminal results, and illustrative examples, highlighting challenges such as depth constraints, difficulty approximating discontinuities, and bottlenecks in inter-token communication. Finally, we discuss implications for model design and suggest promising directions for overcoming these foundational limitations.
2025
RCI: A Score for Evaluating Global and Local Reasoning in Multimodal Benchmarks
Amit Agarwal | Hitesh Laxmichand Patel | Srikant Panda | Hansa Meghwani | Jyotika Singh | Karan Dua | Paul Li | Tao Sheng | Sujith Ravi | Dan Roth
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
Multimodal Large Language Models (MLLMs) have achieved impressive results on vision-language benchmarks, yet it remains unclear whether these benchmarks assess genuine global reasoning or allow success via localized visual cues. Existing evaluation methods do not explicitly measure this distinction, hindering effective dataset curation and real-world-focused model development. We introduce the Region Comprehension Index (RCI), the first model-based score to directly quantify a dataset's reliance on global versus local visual information. RCI systematically compares reference-model performance on image patches versus full images, revealing whether tasks require holistic image understanding or can be solved with partial or localized visual cues. Applying RCI to 13 widely used multimodal benchmarks, we find that most favor localized reasoning and exhibit significant spatial biases, indicating potential risks in real-world applications. RCI equips researchers and practitioners with an actionable tool for diagnosing and mitigating these biases, enabling the construction of datasets and benchmarks that foster the development of robust, enterprise-ready multimodal systems.
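The abstract describes RCI as a comparison of reference-model performance on image patches versus full images. A minimal sketch of such a score follows; the function name, inputs, and normalization are illustrative assumptions, not the paper's exact formula:

```python
def region_comprehension_index(full_image_scores, patch_scores):
    """Toy RCI-style score (illustrative assumption, not the paper's
    definition): how much a reference model's performance drops when
    it sees only image patches instead of full images.

    Values near 1 suggest the task needs holistic (global) image
    understanding; values near 0 suggest local patches suffice.
    """
    full = sum(full_image_scores) / len(full_image_scores)
    patch = sum(patch_scores) / len(patch_scores)
    if full == 0:
        return 0.0
    # Relative performance gap, normalized by full-image performance.
    return (full - patch) / full


# A benchmark whose patch-only accuracy is nearly as high as its
# full-image accuracy scores low, flagging reliance on local cues.
print(region_comprehension_index([0.8, 0.9], [0.75, 0.85]))
```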
Can LLMs Narrate Tabular Data? An Evaluation Framework for Natural Language Representations of Text-to-SQL System Outputs
Jyotika Singh | Weiyi Sun | Amit Agarwal | Viji Krishnamurthy | Yassine Benajiba | Sujith Ravi | Dan Roth
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
In modern industry systems such as multi-turn chat agents, Text-to-SQL technology bridges natural language (NL) questions and database (DB) querying, and converting tabular DB results into NL representations (NLRs) enables chat-based interaction. NLR generation is currently typically handled by large language models (LLMs), but information loss and errors in presenting tabular results in NL remain largely unexplored. This paper introduces Combo-Eval, a novel method for judging LLM-generated NLRs that combines the benefits of multiple existing methods, optimizing evaluation fidelity while reducing LLM calls by 25-61%. Accompanying our method is NLR-BIRD, the first dedicated dataset for NLR benchmarking. Through human evaluations, we demonstrate Combo-Eval's superior alignment with human judgments, applicable across scenarios with and without ground-truth references.