Aditya Sharma

2025

pdf bib abs
GeoCoder: Solving Geometry Problems by Generating Modular Code through Vision-Language Models
Aditya Sharma | Aman Dalmia | Mehran Kazemi | Amal Zouaq | Christopher Pal
Findings of the Association for Computational Linguistics: NAACL 2025

Geometry problem-solving demands advanced reasoning abilities to process multimodal inputs and employ mathematical knowledge effectively. Vision-language models (VLMs) have made significant progress in various multimodal tasks. Yet, they still struggle with geometry problems and are significantly limited by their inability to perform mathematical operations not seen during pre-training, such as calculating the cosine of an arbitrary angle, and by difficulties in correctly applying relevant geometry formulas. To overcome these challenges, we present GeoCoder, which leverages modular code-finetuning to generate and execute code using a predefined geometry function library. By executing the code, we achieve accurate and deterministic calculations, contrasting the stochastic nature of autoregressive token prediction, while the function library minimizes errors in formula usage. We also propose a multimodal retrieval-augmented variant of GeoCoder, named RAG-GeoCoder, which incorporates a non-parametric memory module for retrieving functions from the geometry library, thereby reducing reliance on parametric memory. Our modular code-finetuning approach enhances the geometric reasoning capabilities of VLMs, yielding an average improvement of over 16% across various question complexities on the GeomVerse dataset compared to other fine-tuning methods.

2024

pdf bib abs
Losing Visual Needles in Image Haystacks: Vision Language Models are Easily Distracted in Short and Long Contexts
Aditya Sharma | Michael Saxon | William Yang Wang
Findings of the Association for Computational Linguistics: EMNLP 2024

We present LoCoVQA, a dynamic benchmark generator for evaluating long-context reasoning in vision language models (VLMs). LoCoVQA augments test examples for mathematical reasoning, VQA, and character recognition tasks with increasingly long visual contexts composed of both in-distribution and out-of-distribution distractor images.Across these tasks, a diverse set of VLMs rapidly lose performance as the visual context length grows, often exhibiting a striking logarithmic decay trend. This test assesses how well VLMs can ignore irrelevant information when answering queries—a task that is quite easy for language models (LMs) in the text domain—demonstrating that current state-of-the-art VLMs lack this essential capability for many long-context applications.

2023

pdf bib abs
TwiRGCN: Temporally Weighted Graph Convolution for Question Answering over Temporal Knowledge Graphs
Aditya Sharma | Apoorv Saxena | Chitrank Gupta | Mehran Kazemi | Partha Talukdar | Soumen Chakrabarti
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

Recent years have witnessed interest in Temporal Question Answering over Knowledge Graphs (TKGQA), resulting in the development of multiple methods. However, these are highly engineered, thereby limiting their generalizability, and they do not automatically discover relevant parts of the KG during multi-hop reasoning. Relational graph convolutional networks (RGCN) provide an opportunity to address both of these challenges – we explore this direction in the paper. Specifically, we propose a novel, intuitive and interpretable scheme to modulate the messages passed through a KG edge during convolution based on the relevance of its associated period to the question. We also introduce a gating device to predict if the answer to a complex temporal question is likely to be a KG entity or time and use this prediction to guide our scoring mechanism. We evaluate the resulting system, which we call TwiRGCN, on a recent challenging dataset for multi-hop complex temporal QA called TimeQuestions. We show that TwiRGCN significantly outperforms state-of-the-art models on this dataset across diverse question types. Interestingly, TwiRGCN improves accuracy by 9–10 percentage points for the most difficult ordinal and implicit question types.

2020

pdf bib abs
Improving Neural Machine Translation for Sanskrit-English
Ravneet Punia | Aditya Sharma | Sarthak Pruthi | Minni Jain
Proceedings of the 17th International Conference on Natural Language Processing (ICON)

Sanskrit is one of the oldest languages of the Asian Subcontinent that fell out of common usage around 600 B.C. In this paper, we attempt to translate Sanskrit to English using Neural Machine Translation approaches based on Reinforcement Learning and Transfer learning that were never tried and tested on Sanskrit. Along with the paper, we also release monolingual Sanskrit and parallel aligned Sanskrit-English corpora for the research community. Our methodologies outperform the previous approaches applied to Sanskrit by various re- searchers and will further help the linguistic community to accelerate the costly and time consuming manual translation process.

2018

pdf bib abs
Towards Understanding the Geometry of Knowledge Graph Embeddings
Chandrahas | Aditya Sharma | Partha Talukdar
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Knowledge Graph (KG) embedding has emerged as a very active area of research over the last few years, resulting in the development of several embedding methods. These KG embedding methods represent KG entities and relations as vectors in a high-dimensional space. Despite this popularity and effectiveness of KG embeddings in various tasks (e.g., link prediction), geometric understanding of such embeddings (i.e., arrangement of entity and relation vectors in vector space) is unexplored – we fill this gap in the paper. We initiate a study to analyze the geometry of KG embeddings and correlate it with task performance and other hyperparameters. To the best of our knowledge, this is the first study of its kind. Through extensive experiments on real-world datasets, we discover several insights. For example, we find that there are sharp differences between the geometry of embeddings learnt by different classes of KG embeddings methods. We hope that this initial study will inspire other follow-up research on this important but unexplored problem.

2017

pdf bib abs
Speeding up Reinforcement Learning-based Information Extraction Training using Asynchronous Methods
Aditya Sharma | Zarana Parekh | Partha Talukdar
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

RLIE-DQN is a recently proposed Reinforcement Learning-based Information Extraction (IE) technique which is able to incorporate external evidence during the extraction process. RLIE-DQN trains a single agent sequentially, training on one instance at a time. This results in significant training slowdown which is undesirable. We leverage recent advances in parallel RL training using asynchronous methods and propose RLIE-A3C. RLIE-A3C trains multiple agents in parallel and is able to achieve upto 6x training speedup over RLIE-DQN, while suffering no loss in average accuracy.

Co-authors

Venues

Fix author