Sameep Mehta


2024

Sequential API Function Calling Using GraphQL Schema
Avirup Saha | Lakshmi Mandal | Balaji Ganesan | Sambit Ghosh | Renuka Sindhgatta | Carlos Eberhardt | Dan Debrunner | Sameep Mehta
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Function calling using Large Language Models (LLMs) is an active research area that aims to empower LLMs with the ability to execute APIs to perform real-world tasks. However, sequential function calling using LLMs with interdependence between functions is still under-explored. To this end, we introduce GraphQLRestBench, a dataset consisting of natural language utterances paired with function call sequences representing real-world REST API calls with variable mapping between functions. In order to represent the response structure of the functions in the LLM prompt, we use the GraphQL schema of the REST APIs. We also introduce a custom evaluation framework for our dataset consisting of four specially designed metrics. We evaluate various open-source LLMs on our dataset using few-shot Chain-of-Thought and ReAct prompting to establish a reasonable baseline.
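The abstract's core notion, a sequence of REST calls where a later call consumes a field from an earlier call's response, can be sketched as follows. This is a hypothetical illustration: the endpoints, field names, and values are invented for this example and are not taken from GraphQLRestBench.

```python
# Hypothetical stand-ins for two REST endpoints; names and fields are
# illustrative only, not from the GraphQLRestBench dataset.

def get_user(username):
    # Stand-in for GET /users/{username}
    return {"id": 42, "username": username}

def list_orders(user_id):
    # Stand-in for GET /orders?user_id={user_id}
    return [{"order_id": 7, "user_id": user_id}]

def call_sequence(username):
    # The second call depends on a field of the first call's response.
    # This variable mapping between functions is the interdependence
    # the dataset is designed to capture.
    user = get_user(username)
    return list_orders(user["id"])

print(call_sequence("alice"))
```

An LLM evaluated on such a task must produce not just both calls but also the mapping from the first response's `id` field into the second call's argument.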

GraphQL Query Generation: A Large Training and Benchmarking Dataset
Manish Kesarwani | Sambit Ghosh | Nitin Gupta | Shramona Chakraborty | Renuka Sindhgatta | Sameep Mehta | Carlos Eberhardt | Dan Debrunner
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track

GraphQL is a powerful query language for APIs that allows clients to fetch precise data efficiently and flexibly, querying multiple resources with a single request. However, crafting complex GraphQL query operations can be challenging. Large Language Models (LLMs) offer an alternative by generating GraphQL queries from natural language, but they struggle due to limited exposure to publicly available GraphQL schemas, often resulting in invalid or suboptimal queries. Furthermore, no benchmark test data suite is available to reliably evaluate the performance of contemporary LLMs. To address this, we present a large-scale, cross-domain Text-to-GraphQL query operation dataset. The dataset includes 10,940 training triples spanning 185 cross-source data stores and 957 test triples over 14 data stores. Each triple consists of a GraphQL schema, a GraphQL query operation, and a corresponding natural language query. The dataset has been predominantly manually created, with natural language paraphrasing, and carefully validated, requiring approximately 1200 person-hours. In our evaluation, we tested 10 state-of-the-art LLMs using our test dataset. The best-performing model achieved an accuracy of only around 50% with one in-context few-shot example, underscoring the necessity for custom fine-tuning. To support further research and benchmarking, we are releasing the training and test datasets under the MIT License. The dataset is available at https://github.com/stepzen-dev/NL2GQL.
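A triple of the shape the abstract describes (schema, natural language query, GraphQL operation) can be sketched as below. The schema, wording, and operation here are invented for illustration and are not entries from the released dataset.

```python
# Illustrative Text-to-GraphQL triple; all content is a made-up example
# matching the structure described in the abstract, not a dataset entry.

triple = {
    "schema": """
        type Customer {
          id: ID!
          name: String!
          city: String
        }
        type Query {
          customers(city: String): [Customer]
        }
    """,
    "natural_language": "List the names of all customers based in Zurich.",
    "graphql_operation": """
        query {
          customers(city: "Zurich") {
            name
          }
        }
    """,
}

# A model is given the schema and the natural language query, and is
# evaluated on producing the reference GraphQL operation.
print(sorted(triple.keys()))
```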

2023

CFL: Causally Fair Language Models Through Token-level Attribute Controlled Generation
Rahul Madhavan | Rishabh Garg | Kahini Wadhawan | Sameep Mehta
Findings of the Association for Computational Linguistics: ACL 2023

We propose a method to control the attributes of Language Models (LMs) for the text generation task using Causal Average Treatment Effect (ATE) scores and counterfactual augmentation. We explore this method in the context of LM detoxification, and propose the Causally Fair Language (CFL) architecture for detoxifying pre-trained LMs in a plug-and-play manner. Our architecture is based on a Structural Causal Model (SCM) that is mathematically transparent and computationally efficient compared with many existing detoxification techniques. We also propose several new metrics that aim to better understand the behaviour of LMs in the context of toxic text generation. Further, we achieve state-of-the-art performance on toxic degeneration, computed using Real Toxicity Prompts. Our experiments show that CFL achieves such detoxification without much impact on model perplexity. We also show that CFL mitigates the unintended bias problem through experiments on the BOLD dataset.
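As a minimal sketch of the ATE scores the abstract relies on, the Average Treatment Effect can be estimated as a difference in means, ATE = E[Y | T=1] - E[Y | T=0]. The toy toxicity scores below are invented for illustration and are not results from the paper.

```python
# Minimal difference-in-means ATE estimate; the toy scores are invented
# for illustration and are not taken from the CFL paper.

def ate(outcomes_treated, outcomes_control):
    # ATE = E[Y | T=1] - E[Y | T=0]
    return (sum(outcomes_treated) / len(outcomes_treated)
            - sum(outcomes_control) / len(outcomes_control))

# Toy example: toxicity scores of generations with vs. without a given token.
with_token = [0.8, 0.9, 0.7]     # T=1: token present
without_token = [0.2, 0.1, 0.3]  # T=0: token absent
print(round(ate(with_token, without_token), 2))  # 0.6
```

A large token-level ATE of this kind suggests the token causally raises toxicity, which is the signal a controlled-generation method can act on.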

2013

An Empirical Assessment of Contemporary Online Media in Ad-Hoc Corpus Creation for Social Events
Kanika Narang | Seema Nagar | Sameep Mehta | L. V. Subramaniam | Kuntal Dey
Proceedings of the Sixth International Joint Conference on Natural Language Processing

NLP for uncertain data at scale
Sameep Mehta | L. V. Subramaniam
NAACL HLT 2013 Tutorial Abstracts