2024
pdf
bib
abs
API-BLEND: A Comprehensive Corpora for Training and Benchmarking API LLMs
Kinjal Basu
|
Ibrahim Abdelaziz
|
Subhajit Chaudhury
|
Soham Dan
|
Maxwell Crouse
|
Asim Munawar
|
Vernon Austel
|
Sadhana Kumaravel
|
Vinod Muthusamy
|
Pavan Kapanipathi
|
Luis Lastras
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
There is a growing need for Large Language Models (LLMs) to effectively use tools and external Application Programming Interfaces (APIs) to plan and complete tasks. As such, there is tremendous interest in methods that can acquire sufficient quantities of train and test data that involve calls to tools / APIs. Two lines of research have emerged as the predominant strategies for addressing this challenge. The first has focused on synthetic data generation techniques, while the second has involved curating task-adjacent datasets which can be transformed into API / Tool-based tasks. In this paper, we focus on the task of identifying, curating, and transforming existing datasets and, in turn, introduce API-BLEND, a large corpora for training and systematic testing of tool-augmented LLMs. The datasets mimic real-world scenarios involving API-tasks such as API / tool detection, slot filling, and sequencing of the detected APIs. We demonstrate the utility of the API-BLEND dataset for both training and benchmarking purposes.
pdf
bib
abs
EXPLORER: Exploration-guided Reasoning for Textual Reinforcement Learning
Kinjal Basu
|
Keerthiram Murugesan
|
Subhajit Chaudhury
|
Murray Campbell
|
Kartik Talamadupula
|
Tim Klinger
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Text-based games (TBGs) have emerged as an important collection of NLP tasks, requiring reinforcement learning (RL) agents to combine natural language understanding with reasoning. A key challenge for agents attempting to solve such tasks is to generalize across multiple games and demonstrate good performance on both seen and unseen objects. Purely deep-RL-based approaches may perform well on seen objects; however, they fail to showcase the same performance on unseen objects. Commonsense-infused deep-RL agents may work better on unseen data; unfortunately, their policies are often not interpretable or easily transferable. To tackle these issues, in this paper, we present EXPLORER which is an exploration-guided reasoning agent for textual reinforcement learning. EXPLORER is neuro-symbolic in nature, as it relies on a neural module for exploration and a symbolic module for exploitation. It can also learn generalized symbolic policies and perform well over unseen data. Our experiments show that EXPLORER outperforms the baseline agents on Text-World cooking (TW-Cooking) and Text-World Commonsense (TWC) games.
pdf
bib
abs
Granite-Function Calling Model: Introducing Function Calling Abilities via Multi-task Learning of Granular Tasks
Ibrahim Abdelaziz
|
Kinjal Basu
|
Mayank Agarwal
|
Sadhana Kumaravel
|
Matthew Stallone
|
Rameswar Panda
|
Yara Rizk
|
G P Shrivatsa Bhargav
|
Maxwell Crouse
|
Chulaka Gunasekara
|
Shajith Ikbal
|
Sachindra Joshi
|
Hima Karanam
|
Vineet Kumar
|
Asim Munawar
|
Sumit Neelam
|
Dinesh Raghu
|
Udit Sharma
|
Adriana Meza Soria
|
Dheeraj Sreedhar
|
Praveen Venkateswaran
|
Merve Unuvar
|
David Daniel Cox
|
Salim Roukos
|
Luis A. Lastras
|
Pavan Kapanipathi
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track
An emergent research trend explores the use of Large Language Models (LLMs) as the backbone of agentic systems (e.g., SWE-Bench, Agent-Bench). To fulfill LLMs’ potential as autonomous agents, they must be able to identify, call, and interact with a variety of external tools and application program interfaces (APIs). This capability of LLMs, commonly termed function calling, leads to a myriad of advantages such as access to current and domain-specific information in databases and the outsourcing of tasks that can be reliably performed by tools. In this work, we introduce Granite-20B-FunctionCalling, a model trained using a multi-task training approach on seven fundamental tasks encompassed in function calling. Our comprehensive evaluation on multiple out-of-domain datasets, which compares Granite-20B-FunctionCalling to more than 15 other best proprietary and open models, shows that Granite-20B-FunctionCalling has better generalizability on multiple tasks across seven different evaluation benchmarks. Moreover, Granite-20B-FunctionCalling shows the best performance among all open models and ranks among the top on the Berkeley Function Calling Leaderboard (BFCL).