Vinod Muthusamy
2024
API-BLEND: A Comprehensive Corpora for Training and Benchmarking API LLMs
Kinjal Basu | Ibrahim Abdelaziz | Subhajit Chaudhury | Soham Dan | Maxwell Crouse | Asim Munawar | Vernon Austel | Sadhana Kumaravel | Vinod Muthusamy | Pavan Kapanipathi | Luis Lastras
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
There is a growing need for Large Language Models (LLMs) to effectively use tools and external Application Programming Interfaces (APIs) to plan and complete tasks. As such, there is tremendous interest in methods that can acquire sufficient quantities of training and test data that involve calls to tools/APIs. Two lines of research have emerged as the predominant strategies for addressing this challenge. The first has focused on synthetic data generation techniques, while the second has involved curating task-adjacent datasets that can be transformed into API/tool-based tasks. In this paper, we focus on the task of identifying, curating, and transforming existing datasets and, in turn, introduce API-BLEND, a large corpus for training and systematically testing tool-augmented LLMs. The datasets mimic real-world scenarios involving API tasks such as API/tool detection, slot filling, and sequencing of the detected APIs. We demonstrate the utility of the API-BLEND dataset for both training and benchmarking purposes.
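The three tasks the abstract names (API detection, slot filling, and sequencing) can be pictured with a small sketch. The schema and function names below are assumptions for illustration only, not the actual API-BLEND data format or evaluation code:

```python
# Hypothetical illustration of the three API tasks named in the abstract:
# API detection, slot filling, and sequencing of the detected APIs.
# The dict schema and the exact-match metric are assumed for this sketch.

utterance = "Book a flight from NYC to Boston on Friday, then reserve a hotel."

# Sequencing: an ordered list of detected APIs, each with its filled slots.
predicted_calls = [
    {"api": "BookFlight",                  # API detection
     "slots": {"origin": "NYC",            # slot filling
               "destination": "Boston",
               "date": "Friday"}},
    {"api": "ReserveHotel",
     "slots": {"city": "Boston", "date": "Friday"}},
]

def sequence_exact_match(pred, gold):
    """Sequence-level exact match: every API name, slot, and
    call order must agree with the gold annotation."""
    return pred == gold

gold_calls = predicted_calls  # placeholder gold annotation for the sketch
print(sequence_exact_match(predicted_calls, gold_calls))  # True
```

A real benchmark would compare predictions against held-out gold annotations and typically report finer-grained metrics (per-API and per-slot scores) alongside exact match.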
2023
Towards large language model-based personal agents in the enterprise: Current trends and open problems
Vinod Muthusamy | Yara Rizk | Kiran Kate | Praveen Venkateswaran | Vatche Isahagian | Ashu Gulati | Parijat Dube
Findings of the Association for Computational Linguistics: EMNLP 2023
There is an emerging trend to use large language models (LLMs) to reason about complex goals and orchestrate a set of pluggable tools or APIs to accomplish a goal. This functionality could, among other use cases, be used to build personal assistants for knowledge workers. While there are impressive demos of LLMs being used as autonomous agents or for tool composition, these solutions are not ready for mission-critical enterprise settings. For example, they are brittle to input changes and can produce inconsistent results for the same inputs. These use cases raise many open problems in an exciting area of NLP research, such as trust and explainability, consistency and reproducibility, adherence to guardrails and policies, best practices for composable tool design, and the need for new metrics and benchmarks. This vision paper illustrates some examples of LLM-based autonomous agents that reason and compose tools, highlights cases where they fail, surveys some of the recent efforts in this space, and lays out the research challenges to make these solutions viable for enterprises.