2024
Granite-Function Calling Model: Introducing Function Calling Abilities via Multi-task Learning of Granular Tasks
Ibrahim Abdelaziz | Kinjal Basu | Mayank Agarwal | Sadhana Kumaravel | Matthew Stallone | Rameswar Panda | Yara Rizk | G P Shrivatsa Bhargav | Maxwell Crouse | Chulaka Gunasekara | Shajith Ikbal | Sachindra Joshi | Hima Karanam | Vineet Kumar | Asim Munawar | Sumit Neelam | Dinesh Raghu | Udit Sharma | Adriana Meza Soria | Dheeraj Sreedhar | Praveen Venkateswaran | Merve Unuvar | David Daniel Cox | Salim Roukos | Luis A. Lastras | Pavan Kapanipathi
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track
An emergent research trend explores the use of Large Language Models (LLMs) as the backbone of agentic systems (e.g., SWE-Bench, Agent-Bench). To fulfill LLMs’ potential as autonomous agents, they must be able to identify, call, and interact with a variety of external tools and application programming interfaces (APIs). This capability of LLMs, commonly termed function calling, leads to a myriad of advantages, such as access to current and domain-specific information in databases and the outsourcing of tasks that can be reliably performed by tools. In this work, we introduce Granite-20B-FunctionCalling, a model trained using a multi-task training approach on seven fundamental tasks encompassed in function calling. Our comprehensive evaluation on multiple out-of-domain datasets, which compares Granite-20B-FunctionCalling to more than 15 of the best proprietary and open models, shows that Granite-20B-FunctionCalling generalizes better across multiple tasks on seven different evaluation benchmarks. Moreover, Granite-20B-FunctionCalling shows the best performance among all open models and ranks near the top of the Berkeley Function Calling Leaderboard (BFCL).
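As a rough illustration of the function-calling setup described in the abstract, the sketch below serializes a generic tool schema into a prompt and parses a structured model response back into a concrete call. The tool name, prompt wording, and output format are hypothetical placeholders, not Granite-20B-FunctionCalling's actual prompt template.

```python
import json

# Hypothetical tool schema and model output, for illustration only; the real
# Granite-20B-FunctionCalling prompt template is defined by its model card.
tools = [
    {
        "name": "get_weather",  # hypothetical function name
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }
]

def build_prompt(user_query: str) -> str:
    """Assemble a generic function-calling prompt: tool schemas plus the user query."""
    return (
        "You have access to the following tools:\n"
        + json.dumps(tools, indent=2)
        + f"\n\nUser: {user_query}\nRespond with a JSON function call."
    )

prompt = build_prompt("What is the weather in Yorktown Heights?")

# A well-formed model response is then parsed into an executable call, e.g.:
model_output = '{"name": "get_weather", "arguments": {"city": "Yorktown Heights"}}'
call = json.loads(model_output)
print(call["name"], call["arguments"])  # get_weather {'city': 'Yorktown Heights'}
```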
LangNav: Language as a Perceptual Representation for Navigation
Bowen Pan | Rameswar Panda | SouYoung Jin | Rogerio Feris | Aude Oliva | Phillip Isola | Yoon Kim
Findings of the Association for Computational Linguistics: NAACL 2024
We explore the use of language as a perceptual representation for vision-and-language navigation (VLN), with a focus on low-data settings. Our approach uses off-the-shelf vision systems for image captioning and object detection to convert an agent’s egocentric panoramic view at each time step into natural language descriptions. We then finetune a pretrained language model to select an action, based on the current view and the trajectory history, that would best fulfill the navigation instructions. In contrast to the standard setup which adapts a pretrained language model to work directly with continuous visual features from pretrained vision models, our approach instead uses (discrete) language as the perceptual representation. We explore several use cases of our language-based navigation (LangNav) approach on the R2R VLN benchmark: generating synthetic trajectories from a prompted language model (GPT-4) with which to finetune a smaller language model; domain transfer where we transfer a policy learned on one simulated environment (ALFRED) to another (more realistic) environment (R2R); and combining both vision- and language-based representations for VLN. Our approach is found to improve upon baselines that rely on visual features in settings where only a few expert trajectories (10-100) are available, demonstrating the potential of language as a perceptual representation for navigation.
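The sketch below illustrates the general idea of language as the perceptual representation: candidate views are captioned into text and assembled into a prompt from which a language model selects the next action. The caption function, prompt wording, and action names are hypothetical stand-ins, not LangNav's actual pipeline or prompt format.

```python
# Hypothetical stand-ins for LangNav's off-the-shelf captioner and finetuned
# language model; the real pipeline, prompts, and action space differ.

def caption_view(view_id: str) -> str:
    """Stand-in for an image captioner / object detector over one egocentric view."""
    canned = {
        "heading_0": "a hallway with a wooden door at the end",
        "heading_90": "a staircase leading down to a kitchen",
    }
    return canned.get(view_id, "an empty room")

def build_prompt(instruction: str, history: list[str], views: dict[str, str]) -> str:
    """Turn the instruction, trajectory history, and captioned views into a text prompt."""
    options = "\n".join(f"- {action}: {caption_view(view)}" for action, view in views.items())
    return (
        f"Instruction: {instruction}\n"
        f"Trajectory so far: {' -> '.join(history)}\n"
        f"Candidate directions:\n{options}\n"
        "Which direction best follows the instruction?"
    )

prompt = build_prompt(
    "Walk down the stairs and stop in the kitchen.",
    ["start"],
    {"forward": "heading_0", "right": "heading_90"},
)
print(prompt)  # a finetuned language model would be asked to answer, e.g., "right"
```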
2023
Synthetic Pre-Training Tasks for Neural Machine Translation
Zexue He | Graeme Blackwood | Rameswar Panda | Julian McAuley | Rogerio Feris
Findings of the Association for Computational Linguistics: ACL 2023
Pre-training models with large crawled corpora can lead to issues such as toxicity and bias, as well as copyright and privacy concerns. A promising way of alleviating such concerns is to conduct pre-training with synthetic tasks and data, since no real-world information is ingested by the model. Our goal in this paper is to understand the factors that contribute to the effectiveness of pre-training models when using synthetic resources, particularly in the context of neural machine translation. We propose several novel approaches to pre-training translation models that involve different levels of lexical and structural knowledge, including: 1) generating obfuscated data from a large parallel corpus, 2) concatenating phrase pairs extracted from a small word-aligned corpus, and 3) generating synthetic parallel data without real human language corpora. Our experiments on multiple language pairs reveal that pre-training benefits can be realized even with high levels of obfuscation or purely synthetic parallel data. We hope the findings from our comprehensive empirical analysis will shed light on understanding what matters for NMT pre-training, as well as pave the way for the development of more efficient and less toxic models.
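As a rough illustration of the first approach (obfuscated parallel data), the sketch below replaces a fraction of the surface words in a sentence pair with stable placeholder tokens while keeping the sentence structure. The 0.5 ratio and the placeholder scheme are assumptions for illustration, not the paper's exact recipe.

```python
import random

# Illustrative sketch of obfuscated-parallel-data generation; the ratio and
# <wN> token scheme are hypothetical choices, not the paper's procedure.

def obfuscate_pair(src: str, tgt: str, ratio: float = 0.5, seed: int = 0):
    rng = random.Random(seed)
    vocab: dict[str, str] = {}       # word -> placeholder token
    decisions: dict[str, bool] = {}  # word -> whether it gets obfuscated

    def obfuscate(sentence: str) -> str:
        out = []
        for word in sentence.split():
            if word not in decisions:
                decisions[word] = rng.random() < ratio
                if decisions[word]:
                    vocab[word] = f"<w{len(vocab)}>"
            # Repeated words map to the same placeholder, preserving structure.
            out.append(vocab[word] if decisions[word] else word)
        return " ".join(out)

    return obfuscate(src), obfuscate(tgt)

src_obf, tgt_obf = obfuscate_pair("the cat sat on the mat", "le chat est assis sur le tapis")
print(src_obf)
print(tgt_obf)
```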