Junnan Li


pdf bib
CodeT5+: Open Code Large Language Models for Code Understanding and Generation
Yue Wang | Hung Le | Akhilesh Gotmare | Nghi Bui | Junnan Li | Steven Hoi
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Large language models (LLMs) pretrained on vast source code have achieved prominent progress in code intelligence. However, existing code LLMs have two main limitations. First, they often adopt a specific architecture (encoder-only or decoder-only) or rely on a unified encoder-decoder network for different downstream tasks, lacking the flexibility to operate in the optimal architecture for a specific task. Secondly, they often employ a limited set of pretraining objectives which might not be relevant to some tasks and hence result in substantial performance degrade. To address these limitations, we propose “CodeT5+”, a family of encoder-decoder LLMs for code in which component modules can be flexibly combined to suit a wide range of code tasks. Such flexibility is enabled by our proposed mixture of pretraining objectives, which cover span denoising, contrastive learning, text-code matching, and causal LM pretraining tasks, on both unimodal and bimodal multilingual code corpora. Furthermore, we propose to initialize CodeT5+ with frozen off-the-shelf LLMs without training from scratch to efficiently scale up our models, and explore instruction-tuning to align with natural language instructions. We extensively evaluate CodeT5+ on over 20 code-related benchmarks in different settings, including zero-shot, finetuning, and instruction-tuning. We observe state-of-the-art (SoTA) performance on various code-related tasks, and our instruction-tuned CodeT5+ 16B achieves new SoTA results of 35.0% pass@1 and 54.5% pass@10 on the HumanEval code generation task against other open code LLMs, even surpassing the OpenAI code-cushman-001 model.

pdf bib
LAVIS: A One-stop Library for Language-Vision Intelligence
Dongxu Li | Junnan Li | Hung Le | Guangsen Wang | Silvio Savarese | Steven C.H. Hoi
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)

We introduce LAVIS, an open-source deep learning library for LAnguage-VISion research and applications. LAVIS aims to serve as a one-stop comprehensive library that brings recent advancements in the language-vision field accessible for researchers and practitioners, as well as fertilizing future research and development. It features a unified interface to easily access state-of-the-art image-language, video-language models and common datasets. LAVIS supports training, evaluation and benchmarking on a rich variety of tasks, including multimodal classification, retrieval, captioning, visual question answering, dialogue and pre-training. In the meantime, the library is also highly extensible and configurable, facilitating future development and customization. In this technical report, we describe design principles, key components and functionalities of the library, and also present benchmarking results across common language-vision tasks.


pdf bib
Plug-and-Play VQA: Zero-shot VQA by Conjoining Large Pretrained Models with Zero Training
Anthony Meng Huat Tiong | Junnan Li | Boyang Li | Silvio Savarese | Steven C.H. Hoi
Findings of the Association for Computational Linguistics: EMNLP 2022

Visual question answering (VQA) is a hallmark of vision and language reasoningand a challenging task under the zero-shot setting. We propose Plug-and-Play VQA (PNP-VQA),a modular framework for zero-shot VQA.In contrast to most existing works, which require substantial adaptation of pretrained language models (PLMs) for the vision modality,PNP-VQA requires no additional training of the PLMs.Instead, we propose to use natural language and network interpretation as an intermediate representation that glues pretrained models together. We first generate question-guided informative image captions,and pass the captions to a PLM as context for question answering. Surpassing end-to-end trained baselines, PNP-VQA achieves state-of-the-art results on zero-shot VQAv2 and GQA. With 11B parameters, it outperforms the 80B-parameter Flamingo model by 8.5% on VQAv2. With 738M PLM parameters, PNP-VQA achieves an improvement of 9.1% on GQA over FewVLM with 740M PLM parameters.