Tian Lan


2024

PRAct: Optimizing Principled Reasoning and Acting of LLM Agent
Zhiwei Liu | Weiran Yao | Jianguo Zhang | Zuxin Liu | Liangwei Yang | Rithesh R N | Tian Lan | Ming Zhu | Juntao Tan | Shirley Kokane | Thai Quoc Hoang | Juan Carlos Niebles | Shelby Heinecke | Huan Wang | Silvio Savarese | Caiming Xiong
Proceedings of the 28th Conference on Computational Natural Language Learning

We introduce the Principled Reasoning and Acting (PRAct) framework, a novel method for learning and enforcing action principles from trajectory data. Central to our approach is the use of text gradients from a reflection and optimization engine to derive these action principles. To adapt action principles to specific task requirements, we propose a new optimization framework, Reflective Principle Optimization (RPO). After execution, RPO employs a reflector to critique current action principles and an optimizer to update them accordingly. We investigate the RPO framework under two scenarios: Reward-RPO, which uses environmental rewards for reflection, and Self-RPO, which conducts self-reflection without external rewards. Additionally, we developed two RPO methods, RPO-Traj and RPO-Batch, to adapt to different settings. Experimental results across four environments demonstrate that the PRAct agent, leveraging the RPO framework, can effectively learn and apply action principles to enhance performance.

2023

PandaGPT: One Model To Instruction-Follow Them All
Yixuan Su | Tian Lan | Huayang Li | Jialu Xu | Yan Wang | Deng Cai
Proceedings of the 1st Workshop on Taming Large Language Models: Controllability in the era of Interactive Assistants!

We present PandaGPT, an approach to emPower large lANguage moDels with visual and Auditory instruction-following capabilities. Our pilot experiments show that PandaGPT can perform complex tasks such as detailed image description generation, writing stories inspired by videos, and answering questions about audios. More interestingly, PandaGPT can take multimodal inputs simultaneously and compose their semantics naturally. For example, PandaGPT can connect how objects look in an image/video and how they sound in an audio. To do so, PandaGPT combines the multimodal encoders from ImageBind and the large language models from Vicuna. Notably, only aligned image-text pairs are required for the training of PandaGPT. Thanks to the strong capability of ImageBind in embedding data from different modalities into the same space, PandaGPT displays emergent, i.e. zero-shot, cross-modal behaviors for data other than image and text (e.g., video, audio, depth, thermal, and IMU). We hope that PandaGPT serves as an initial step toward building AGI that can perceive and understand inputs in different modalities holistically, as we humans do.

2022

Cross-Lingual Phrase Retrieval
Heqi Zheng | Xiao Zhang | Zewen Chi | Heyan Huang | Yan Tan | Tian Lan | Wei Wei | Xian-Ling Mao
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Cross-lingual retrieval aims to retrieve relevant text across languages. Current methods typically achieve cross-lingual retrieval by learning language-agnostic text representations at the word or sentence level. However, how to learn phrase representations for cross-lingual phrase retrieval is still an open problem. In this paper, we propose XPR, a cross-lingual phrase retriever that extracts phrase representations from unlabeled example sentences. Moreover, we create a large-scale cross-lingual phrase retrieval dataset, which contains 65K bilingual phrase pairs and 4.2M example sentences in 8 English-centric language pairs. Experimental results show that XPR outperforms state-of-the-art baselines that utilize word-level or sentence-level representations. XPR also shows impressive zero-shot transferability, which enables the model to perform retrieval in a language pair unseen during training. Our dataset, code, and trained models are publicly available at github.com/cwszz/XPR/.

TaCL: Improving BERT Pre-training with Token-aware Contrastive Learning
Yixuan Su | Fangyu Liu | Zaiqiao Meng | Tian Lan | Lei Shu | Ehsan Shareghi | Nigel Collier
Findings of the Association for Computational Linguistics: NAACL 2022

Masked language models (MLMs) such as BERT have revolutionized the field of Natural Language Understanding in the past few years. However, existing pre-trained MLMs often output an anisotropic distribution of token representations that occupies a narrow subset of the entire representation space. Such token representations are not ideal, especially for tasks that demand discriminative semantic meanings of distinct tokens. In this work, we propose TaCL (Token-aware Contrastive Learning), a novel continual pre-training approach that encourages BERT to learn an isotropic and discriminative distribution of token representations. TaCL is fully unsupervised and requires no additional data. We extensively test our approach on a wide range of English and Chinese benchmarks. The results show that TaCL brings consistent and notable improvements over the original BERT model. Furthermore, we conduct a detailed analysis to reveal the merits and inner workings of our approach.

2021

ISTIC’s Triangular Machine Translation System for WMT2021
Hangcheng Guo | Wenbin Liu | Yanqing He | Tian Lan | Hongjiao Xu | Zhenfeng Wu | You Pan
Proceedings of the Sixth Conference on Machine Translation

This paper describes ISTIC’s submission to the Triangular Machine Translation Task of Russian-to-Chinese machine translation for WMT 2021. To fully utilize the provided corpora and improve translation performance from Russian to Chinese, our system uses the pivot method, which pipelines a Russian-to-English translator and an English-to-Chinese translator to form a Russian-to-Chinese translator. Our system is based on the Transformer architecture, and several effective strategies are adopted to improve translation quality, including corpus filtering, data pre-processing, system combination, and model ensemble.