Li Li


2022

CODE-MVP: Learning to Represent Source Code from Multiple Views with Contrastive Pre-Training
Xin Wang | Yasheng Wang | Yao Wan | Jiawei Wang | Pingyi Zhou | Li Li | Hao Wu | Jin Liu
Findings of the Association for Computational Linguistics: NAACL 2022

Recent years have witnessed increasing interest in code representation learning, which aims to represent the semantics of source code into distributed vectors. Currently, various works have been proposed to represent the complex semantics of source code from different views, including plain text, Abstract Syntax Tree (AST), and several kinds of code graphs (e.g., Control/Data Flow Graph). However, most of them only consider a single view of source code independently, ignoring the correspondences among different views. In this paper, we propose to integrate different views with the natural-language description of source code into a unified framework with Multi-View contrastive Pre-training, and name our model as CODE-MVP. Specifically, we first extract multiple code views using compiler tools, and learn the complementary information among them under a contrastive learning framework. Inspired by the type checking in compilation, we also design a fine-grained type inference objective in the pre-training. Experiments on three downstream tasks over five datasets demonstrate the superiority of CODE-MVP when compared with several state-of-the-art baselines. For example, we achieve 2.4/2.3/1.1 gain in terms of MRR/MAP/Accuracy metrics on natural language code retrieval, code similarity, and code defect detection tasks, respectively.

Modeling Hierarchical Syntax Structure with Triplet Position for Source Code Summarization
Juncai Guo | Jin Liu | Yao Wan | Li Li | Pingyi Zhou
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Automatic code summarization, which aims to describe the source code in natural language, has become an essential task in software maintenance. Our fellow researchers have attempted to achieve such a purpose through various machine learning-based approaches. One key challenge keeping these approaches from being practical lies in the lacking of retaining the semantic structure of source code, which has unfortunately been overlooked by the state-of-the-art. Existing approaches resort to representing the syntax structure of code by modeling the Abstract Syntax Trees (ASTs). However, the hierarchical structures of ASTs have not been well explored. In this paper, we propose CODESCRIBE to model the hierarchical syntax structure of code by introducing a novel triplet position for code summarization. Specifically, CODESCRIBE leverages the graph neural network and Transformer to preserve the structural and sequential information of code, respectively. In addition, we propose a pointer-generator network that pays attention to both the structure and sequential tokens of code for a better summary generation. Experiments on two real-world datasets in Java and Python demonstrate the effectiveness of our proposed approach when compared with several state-of-the-art baselines.

2021

Fast and Accurate Neural Machine Translation with Translation Memory
Qiuxiang He | Guoping Huang | Qu Cui | Li Li | Lemao Liu
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

It is generally believed that a translation memory (TM) should be beneficial for machine translation tasks. Unfortunately, existing wisdom demonstrates the superiority of TM-based neural machine translation (NMT) only on the TM-specialized translation tasks rather than general tasks, with a non-negligible computational overhead. In this paper, we propose a fast and accurate approach to TM-based NMT within the Transformer framework: the model architecture is simple and employs a single bilingual sentence as its TM, leading to efficient training and inference; and its parameters are effectively optimized through a novel training criterion. Extensive experiments on six TM-specialized tasks show that the proposed approach substantially surpasses several strong baselines that use multiple TMs, in terms of BLEU and running time. In particular, the proposed approach also advances the strong baselines on two general tasks (WMT news Zh->En and En->De).

2015

Multi-label Text Categorization with Joint Learning Predictions-as-Features Method
Li Li | Houfeng Wang | Xu Sun | Baobao Chang | Shi Zhao | Lei Sha
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

2014

Predicting Chinese Abbreviations with Minimum Semantic Unit and Global Constraints
Longkai Zhang | Li Li | Houfeng Wang | Xu Sun
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Multi-label Text Categorization with Hidden Components
Li Li | Longkai Zhang | Houfeng Wang
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

2013

Improving Chinese Word Segmentation on Micro-blog Using Rich Punctuations
Longkai Zhang | Li Li | Zhengyan He | Houfeng Wang | Ni Sun
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2010

Person Name Disambiguation based on Topic Model
Jiashen Sun | Tianmin Wang | Li Li | Xing Wu
CIPS-SIGHAN Joint Conference on Chinese Language Processing

2002

An Indexing Method Based on Sentences
Li Li | Chunfa Yuan | K.F. Wong | Wenjie Li
COLING-02: The First SIGHAN Workshop on Chinese Language Processing

1998

A Test Environment for Natural Language Understanding Systems
Li Li | Deborah A. Dahl | Lewis M. Norton | Marcia C. Linebarger | Dongdong Chen
COLING 1998 Volume 2: The 17th International Conference on Computational Linguistics

Integration of Large-Scale Linguistic Resources in a Natural Language Understanding System
Lewis M. Norton | Deborah A. Dahl | Li Li | Katharine P. Beals
COLING 1998 Volume 2: The 17th International Conference on Computational Linguistics

A Test Environment for Natural Language Understanding Systems
Li Li | Deborah A. Dahl | Lewis M. Norton | Marcia C. Linebarger | Dongdong Chen
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 2

Integration of Large-Scale Linguistic Resources in a Natural Language Understanding System
Lewis M. Norton | Deborah A. Dahl | Li Li | Katharine P. Beals
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 2

1997

NL Assistant: A Toolkit for Developing Natural Language Applications
Deborah A. Dahl | Lewis M. Norton | Ahmed Bouzid | Li Li
Fifth Conference on Applied Natural Language Processing: Descriptions of System Demonstrations and Videos