Fang Liu


2024

pdf bib
A Survey on Natural Language Processing for Programming
Qingfu Zhu | Xianzhen Luo | Fang Liu | Cuiyun Gao | Wanxiang Che
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Natural language processing for programming aims to use NLP techniques to assist programming. It is increasingly prevalent for its effectiveness in improving productivity. Distinct from natural language, a programming language is highly structured and functional. Constructing a structure-based representation and a functionality-oriented algorithm is at the heart of program understanding and generation. In this paper, we conduct a systematic review covering tasks, datasets, evaluation methods, techniques, and models from the perspective of the structure-based and functionality-oriented property, aiming to understand the role of the two properties in each component. Based on the analysis, we illustrate unexplored areas and suggest potential directions for future work.

pdf bib
RAAMove: A Corpus for Analyzing Moves in Research Article Abstracts
Hongzheng Li | Ruojin Wang | Ge Shi | Xing Lv | Lei Lei | Chong Feng | Fang Liu | Jinkun Lin | Yangguang Mei | Linnan Xu
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Move structures have been studied in English for Specific Purposes (ESP) and English for Academic Purposes (EAP) for decades. However, there are few move annotation corpora for Research Article (RA) abstracts. In this paper, we introduce RAAMove, a comprehensive multi-domain corpus dedicated to the annotation of move structures in RA abstracts. The primary objective of RAAMove is to facilitate move analysis and automatic move identification. This paper provides a thorough discussion of the corpus construction process, including the scheme, data collection, annotation guidelines, and annotation procedures. The corpus is constructed through two stages: initially, expert annotators manually annotate high-quality data; subsequently, based on the human-annotated data, a BERT-based model is employed for automatic annotation with the help of experts’ modification. The result is a large-scale and high-quality corpus comprising 33,988 annotated instances. We also conduct preliminary move identification experiments using the BERT-based model to verify the effectiveness of the proposed corpus and model. The annotated corpus is available for academic research purposes and can serve as essential resources for move analysis, English language teaching and writing, as well as move/discourse-related tasks in Natural Language Processing (NLP).

2018

pdf bib
Ling@CASS Solution to the NLP-TEA CGED Shared Task 2018
Qinan Hu | Yongwei Zhang | Fang Liu | Yueguo Gu
Proceedings of the 5th Workshop on Natural Language Processing Techniques for Educational Applications

In this study, we employ the sequence to sequence learning to model the task of grammar error correction. The system takes potentially erroneous sentences as inputs, and outputs correct sentences. To breakthrough the bottlenecks of very limited size of manually labeled data, we adopt a semi-supervised approach. Specifically, we adapt correct sentences written by native Chinese speakers to generate pseudo grammatical errors made by learners of Chinese as a second language. We use the pseudo data to pre-train the model, and the CGED data to fine-tune it. Being aware of the significance of precision in a grammar error correction system in real scenarios, we use ensembles to boost precision. When using inputs as simple as Chinese characters, the ensembled system achieves a precision at 86.56% in the detection of erroneous sentences, and a precision at 51.53% in the correction of errors of Selection and Missing types.

pdf bib
CMMC-BDRC Solution to the NLP-TEA-2018 Chinese Grammatical Error Diagnosis Task
Yongwei Zhang | Qinan Hu | Fang Liu | Yueguo Gu
Proceedings of the 5th Workshop on Natural Language Processing Techniques for Educational Applications

Chinese grammatical error diagnosis is an important natural language processing (NLP) task, which is also an important application using artificial intelligence technology in language education. This paper introduces a system developed by the Chinese Multilingual & Multimodal Corpus and Big Data Research Center for the NLP-TEA shared task, named Chinese Grammar Error Diagnosis (CGED). This system regards diagnosing errors task as a sequence tagging problem, while takes correction task as a text classification problem. Finally, in the 12 teams, this system gets the highest F1 score in the detection task and the second highest F1 score in mean in the identification task, position task and the correction task.

2013

pdf bib
Statistical Machine Translation Improves Question Retrieval in Community Question Answering via Matrix Factorization
Guangyou Zhou | Fang Liu | Yang Liu | Shizhu He | Jun Zhao
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Attribute Relation Extraction from Template-inconsistent Semi-structured Text by Leveraging Site-level Knowledge
Yang Liu | Fang Liu | Siwei Lai | Kang Liu | Guangyou Zhou | Jun Zhao
Proceedings of the Sixth International Joint Conference on Natural Language Processing

2000

pdf bib
Statistics Based Hybrid Approach to Chinese Base Phrase Identification
Tie-jun Zhao | Mu-yun Yang | Fang Liu | Jian-min Yao | Hao Yu
Second Chinese Language Processing Workshop