Nuo Chen


When Gradient Descent Meets Derivative-Free Optimization: A Match Made in Black-Box Scenario
Chengcheng Han | Liqing Cui | Renyu Zhu | Jianing Wang | Nuo Chen | Qiushi Sun | Xiang Li | Ming Gao
Findings of the Association for Computational Linguistics: ACL 2023

Large pre-trained language models (PLMs) have garnered significant attention for their versatility and potential for solving a wide spectrum of natural language processing (NLP) tasks. However, the cost of running these PLMs may be prohibitive. Furthermore, some PLMs, such as GPT-3, may not be open-sourced due to commercial considerations and potential risks of misuse. In this scenario, the parameters and gradients of the PLM are unavailable. To address this issue, black-box tuning has been proposed, which uses derivative-free optimization (DFO), instead of gradient descent, to train task-specific continuous prompts. However, these gradient-free methods still exhibit a significant gap compared to gradient-based methods. In this paper, we introduce gradient descent into the black-box tuning scenario through knowledge distillation. Furthermore, we propose a novel method, GDFO, which integrates gradient descent and derivative-free optimization to optimize task-specific continuous prompts in a harmonized manner. Experimental results show that GDFO achieves significant performance gains over previous state-of-the-art methods.
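The abstract describes combining gradient descent (via knowledge distillation) with derivative-free optimization in a black-box setting. Below is a minimal, self-contained sketch of that general idea, not the authors' exact algorithm: the toy black_box_predict, the differentiable Student surrogate, and the simple accept/reject random search are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def black_box_predict(prompt, x):
    """Stand-in for the API-only PLM: returns class probabilities, no gradients exposed."""
    with torch.no_grad():
        return F.softmax(x @ prompt.T, dim=-1)              # toy placeholder model

class Student(torch.nn.Module):
    """Small local surrogate, differentiable w.r.t. both the prompt and its own weights."""
    def __init__(self, dim=16):
        super().__init__()
        self.proj = torch.nn.Linear(dim, dim)
    def forward(self, prompt, x):
        return self.proj(x) @ prompt.T                       # logits shaped (batch, classes)

prompt = torch.zeros(4, 16, requires_grad=True)              # continuous prompt (4 x dim 16)
student = Student()
opt = torch.optim.Adam(list(student.parameters()) + [prompt], lr=1e-3)
sigma = 0.1                                                   # scale of random DFO perturbations

for step in range(100):
    x, y = torch.randn(8, 16), torch.randint(0, 4, (8,))      # toy batch

    # 1) Gradient descent: distill the black-box predictions into the student, and let the
    #    task loss on the (differentiable) student also push the prompt.
    teacher_probs = black_box_predict(prompt, x)
    logits = student(prompt, x)
    loss = F.kl_div(F.log_softmax(logits, dim=-1), teacher_probs, reduction="batchmean") \
         + F.cross_entropy(logits, y)
    opt.zero_grad(); loss.backward(); opt.step()

    # 2) Derivative-free step: keep a random perturbation of the prompt if it lowers
    #    the task loss measured through the black-box API itself.
    with torch.no_grad():
        cand = prompt + sigma * torch.randn_like(prompt)
        if F.nll_loss(black_box_predict(cand, x).log(), y) < \
           F.nll_loss(black_box_predict(prompt, x).log(), y):
            prompt.copy_(cand)
```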

Structural Contrastive Pretraining for Cross-Lingual Comprehension
Nuo Chen | Linjun Shou | Tengtao Song | Ming Gong | Jian Pei | Jianhui Chang | Daxin Jiang | Jia Li
Findings of the Association for Computational Linguistics: ACL 2023

To date, multilingual language models trained with pre-training tasks such as masked language modeling (MLM) have yielded encouraging results on a wide range of downstream tasks. Despite these promising performances, structural knowledge in cross-lingual corpora is less explored in current work, leading to semantic misalignment. In this paper, we propose a new pre-training task named Structural Contrast Pretraining (SCP) to align the structural words in a parallel sentence, enhancing the models' ability to comprehend cross-lingual representations. Concretely, each structural word in the source and target languages is regarded as a positive pair in SCP. Since contrastive learning compares positive and negative pairs, increasing the number of negative pairs can enhance the performance of the resulting model. Therefore, we further propose Cross-lingual Momentum Contrast (CL-MoCo) to increase the number of negative pairs by maintaining a large queue. CL-MoCo extends the original MoCo approach to cross-lingual training and jointly optimizes the source-to-target and target-to-source language representations, resulting in an encoder better suited to cross-lingual transfer. We conduct extensive experiments on three cross-lingual tasks across five datasets, such as MLQA and WikiAnn, and the results demonstrate the effectiveness of our method.
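Below is a minimal sketch of the MoCo-style queue mechanism the abstract names (CL-MoCo): source-language sentences are encoded as queries, their parallel target-language sentences by a momentum-updated key encoder, and a large queue of past keys supplies extra negatives. The encoder interface, sizes, and temperature are assumptions, and only the source-to-target direction is shown; the paper also optimizes the reverse direction.

```python
import torch
import torch.nn.functional as F

class CrossLingualMoCo(torch.nn.Module):
    def __init__(self, encoder_q, encoder_k, dim=768, queue_size=65536, m=0.999, tau=0.05):
        super().__init__()
        self.encoder_q, self.encoder_k = encoder_q, encoder_k     # same architecture
        self.m, self.tau = m, tau
        self.register_buffer("queue", F.normalize(torch.randn(queue_size, dim), dim=-1))
        self.register_buffer("ptr", torch.zeros(1, dtype=torch.long))

    @torch.no_grad()
    def _momentum_update(self):
        # key encoder trails the query encoder: p_k <- m * p_k + (1 - m) * p_q
        for pq, pk in zip(self.encoder_q.parameters(), self.encoder_k.parameters()):
            pk.data = self.m * pk.data + (1.0 - self.m) * pq.data

    def forward(self, src_batch, tgt_batch):
        q = F.normalize(self.encoder_q(src_batch), dim=-1)        # queries: source language
        with torch.no_grad():
            self._momentum_update()
            k = F.normalize(self.encoder_k(tgt_batch), dim=-1)    # positive keys: target language
        pos = (q * k).sum(-1, keepdim=True)                       # (B, 1) similarity to the parallel sentence
        neg = q @ self.queue.T                                     # (B, queue_size) similarities to queued keys
        logits = torch.cat([pos, neg], dim=1) / self.tau
        labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)  # positive is column 0
        # enqueue new keys, dequeue the oldest (assumes queue_size is a multiple of the batch size)
        bsz, ptr = k.size(0), int(self.ptr)
        self.queue[ptr:ptr + bsz] = k.detach()
        self.ptr[0] = (ptr + bsz) % self.queue.size(0)
        return F.cross_entropy(logits, labels)
```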

Pass-Tuning: Towards Structure-Aware Parameter-Efficient Tuning for Code Representation Learning
Nuo Chen | Qiushi Sun | Jianing Wang | Xiang Li | Ming Gao
Findings of the Association for Computational Linguistics: EMNLP 2023

Code pre-trained models (CodePTMs) have recently become the de facto paradigm for various tasks in the domain of code intelligence. To achieve excellent performance, the widely used strategy is to fine-tune all the parameters of CodePTMs. However, as the model size increases along with the number of downstream tasks, this strategy becomes excessively expensive. Some prior works utilize Parameter-Efficient Learning (PEL) methods developed for natural language processing to mitigate similar problems, but applying them directly to CodePTMs fails to capture the inherent structural characteristics of code. To address this problem, we propose Pass-Tuning for structure-aware parameter-efficient code representation learning. Specifically, a plug-and-play graph neural network module that learns from the Abstract Syntax Tree (AST) is employed as a tunable prefix. On the one hand, Pass-Tuning further exploits the structural information of source code; on the other hand, it can serve as a replacement for full fine-tuning. We evaluate our method on multiple tasks across eight programming languages, including code understanding and generation. The results demonstrate the effectiveness, robustness, and universality of our method.
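As a rough illustration of the setup described above, the sketch below keeps the CodePTM frozen and trains only a tiny graph module over AST nodes whose outputs are prepended as a prefix to the input embeddings. The single-round mean-aggregation GNN, the choice of the first prefix_len nodes, and the HuggingFace-style inputs_embeds interface are assumptions, not the paper's exact architecture.

```python
import torch

class ASTPrefix(torch.nn.Module):
    """Tunable prefix produced from AST node types; this is the only trainable part."""
    def __init__(self, node_vocab, hidden=768, prefix_len=16):
        super().__init__()
        self.node_emb = torch.nn.Embedding(node_vocab, hidden)
        self.msg = torch.nn.Linear(hidden, hidden)
        self.pool = torch.nn.Linear(hidden, hidden)
        self.prefix_len = prefix_len

    def forward(self, node_ids, adj):
        # node_ids: (N,) AST node-type ids; adj: (N, N) row-normalized adjacency matrix
        h = self.node_emb(node_ids)
        h = torch.relu(self.msg(adj @ h) + h)          # one round of message passing with a residual
        return self.pool(h[: self.prefix_len])          # (prefix_len, hidden) prefix states

def forward_with_prefix(backbone, input_embeds, prefix):
    # backbone parameters stay frozen; only the ASTPrefix module receives gradients
    for p in backbone.parameters():
        p.requires_grad_(False)
    batch = input_embeds.size(0)
    x = torch.cat([prefix.unsqueeze(0).expand(batch, -1, -1), input_embeds], dim=1)
    return backbone(inputs_embeds=x)                    # assumes a HF-style model accepting inputs_embeds
```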

Uncertainty-aware Parameter-Efficient Self-training for Semi-supervised Language Understanding
Jianing Wang | Qiushi Sun | Nuo Chen | Chengyu Wang | Jun Huang | Ming Gao | Xiang Li
Findings of the Association for Computational Linguistics: EMNLP 2023

The recent success of large pre-trained language models (PLMs) hinges heavily on massive labeled data, and these models typically perform poorly in low-resource scenarios. To remedy this dilemma, we study self-training, one of the predominant semi-supervised learning (SSL) approaches, which utilizes large-scale unlabeled data to generate synthetic examples. However, too many noisy labels hurt model performance, and the self-training procedure requires multiple training iterations, making it even more expensive if all the parameters of the PLM are updated. This paper presents UPET, a novel Uncertainty-aware Parameter-Efficient self-Training framework to effectively and efficiently address the scarcity of labeled data. Specifically, we incorporate Monte Carlo (MC) dropout in a Bayesian neural network (BNN) to perform uncertainty estimation for the teacher model and then judiciously select reliable pseudo-labeled examples based on confidence and certainty. During student training, we introduce multiple parameter-efficient learning (PEL) paradigms that optimize only a small percentage of parameters. We also propose a novel Easy-Hard Contrastive Tuning to enhance robustness and generalization. Extensive experiments over multiple downstream tasks demonstrate that UPET achieves substantial improvements in terms of both performance and efficiency. Our codes and data are released at https://
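A minimal sketch of the uncertainty-aware selection step described above, under stated assumptions: dropout is kept active at inference (MC dropout), several stochastic forward passes are averaged, and only unlabeled examples that are both confident and low-variance are kept as pseudo-labels. The number of passes, the thresholds, and the assumption that the loader yields plain tensor batches are illustrative choices, not the paper's exact settings.

```python
import torch

@torch.no_grad()
def select_pseudo_labels(teacher, unlabeled_loader, T=10, conf_min=0.9, var_max=0.05):
    teacher.train()                                  # keep dropout layers stochastic at inference
    selected = []
    for x in unlabeled_loader:                       # assumes each batch is a plain input tensor
        # T stochastic forward passes -> (T, B, C) class probabilities
        probs = torch.stack([torch.softmax(teacher(x), dim=-1) for _ in range(T)])
        mean_p = probs.mean(0)                       # MC estimate of p(y|x)
        conf, pseudo_y = mean_p.max(dim=-1)          # confidence and pseudo-label
        var = probs.var(0).sum(-1)                   # total predictive variance (uncertainty)
        keep = (conf >= conf_min) & (var <= var_max)
        selected.extend(zip(x[keep], pseudo_y[keep]))
    return selected
```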

Large Language Models Meet Harry Potter: A Dataset for Aligning Dialogue Agents with Characters
Nuo Chen | Yan Wang | Haiyun Jiang | Deng Cai | Yuhan Li | Ziyang Chen | Longyue Wang | Jia Li
Findings of the Association for Computational Linguistics: EMNLP 2023

In recent years, Dialogue-style Large Language Models (LLMs) such as ChatGPT and GPT-4 have demonstrated immense potential in constructing open-domain dialogue agents. However, aligning these agents with specific characters or individuals remains a considerable challenge due to the complexities of character representation and the lack of comprehensive annotations. In this paper, we introduce the Harry Potter Dialogue (HPD) dataset, designed to advance the study of dialogue agents and character alignment. The dataset encompasses all dialogue sessions (in both English and Chinese) from the Harry Potter series and is annotated with vital background information, including dialogue scenes, speakers, character relationships, and attributes. These extensive annotations may empower LLMs to unlock character-driven dialogue capabilities. Furthermore, the dataset can serve as a universal benchmark for evaluating how well an LLM can align with a specific character. We benchmark LLMs on HPD under both fine-tuning and in-context learning settings. Evaluation results reveal that although there is substantial room for improvement in generating high-quality, character-aligned responses, the proposed dataset is valuable in guiding models toward responses that better align with the character of Harry Potter.

Natural Response Generation for Chinese Reading Comprehension
Nuo Chen | Hongguang Li | Yinan Bao | Baoyuan Wang | Jia Li
Findings of the Association for Computational Linguistics: EMNLP 2023

Machine reading comprehension (MRC) is an important component of conversational agents and draws a lot of attention. However, there is a notable limitation to current MRC benchmarks: the labeled answers are mostly either spans extracted from the target corpus or choices among given candidates, ignoring the natural quality of high-quality responses. As a result, MRC models trained on these datasets cannot generate human-like responses in real QA scenarios. To this end, we construct a new dataset called Penguin to promote MRC research, providing a training and test bed for natural response generation in real scenarios. Concretely, Penguin consists of 200k training examples with high-quality, fluent, and well-informed responses. Penguin is the first benchmark for natural response generation in Chinese MRC at a relatively large scale. To address the challenges in Penguin, we develop two strong baselines: end-to-end and two-stage frameworks. We further design Prompt-BART, which fine-tunes pre-trained generative language models with a mixture of prefix prompts on Penguin. Extensive experiments validate the effectiveness of this design.

Evaluating and Enhancing the Robustness of Code Pre-trained Models through Structure-Aware Adversarial Samples Generation
Nuo Chen | Qiushi Sun | Jianing Wang | Ming Gao | Xiaoli Li | Xiang Li
Findings of the Association for Computational Linguistics: EMNLP 2023

Code pre-trained models (CodePTMs) have significantly advanced the field of neural code intelligence. Despite their capabilities, these models are susceptible to adversarial attacks that subtly modify the model inputs, resulting in incorrect outputs or predictions. Previous methods of robustness evaluation for CodePTMs primarily stem from a textual perspective, without explicitly taking into account the structure of the code. Furthermore, prior studies fail to encompass a broad enough spectrum of tasks and models. In this paper, we propose a set of novel robustness evaluation methods based on the intrinsic structure of the code. Specifically, we first launch adversarial attacks on crucial identifier tokens and sub-tree structures to explore the impact of imperceptible perturbation. Then, we perform global restructuring of the code using different traversal methods for abstract syntax trees, aiming to explore the model’s sensitivity to input samples with equivalent information. Moreover, for each scenario, we employ adversarial training methods to explore the possibility of restoring the performance of perturbed models. For both code understanding and generation, our proposed method has demonstrated its effectiveness across a wide range of models and tasks, thereby allowing us to make one step forward in our understanding of the inner mechanisms of CodePTMs.
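To make the identifier-level perturbations described above concrete, here is a small hedged sketch: rename one identifier consistently (a semantics-preserving edit) and test whether the model's prediction changes. A real structure-aware attack would select identifiers from the AST and also cover sub-tree perturbations and equivalent traversals; the regex-based rename and the hypothetical model.predict interface are simplifications.

```python
import re

def rename_identifier(code: str, old: str, new: str) -> str:
    # word boundaries keep substrings like "counter" intact when renaming "count"
    return re.sub(rf"\b{re.escape(old)}\b", new, code)

def is_adversarial(model, code: str, old: str = "total", new: str = "tmp_0") -> bool:
    perturbed = rename_identifier(code, old, new)     # semantics-preserving edit
    # model.predict is a hypothetical wrapper around the CodePTM's task output
    return model.predict(code) != model.predict(perturbed)
```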

Orca: A Few-shot Benchmark for Chinese Conversational Machine Reading Comprehension
Nuo Chen | Hongguang Li | Junqing He | Yinan Bao | Xinshi Lin | Qi Yang | Jianfeng Liu | Ruyi Gan | Jiaxing Zhang | Baoyuan Wang | Jia Li
Findings of the Association for Computational Linguistics: EMNLP 2023

The conversational machine reading comprehension (CMRC) task aims to answer questions in conversations and has been a hot research topic in recent years because of its wide applications. However, existing CMRC benchmarks, in which each conversation is assigned a static passage, are inconsistent with real scenarios, so a model's comprehension ability in real scenarios is hard to evaluate reasonably. To this end, we propose the first Chinese CMRC benchmark, Orca, and further provide zero-shot/few-shot settings to evaluate models' generalization ability across diverse domains. We collect 831 hot-topic-driven conversations with 4,742 turns in total. Each turn of a conversation is assigned a response-related passage, aiming to evaluate models' comprehension ability more reasonably. The topics of the conversations are collected from social media platforms and cover 33 domains, keeping them consistent with real scenarios. Importantly, answers in Orca are all well-annotated natural responses rather than the specific spans or short phrases used in previous datasets. Besides, we implement three strong baselines to tackle the challenges in Orca. The results indicate that our CMRC benchmark remains highly challenging.

Alleviating Over-smoothing for Unsupervised Sentence Representation
Nuo Chen | Linjun Shou | Jian Pei | Ming Gong | Bowen Cao | Jianhui Chang | Jia Li | Daxin Jiang
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Learning better unsupervised sentence representations is currently a major pursuit of the natural language processing community. Many approaches based on pre-trained language models (PLMs) and contrastive learning have achieved promising results on this task. Experimentally, we observe that the over-smoothing problem reduces the capacity of these powerful PLMs, leading to sub-optimal sentence representations. In this paper, we present a Simple method named Self-Contrastive Learning (SSCL) to alleviate this issue, which samples negatives from PLMs' intermediate layers, improving the quality of the sentence representations. Our proposed method is quite simple and can easily be extended to various state-of-the-art models for performance boosting, serving as a plug-and-play contrastive framework for learning unsupervised sentence representations. Extensive results show that SSCL brings superior performance improvements over different strong baselines (e.g., BERT and SimCSE) on Semantic Textual Similarity and transfer datasets.
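A minimal sketch of the core idea, assuming a HuggingFace-style encoder that exposes hidden states: alongside the usual SimCSE-style dropout positive, the same sentence's representation from an intermediate layer is added as an extra negative, pushing the final-layer embedding away from the over-smoothed shallow representation. The layer index, [CLS] pooling, and temperature are assumptions.

```python
import torch
import torch.nn.functional as F

def sscl_loss(model, batch, neg_layer=6, tau=0.05):
    # two forward passes -> two different dropout masks (SimCSE-style positive pair)
    out1 = model(**batch, output_hidden_states=True)
    out2 = model(**batch, output_hidden_states=True)
    z  = F.normalize(out1.hidden_states[-1][:, 0], dim=-1)        # anchor: final-layer [CLS]
    zp = F.normalize(out2.hidden_states[-1][:, 0], dim=-1)        # positive: second dropout pass
    zn = F.normalize(out1.hidden_states[neg_layer][:, 0], dim=-1).detach()  # negative: intermediate layer
    # columns: in-batch similarities (positives on the diagonal) plus one intermediate-layer negative
    sim = torch.cat([z @ zp.T, (z * zn).sum(-1, keepdim=True)], dim=1) / tau
    labels = torch.arange(z.size(0), device=z.device)
    return F.cross_entropy(sim, labels)
```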


A Transformer-based Threshold-Free Framework for Multi-Intent NLU
Lisung Chen | Nuo Chen | Yuexian Zou | Yong Wang | Xinzhong Sun
Proceedings of the 29th International Conference on Computational Linguistics

Multi-intent natural language understanding (NLU) has recently gained attention. It detects multiple intents in an utterance, which is better suited to real-world scenarios. However, state-of-the-art joint NLU models mainly detect multiple intents with a threshold-based strategy, which makes the model extremely sensitive to the threshold setting. In this paper, we propose a transformer-based Threshold-Free Multi-intent NLU model (TFMN) with multi-task learning (MTL). Specifically, we first leverage multiple layers of a transformer-based encoder to generate multi-grain representations. We then exploit the number of intents in each utterance, which requires no additional manual annotation, and propose an auxiliary detection task: Intent Number Detection (IND). Furthermore, we propose a threshold-free multi-intent classifier that utilizes the output of the IND task and detects multiple intents without depending on a threshold. Extensive experiments demonstrate that our proposed model achieves superior results on two public multi-intent datasets.
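The threshold-free decoding step lends itself to a short sketch: the auxiliary Intent Number Detection (IND) head predicts how many intents the utterance carries, and exactly that many top-scoring intents are returned, so no probability threshold has to be tuned. Tensor shapes and head names below are assumptions.

```python
import torch

def decode_intents(intent_logits, count_logits):
    """intent_logits: (B, num_intents); count_logits: (B, max_intents) from the IND head."""
    k_per_utt = count_logits.argmax(dim=-1) + 1            # predicted number of intents (>= 1)
    results = []
    for scores, k in zip(intent_logits, k_per_utt):
        topk = scores.topk(int(k)).indices                  # take exactly k intents, no threshold
        results.append(topk.tolist())
    return results

# e.g. an utterance whose count head predicts 2 intents yields its two highest-scoring labels
```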

End-to-end Spoken Conversational Question Answering: Task, Dataset and Model
Chenyu You | Nuo Chen | Fenglin Liu | Shen Ge | Xian Wu | Yuexian Zou
Findings of the Association for Computational Linguistics: NAACL 2022

In spoken question answering, systems are designed to answer questions from contiguous text spans within the related speech transcripts. However, the most natural way for humans to seek or test knowledge is via conversation. Therefore, we propose a new Spoken Conversational Question Answering (SCQA) task, aiming to enable systems to model complex dialogue flows given speech documents. In this task, our main objective is to build systems that handle conversational questions based on audio recordings and to explore the plausibility of providing systems with additional cues from different modalities during information gathering. To this end, instead of directly adopting automatically generated speech transcripts, which are highly noisy, we propose a novel unified data distillation approach, DDNet, which effectively ingests cross-modal information to achieve fine-grained representations of the speech and language modalities. Moreover, we propose a simple and novel mechanism, termed Dual Attention, that encourages better alignment between audio and text to ease the process of knowledge transfer. To evaluate the capacity of SCQA systems in a dialogue-style interaction, we assemble a Spoken Conversational Question Answering (Spoken-CoQA) dataset with more than 40k question-answer pairs from 4k conversations. We first show that the performance of existing state-of-the-art methods degrades significantly on our dataset, demonstrating the necessity of incorporating cross-modal information to achieve good performance. Our experimental results demonstrate that our proposed method achieves superior performance in spoken conversational question answering. Codes and datasets will be made publicly available.
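The Dual Attention mechanism is described only at a high level, so the following is a hedged sketch of one plausible form: text features attend over audio features and vice versa, and the two attended views are fused. The concatenation-plus-pooling fusion, the dimensions, and the head count are assumptions rather than the paper's exact design.

```python
import torch

class DualAttention(torch.nn.Module):
    def __init__(self, dim=768, heads=8):
        super().__init__()
        self.text_to_audio = torch.nn.MultiheadAttention(dim, heads, batch_first=True)
        self.audio_to_text = torch.nn.MultiheadAttention(dim, heads, batch_first=True)
        self.fuse = torch.nn.Linear(2 * dim, dim)

    def forward(self, text_feats, audio_feats):
        # text_feats: (B, T_text, dim); audio_feats: (B, T_audio, dim)
        t2a, _ = self.text_to_audio(text_feats, audio_feats, audio_feats)   # text queries audio
        a2t, _ = self.audio_to_text(audio_feats, text_feats, text_feats)    # audio queries text
        # fuse the text-side view with a pooled summary of the audio-side view
        pooled = a2t.mean(dim=1, keepdim=True).expand(-1, text_feats.size(1), -1)
        return self.fuse(torch.cat([t2a, pooled], dim=-1))
```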

CAT-probing: A Metric-based Approach to Interpret How Pre-trained Models for Programming Language Attend Code Structure
Nuo Chen | Qiushi Sun | Renyu Zhu | Xiang Li | Xuesong Lu | Ming Gao
Findings of the Association for Computational Linguistics: EMNLP 2022

Code pre-trained models (CodePTMs) have recently demonstrated significant success in code intelligence. To interpret these models, some probing methods have been applied. However, these methods fail to consider the inherent characteristics of codes. In this paper, to address the problem, we propose a novel probing method CAT-probing to quantitatively interpret how CodePTMs attend code structure. We first denoise the input code sequences based on the token types pre-defined by the compilers to filter those tokens whose attention scores are too small. After that, we define a new metric CAT-score to measure the commonality between the token-level attention scores generated in CodePTMs and the pair-wise distances between corresponding AST nodes. The higher the CAT-score, the stronger the ability of CodePTMs to capture code structure. We conduct extensive experiments to integrate CAT-probing with representative CodePTMs for different programming languages. Experimental results show the effectiveness of CAT-probing in CodePTM interpretation. Our codes and data are publicly available at
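To make the metric concrete, here is one plausible and deliberately simplified instantiation of a CAT-score-style computation: after filtering low-attention tokens, it measures how often a large attention weight between two tokens coincides with a small distance between their AST nodes. The thresholds and the agreement-rate formula are assumptions; the paper defines the exact metric.

```python
import numpy as np

def cat_score(attn, ast_dist, keep_mask, attn_thresh=0.1, dist_thresh=2):
    """attn, ast_dist: (L, L) token-level attention and AST pairwise distances; keep_mask: (L,) bool."""
    idx = np.where(keep_mask)[0]                       # tokens that survive the attention-based filter
    A = attn[np.ix_(idx, idx)] > attn_thresh           # "the model treats these tokens as related"
    D = ast_dist[np.ix_(idx, idx)] <= dist_thresh      # "the AST says they are structurally close"
    return (A == D).mean()                             # agreement rate in [0, 1]; higher = more structure-aware
```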

Bridging the Gap between Language Models and Cross-Lingual Sequence Labeling
Nuo Chen | Linjun Shou | Ming Gong | Jian Pei | Daxin Jiang
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Large-scale cross-lingual pre-trained language models (xPLMs) have been shown to be effective in cross-lingual sequence labeling (xSL) tasks, such as machine reading comprehension (xMRC), by transferring knowledge from a high-resource language to low-resource languages. Despite this great success, we draw an empirical observation that there is a training objective gap between the pre-training and fine-tuning stages: for example, the masked language modeling objective requires local understanding of the masked token, whereas the span-extraction objective requires understanding and reasoning over the entire input passage and question, leading to a discrepancy between pre-training and xMRC. In this paper, we first design a pre-training task tailored for xSL, named Cross-lingual Language Informative Span Masking (CLISM), to eliminate the objective gap in a self-supervised manner. Second, we present ContrAstive-Consistency Regularization (CACR), which utilizes contrastive learning to encourage consistency between the representations of parallel input sequences via unsupervised cross-lingual instance-wise training signals during pre-training. By these means, our methods not only bridge the pretrain-finetune gap but also enhance the PLMs' ability to capture the alignment between different languages. Extensive experiments show that our method achieves clearly superior results on multiple xSL benchmarks with limited pre-training data. Our methods also surpass the previous state-of-the-art methods by a large margin in the few-shot setting, where only a few hundred training examples are available.


Self-supervised Contrastive Cross-Modality Representation Learning for Spoken Question Answering
Chenyu You | Nuo Chen | Yuexian Zou
Findings of the Association for Computational Linguistics: EMNLP 2021

Spoken question answering (SQA) requires fine-grained understanding of both spoken documents and questions for optimal answer prediction. In this paper, we propose novel training schemes for spoken question answering with a self-supervised training stage and a contrastive representation learning stage. In the self-supervised stage, we propose three auxiliary self-supervised tasks, including utterance restoration, utterance insertion, and question discrimination, and jointly train the model to capture consistency and coherence among speech documents without any additional data or annotations. We then propose to learn noise-invariant utterance representations with a contrastive objective by adopting multiple augmentation strategies, including span deletion and span substitution. Besides, we design a Temporal-Alignment attention to semantically align the speech-text clues in the learned common space and benefit the SQA tasks. By these means, the training schemes can more effectively guide the generation model to predict more proper answers. Experimental results show that our model achieves state-of-the-art results on three SQA benchmarks. Our code will be publicly available after publication.
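The two augmentation strategies named above (span deletion and span substitution) are simple enough to sketch directly; they produce noisy views of an utterance that the contrastive objective then pulls toward the clean representation. The span-length limit and uniform sampling below are assumptions.

```python
import random

def span_deletion(tokens, max_len=5):
    # drop a contiguous span of up to max_len tokens
    i = random.randrange(len(tokens))
    j = min(len(tokens), i + random.randint(1, max_len))
    return tokens[:i] + tokens[j:]

def span_substitution(tokens, vocab, max_len=5):
    # replace a contiguous span with random tokens drawn from the vocabulary
    i = random.randrange(len(tokens))
    j = min(len(tokens), i + random.randint(1, max_len))
    return tokens[:i] + [random.choice(vocab) for _ in range(j - i)] + tokens[j:]
```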