Jingjing Xu


pdf bib
Lego-MT: Learning Detachable Models for Massively Multilingual Machine Translation
Fei Yuan | Yinquan Lu | Wenhao Zhu | Lingpeng Kong | Lei Li | Yu Qiao | Jingjing Xu
Findings of the Association for Computational Linguistics: ACL 2023

Multilingual neural machine translation (MNMT) aims to build a unified model for many language directions. Existing monolithic models for MNMT encounter two challenges: parameter interference among languages and inefficient inference for large models. In this paper, we revisit the classic multi-way structures and develop a detachable model by assigning each language (or group of languages) to an individual branch that supports plug-and-play training and inference. To address the needs of learning representations for all languages in a unified space, we propose a novel efficient training recipe, upon which we build an effective detachable model, Lego-MT.For a fair comparison, we collect data from OPUS and build a translation benchmark covering 433 languages and 1.3B parallel data. Experiments show that Lego-MT with 1.2B parameters brings an average gain of 3.2 spBLEU. It even outperforms M2M-100 with 12B parameters. The proposed training recipe brings a 28.2× speedup over the conventional multi-way training method.code and data repo: https://github.com/CONE-MT/Lego-MT.git.

pdf bib
ImageNetVC: Zero- and Few-Shot Visual Commonsense Evaluation on 1000 ImageNet Categories
Heming Xia | Qingxiu Dong | Lei Li | Jingjing Xu | Tianyu Liu | Ziwei Qin | Zhifang Sui
Findings of the Association for Computational Linguistics: EMNLP 2023

Recently, Large Language Models (LLMs) have been serving as general-purpose interfaces, posing a significant demand for comprehensive visual knowledge. However, it remains unclear how well current LLMs and their visually augmented counterparts (VaLMs) can master visual commonsense knowledge. To investigate this, we propose ImageNetVC, a human-annotated dataset specifically designed for zero- and few-shot visual commonsense evaluation across 1,000 ImageNet categories. Utilizing ImageNetVC, we benchmark the fundamental visual commonsense knowledge of both unimodal LLMs and VaLMs. Furthermore, we analyze the factors affecting the visual commonsense knowledge of large-scale models, providing insights into the development of language models enriched with visual commonsense knowledge. Our code and dataset are available at https://github.com/hemingkx/ImageNetVC.

pdf bib
Extrapolating Multilingual Understanding Models as Multilingual Generators
Bohong Wu | Fei Yuan | Hai Zhao | Lei Li | Jingjing Xu
Findings of the Association for Computational Linguistics: EMNLP 2023

Multilingual understanding models (or encoder-based), pre-trained via masked language modeling, have achieved promising results on many language understanding tasks (e.g., mBERT). However, these models are not capable of generating high-quality text compared with decoder-based causal language models. Can we transform a pre-trained language understanding model into an effective language generation model? We propose a Semantic-Guided Alignment-then-Denoising (SGA) approach to adapt a multilingual encoder to a multilingual generator with a small number of additional parameters. Experiments show that the proposed approach is an effective adaption method, outperforming widely-used initialization-based methods with gains of 9.4 BLEU on machine translation, 8.1 Rouge-L on question generation, and 5.5 METEOR on story generation on XLM-Rlarge. On the other hand, we observe that XLM-R is still inferior to mBART in supervised settings despite better results on zero-shot settings, indicating that more exploration is required to make understanding models strong generators. Our code is available at https://github.com/chengzhipanpan/XLMR4MT.

pdf bib
Can We Edit Factual Knowledge by In-Context Learning?
Ce Zheng | Lei Li | Qingxiu Dong | Yuxuan Fan | Zhiyong Wu | Jingjing Xu | Baobao Chang
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Previous studies have shown that large language models (LLMs) like GPTs store massive factual knowledge in their parameters. However, the stored knowledge could be false or outdated. Traditional knowledge editing methods refine LLMs via fine-tuning on texts containing specific knowledge. However, with the increasing scales of LLMs, these gradient-based approaches bring large computation costs. The trend of model-as-a-service also makes it impossible to modify knowledge in black-box LMs. Inspired by in-context learning (ICL), a new paradigm based on demonstration contexts without parameter updating, we explore whether ICL can edit factual knowledge. To answer this question, we give a comprehensive empirical study of ICL strategies. Experiments show that in-context knowledge editing (IKE), without any gradient and parameter updating, achieves a competitive success rate compared to gradient-based methods on GPT-J (6B) but with much fewer side effects, including less over-editing on similar but unrelated facts and less knowledge forgetting on previously stored knowledge. We also apply the method to larger LMs with tens or hundreds of parameters like OPT-175B, which shows the scalability of our method. The code is available at https://github.com/pkunlp-icler/IKE.

pdf bib
Can Language Models Understand Physical Concepts?
Lei Li | Jingjing Xu | Qingxiu Dong | Ce Zheng | Xu Sun | Lingpeng Kong | Qi Liu
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Language models (LMs) gradually become general-purpose interfaces in the interactive and embodied world, where the understanding of physical concepts is an essential prerequisite. However, it is unclear whether LMs can understand physical concepts in the human world. To investigate this, we design a benchmark VEC that covers the tasks of (i) Visual concepts, such as the shape and material of objects, and (ii) Embodied Concepts, learned from the interaction with the world such as the temperature of objects. Our zero (few)-shot prompting results show that the understanding of certain visual concepts emerges as scaling up LMs, but there are still basic concepts to which the scaling law does not apply. For example, OPT-175B performs close to humans with a zero-shot accuracy of 85% on the material concept, yet behaves like random guessing on the mass concept. Instead, vision-augmented LMs such as CLIP and BLIP achieve a human-level understanding of embodied concepts. Analysis indicates that the rich semantics in visual representation can serve as a valuable source of embodied knowledge. Inspired by this, we propose a distillation method to transfer embodied knowledge from VLMs to LMs, achieving performance gain comparable with that by scaling up parameters of LMs 134×. Our dataset is available at https://github.com/TobiasLee/VEC.

pdf bib
INK: Injecting kNN Knowledge in Nearest Neighbor Machine Translation
Wenhao Zhu | Jingjing Xu | Shujian Huang | Lingpeng Kong | Jiajun Chen
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Neural machine translation has achieved promising results on many translation tasks. However, previous studies have shown that neural models induce a non-smooth representation space, which harms its generalization results. Recently, kNN-MT has provided an effective paradigm to smooth the prediction based on neighbor representations during inference. Despite promising results, kNN-MT usually requires large inference overhead. We propose an effective training framework INK to directly smooth the representation space via adjusting representations of kNN neighbors with a small number of new parameters. The new parameters are then used to refresh the whole representation datastore to get new kNN knowledge asynchronously. This loop keeps running until convergence. Experiments on four benchmark datasets show that INK achieves average gains of 1.99 COMET and 1.0 BLEU, outperforming the state-of-the-art kNN-MT system with 0.02x memory space and 1.9x inference speedup.

pdf bib
OpenICL: An Open-Source Framework for In-context Learning
Zhenyu Wu | Yaoxiang Wang | Jiacheng Ye | Zhiyong Wu | Jiangtao Feng | Jingjing Xu | Yu Qiao
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)

In recent years, In-context Learning (ICL) has gained increasing attentionand emerged as the new paradigm for large language model (LLM) evaluation. Unlike traditional fine-tuning methods, ICL instead adapts the pre-trained models to unseen tasks without any parameter updates. However, the implementation of ICL is sophisticated due to the diverse retrieval and inference methods involved, as well as the varying pre-processing requirements for different models, datasets, and tasks. A unified and flexible framework for ICL is urgently needed to ease the implementation of the aforementioned components. To facilitate ICL research, we introduce OpenICL, an open-source toolkit for ICL and LLM evaluation. OpenICL is research-friendly with a highly flexible architecture that users can easily combine different components to suit their needs. It also provides various state-of-the-art retrieval and inference methods to streamline the process of adapting ICL to cutting-edge research. The effectiveness of OpenICL has been validated on a wide range of NLP tasks, including classification, QA, machine translation, and semantic parsing. As a side-product, we found OpenICL to be an efficient yet robust tool for LLMs evaluation. OpenICL is released at https://github.com/Shark-NLP/OpenICL.


pdf bib
Contextual Representation Learning beyond Masked Language Modeling
Zhiyi Fu | Wangchunshu Zhou | Jingjing Xu | Hao Zhou | Lei Li
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Currently, masked language modeling (e.g., BERT) is the prime choice to learn contextualized representations. Due to the pervasiveness, it naturally raises an interesting question: how do masked language models (MLMs) learn contextual representations? In this work, we analyze the learning dynamics of MLMs and find that it adopts sampled embeddings as anchors to estimate and inject contextual semantics to representations, which limits the efficiency and effectiveness of MLMs. To address these problems, we propose TACO, a simple yet effective representation learning approach to directly model global semantics. To be specific, TACO extracts and aligns contextual semantics hidden in contextualized representations to encourage models to attend global semantics when generating contextualized representations. Experiments on the GLUE benchmark show that TACO achieves up to 5x speedup and up to 1.2 points average improvement over MLM.

pdf bib
MTG: A Benchmark Suite for Multilingual Text Generation
Yiran Chen | Zhenqiao Song | Xianze Wu | Danqing Wang | Jingjing Xu | Jiaze Chen | Hao Zhou | Lei Li
Findings of the Association for Computational Linguistics: NAACL 2022

We introduce MTG, a new benchmark suite for training and evaluating multilingual text generation. It is the first-proposed multilingual multiway text generation dataset with the largest human-annotated data (400k). It includes four generation tasks (story generation, question generation, title generation and text summarization) across five languages (English, German, French, Spanish and Chinese). The multiway setup enables testing knowledge transfer capabilities for a model across languages and tasks. Using MTG, we train and analyze several popular multilingual generation models from different aspects. Our benchmark suite fosters model performance enhancement with more human-annotated parallel data. It provides comprehensive evaluations with diverse generation scenarios. Code and data are available at https://github.com/zide05/MTG.

pdf bib
Calibrating Factual Knowledge in Pretrained Language Models
Qingxiu Dong | Damai Dai | Yifan Song | Jingjing Xu | Zhifang Sui | Lei Li
Findings of the Association for Computational Linguistics: EMNLP 2022

Previous literature has proved that Pretrained Language Models (PLMs) can store factual knowledge. However, we find that facts stored in the PLMs are not always correct. It motivates us to explore a fundamental question: How do we calibrate factual knowledge in PLMs without re-training from scratch? In this work, we propose a simple and lightweight method CaliNet to achieve this goal. To be specific, we first detect whether PLMs can learn the right facts via a contrastive score between right and fake facts. If not, we then use a lightweight method to add and adapt new parameters to specific factual texts. Experiments on the knowledge probing task show the calibration effectiveness and efficiency. In addition, through closed-book question answering, we find that the calibrated PLM possesses knowledge generalization ability after finetuning.Beyond the calibration performance, we further investigate and visualize the knowledge calibration mechanism.


pdf bib
Vocabulary Learning via Optimal Transport for Neural Machine Translation
Jingjing Xu | Hao Zhou | Chun Gan | Zaixiang Zheng | Lei Li
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

The choice of token vocabulary affects the performance of machine translation. This paper aims to figure out what is a good vocabulary and whether we can find the optimal vocabulary without trial training. To answer these questions, we first provide an alternative understanding of vocabulary from the perspective of information theory. It motivates us to formulate the quest of vocabularization – finding the best token dictionary with a proper size – as an optimal transport (OT) problem. We propose VOLT, a simple and efficient solution without trial training. Empirical results show that VOLT beats widely-used vocabularies in diverse scenarios, including WMT-14 English-German translation, TED bilingual translation, and TED multilingual translation. For example, VOLT achieves 70% vocabulary size reduction and 0.5 BLEU gain on English-German translation. Also, compared to BPE-search, VOLT reduces the search time from 384 GPU hours to 30 GPU hours on English-German translation. Codes are available at https://github.com/Jingjing-NLP/VOLT.


pdf bib
Reasoning Over Semantic-Level Graph for Fact Checking
Wanjun Zhong | Jingjing Xu | Duyu Tang | Zenan Xu | Nan Duan | Ming Zhou | Jiahai Wang | Jian Yin
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Fact checking is a challenging task because verifying the truthfulness of a claim requires reasoning about multiple retrievable evidence. In this work, we present a method suitable for reasoning about the semantic-level structure of evidence. Unlike most previous works, which typically represent evidence sentences with either string concatenation or fusing the features of isolated evidence sentences, our approach operates on rich semantic structures of evidence obtained by semantic role labeling. We propose two mechanisms to exploit the structure of evidence while leveraging the advances of pre-trained models like BERT, GPT or XLNet. Specifically, using XLNet as the backbone, we first utilize the graph structure to re-define the relative distances of words, with the intuition that semantically related words should have short distances. Then, we adopt graph convolutional network and graph attention network to propagate and aggregate information from neighboring nodes on the graph. We evaluate our system on FEVER, a benchmark dataset for fact checking, and find that rich structural information is helpful and both our graph-based mechanisms improve the accuracy. Our model is the state-of-the-art system in terms of both official evaluation metrics, namely claim verification accuracy and FEVER score.


pdf bib
Review-Driven Multi-Label Music Style Classification by Exploiting Style Correlations
Guangxiang Zhao | Jingjing Xu | Qi Zeng | Xuancheng Ren | Xu Sun
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

This paper explores a new natural languageprocessing task, review-driven multi-label musicstyle classification. This task requires systemsto identify multiple styles of music basedon its reviews on websites. The biggest challengelies in the complicated relations of musicstyles. To tackle this problem, we proposea novel deep learning approach to automaticallylearn and exploit style correlations. Experiment results show that our approachachieves large improvements over baselines onthe proposed dataset. Furthermore, the visualizedanalysis shows that our approach performswell in capturing style correlations.

pdf bib
Asking Clarification Questions in Knowledge-Based Question Answering
Jingjing Xu | Yuechen Wang | Duyu Tang | Nan Duan | Pengcheng Yang | Qi Zeng | Ming Zhou | Xu Sun
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

The ability to ask clarification questions is essential for knowledge-based question answering (KBQA) systems, especially for handling ambiguous phenomena. Despite its importance, clarification has not been well explored in current KBQA systems. Further progress requires supervised resources for training and evaluation, and powerful models for clarification-related text understanding and generation. In this paper, we construct a new clarification dataset, CLAQUA, with nearly 40K open-domain examples. The dataset supports three serial tasks: given a question, identify whether clarification is needed; if yes, generate a clarification question; then predict answers base on external user feedback. We provide representative baselines for these tasks and further introduce a coarse-to-fine model for clarification question generation. Experiments show that the proposed model achieves better performance than strong baselines. The further analysis demonstrates that our dataset brings new challenges and there still remain several unsolved problems, like reasonable automatic evaluation metrics for clarification question generation and powerful models for handling entity sparsity.

pdf bib
Specificity-Driven Cascading Approach for Unsupervised Sentiment Modification
Pengcheng Yang | Junyang Lin | Jingjing Xu | Jun Xie | Qi Su | Xu Sun
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

The task of unsupervised sentiment modification aims to reverse the sentiment polarity of the input text while preserving its semantic content without any parallel data. Most previous work follows a two-step process. They first separate the content from the original sentiment, and then directly generate text with the target sentiment only based on the content produced by the first step. However, the second step bears both the target sentiment addition and content reconstruction, thus resulting in a lack of specific information like proper nouns in the generated text. To remedy this, we propose a specificity-driven cascading approach in this work, which can effectively increase the specificity of the generated text and further improve content preservation. In addition, we propose a more reasonable metric to evaluate sentiment modification. The experiments show that our approach outperforms competitive baselines by a large margin, which achieves 11% and 38% relative improvements of the overall metric on the Yelp and Amazon datasets, respectively.

pdf bib
LexicalAT: Lexical-Based Adversarial Reinforcement Training for Robust Sentiment Classification
Jingjing Xu | Liang Zhao | Hanqi Yan | Qi Zeng | Yun Liang | Xu Sun
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Recent work has shown that current text classification models are fragile and sensitive to simple perturbations. In this work, we propose a novel adversarial training approach, LexicalAT, to improve the robustness of current classification models. The proposed approach consists of a generator and a classifier. The generator learns to generate examples to attack the classifier while the classifier learns to defend these attacks. Considering the diversity of attacks, the generator uses a large-scale lexical knowledge base, WordNet, to generate attacking examples by replacing some words in training examples with their synonyms (e.g., sad and unhappy), neighbor words (e.g., fox and wolf), or super-superior words (e.g., chair and armchair). Due to the discrete generation step in the generator, we use policy gradient, a reinforcement learning approach, to train the two modules. Experiments show LexicalAT outperforms strong baselines and reduces test errors on various neural networks, including CNN, RNN, and BERT.

pdf bib
Coherent Comments Generation for Chinese Articles with a Graph-to-Sequence Model
Wei Li | Jingjing Xu | Yancheng He | ShengLi Yan | Yunfang Wu | Xu Sun
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Automatic article commenting is helpful in encouraging user engagement on online news platforms. However, the news documents are usually too long for models under traditional encoder-decoder frameworks, which often results in general and irrelevant comments. In this paper, we propose to generate comments with a graph-to-sequence model that models the input news as a topic interaction graph. By organizing the article into graph structure, our model can better understand the internal structure of the article and the connection between topics, which makes it better able to generate coherent and informative comments. We collect and release a large scale news-comment corpus from a popular Chinese online news platform Tencent Kuaibao. Extensive experiment results show that our model can generate much more coherent and informative comments compared with several strong baseline models.


pdf bib
An Auto-Encoder Matching Model for Learning Utterance-Level Semantic Dependency in Dialogue Generation
Liangchen Luo | Jingjing Xu | Junyang Lin | Qi Zeng | Xu Sun
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Generating semantically coherent responses is still a major challenge in dialogue generation. Different from conventional text generation tasks, the mapping between inputs and responses in conversations is more complicated, which highly demands the understanding of utterance-level semantic dependency, a relation between the whole meanings of inputs and outputs. To address this problem, we propose an Auto-Encoder Matching (AEM) model to learn such dependency. The model contains two auto-encoders and one mapping module. The auto-encoders learn the semantic representations of inputs and responses, and the mapping module learns to connect the utterance-level representations. Experimental results from automatic and human evaluations demonstrate that our model is capable of generating responses of high coherence and fluency compared to baseline models.

pdf bib
Learning Sentiment Memories for Sentiment Modification without Parallel Data
Yi Zhang | Jingjing Xu | Pengcheng Yang | Xu Sun
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

The task of sentiment modification requires reversing the sentiment of the input and preserving the sentiment-independent content. However, aligned sentences with the same content but different sentiments are usually unavailable. Due to the lack of such parallel data, it is hard to extract sentiment independent content and reverse the sentiment in an unsupervised way. Previous work usually can not reconcile sentiment transformation and content preservation. In this paper, motivated by the fact the non-emotional context (e.g., “staff”) provides strong cues for the occurrence of emotional words (e.g., “friendly”), we propose a novel method that automatically extracts appropriate sentiment information from learned sentiment memories according to the specific context. Experiments show that our method substantially improves the content preservation degree and achieves the state-of-the-art performance.

pdf bib
Diversity-Promoting GAN: A Cross-Entropy Based Generative Adversarial Network for Diversified Text Generation
Jingjing Xu | Xuancheng Ren | Junyang Lin | Xu Sun
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Existing text generation methods tend to produce repeated and ”boring” expressions. To tackle this problem, we propose a new text generation model, called Diversity-Promoting Generative Adversarial Network (DP-GAN). The proposed model assigns low reward for repeatedly generated text and high reward for ”novel” and fluent text, encouraging the generator to produce diverse and informative text. Moreover, we propose a novel language-model based discriminator, which can better distinguish novel text from repeated text without the saturation problem compared with existing classifier-based discriminators. The experimental results on review generation and dialogue generation tasks demonstrate that our model can generate substantially more diverse and informative text than existing baselines.

pdf bib
A Skeleton-Based Model for Promoting Coherence Among Sentences in Narrative Story Generation
Jingjing Xu | Xuancheng Ren | Yi Zhang | Qi Zeng | Xiaoyan Cai | Xu Sun
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Narrative story generation is a challenging problem because it demands the generated sentences with tight semantic connections, which has not been well studied by most existing generative models. To address this problem, we propose a skeleton-based model to promote the coherence of generated stories. Different from traditional models that generate a complete sentence at a stroke, the proposed model first generates the most critical phrases, called skeleton, and then expands the skeleton to a complete and fluent sentence. The skeleton is not manually defined, but learned by a reinforcement learning method. Compared to the state-of-the-art models, our skeleton-based model can generate significantly more coherent text according to human evaluation and automatic evaluation. The G-score is improved by 20.1% in human evaluation.

pdf bib
Unpaired Sentiment-to-Sentiment Translation: A Cycled Reinforcement Learning Approach
Jingjing Xu | Xu Sun | Qi Zeng | Xiaodong Zhang | Xuancheng Ren | Houfeng Wang | Wenjie Li
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

The goal of sentiment-to-sentiment “translation” is to change the underlying sentiment of a sentence while keeping its content. The main challenge is the lack of parallel data. To solve this problem, we propose a cycled reinforcement learning method that enables training on unpaired data by collaboration between a neutralization module and an emotionalization module. We evaluate our approach on two review datasets, Yelp and Amazon. Experimental results show that our approach significantly outperforms the state-of-the-art systems. Especially, the proposed method substantially improves the content preservation performance. The BLEU score is improved from 1.64 to 22.46 and from 0.56 to 14.06 on the two datasets, respectively.


pdf bib
Improving Semantic Relevance for Sequence-to-Sequence Learning of Chinese Social Media Text Summarization
Shuming Ma | Xu Sun | Jingjing Xu | Houfeng Wang | Wenjie Li | Qi Su
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Current Chinese social media text summarization models are based on an encoder-decoder framework. Although its generated summaries are similar to source texts literally, they have low semantic relevance. In this work, our goal is to improve semantic relevance between source texts and summaries for Chinese social media summarization. We introduce a Semantic Relevance Based neural model to encourage high semantic similarity between texts and summaries. In our model, the source text is represented by a gated attention encoder, while the summary representation is produced by a decoder. Besides, the similarity score between the representations is maximized during training. Our experiments show that the proposed model outperforms baseline systems on a social media corpus.


pdf bib
Dependency-based Gated Recursive Neural Network for Chinese Word Segmentation
Jingjing Xu | Xu Sun
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)