Jinan Xu (徐金安) - ACL Anthology

Jinan Xu

Also published as: JinAn Xu, Xu Jinan, 金安徐

2026

Group-Merger: A LoRA-based Framework for Multilingual Continual Learning
Weijian yi | Hongliang Li | Jinan Xu
Proceedings of the 1st Workshop on Multilinguality in the Era of Large Language Models (MeLLM 2026)

Multilingual continual learning (MCL) is crucial for enabling language models to adapt across diverse linguistic environments while retaining knowledge over time. Existing parameter isolation methods allocate language-specific modules but fail to leverage cross-lingual transfer, leading to inefficient parameter growth and poor generalization. Model merging based approaches suffer from severe performance degradation as the number of language-specific tasks increases, due to interference between linguistic and task-specific knowledge. To address these challenges, we propose Group-Merger, a framework that employs group-wise merging to balance parameter efficiency and continual learning performance. Our framework mitigates catastrophic forgetting across languages while enabling knowledge transfer. Extensive experiments on multilingual evaluation benchmarks demonstrate superior performance compared to existing methods.

Evidence-Augmented Generation Reasoning for Extremely Low-Resource Language Decipherment
Xiaoyu Zhu | Long Yuan | Rui Qi | Jinan Xu
Proceedings of the 1st Workshop on Multilinguality in the Era of Large Language Models (MeLLM 2026)

Inspired by linguistic Olympiads, extremely low-resource language reasoning presents a unique challenge that enables models to solve problems without prior knowledge. This task mirrors the Rosetta Stone decipherment process, where the goal is to induce and apply linguistic rules from minimal context. Existing methods mainly rely on naive in-context learning that fails to handle the complexity and diversity of language rules. To mitigate this issue, we propose a framework that combines dynamic knowledge construction with task-aware evidence augmentation. First, we use large language models (LLMs) to generate a diverse set of task-specific examples that instantiate potential linguistic rules for the target low-resource language. Second, we apply a semantic retrieval mechanism to select the most relevant examples as evidence for each test query, preventing context overload and ensuring focused, analogical reasoning. Our method shifts from learning language distributions to dynamically discovering and applying rules. Experimental results on the LINGOLY and Linguini benchmark show that our approach achieves competitive performance across various LLMs, outperforming existing baselines. More importantly, our framework advances extremely low-resource reasoning and provides a generalizable framework for rule induction under knowledge constraints.

Can Multi-agent Help Disambiguation in Multi-domain Translation?
Zhibo Man | Shaoyang Xu | Yujie Zhang | Yi Feng | Yuanmeng Chen | Yufeng Chen | Xu Jinan | Wenxuan Zhang
Findings of the Association for Computational Linguistics: ACL 2026

Large language models (LLMs)-based multi-agent systems have recently shown strong potential for machine translation (MT). However, their application to multi-domain translation (MDT) remains under-explored, particularly in addressing cross-domain word ambiguity. To investigate whether multi-agent approaches can help disambiguation in MDT, we propose a multi-agent collaborative disambiguation framework for MDT (MACD), which leverages the collaborative capabilities of LLMs for disambiguation. MACD consists of four cooperating agents responsible for domain allocation, general translation, domain disambiguation, and translation fusion. Experimental results show that MACD significantly improves translation performance across multiple domains and enhances disambiguation accuracy. Our approach reveals several findings on multi-agent collaboration in resolving word ambiguities.

Language-Coupled Reinforcement Learning for Multilingual Retrieval-Augmented Generation
Rui Qi | Fengran Mo | Yufeng Chen | Xue Zhang | Shuo Wang | Hongliang Li | Xu Jinan | Meng Jiang | Jian-Yun Nie | Kaiyu Huang
Findings of the Association for Computational Linguistics: ACL 2026

Multilingual retrieval-augmented generation (MRAG) requires models to effectively acquire and integrate beneficial external knowledge from multilingual collections. However, most existing studies employ a unitive process where queries of equivalent semantics across different languages are processed through a single-turn retrieval and subsequent optimization. Such a “one-size-fits-all” strategy is often suboptimal in multilingual settings, as the models occur to knowledge bias and conflict during the interaction with the search engine. To alleviate the issues, we propose LcRL, a multilingual search-augmented reinforcement learning framework that integrates a language-coupled Group Relative Policy Optimization into the policy and reward models. We adopt the language-coupled group sampling in the rollout module to reduce knowledge bias, and regularize an auxiliary anti-consistency penalty in the reward models to mitigate the knowledge conflict. Experimental results demonstrate that not only achieves competitive performance but is also appropriate for various practical scenarios such as constrained training data and retrieval over collections encompassing a large number of languages. Our code is available at https://anonymous.4open.science/r/LcRL-B4EF.

When Helpers Become Hazards: A Benchmark for Analyzing Multimodal LLM-Powered Safety in Daily Life
Xinyue Lou | Xu Jinan | Jingyi Yin | Xiaolong Wang | Zhaolu Kang | Liaoyouwei | Yixuan Wang | Xiangyu Shi | Fengran Mo | SU Yao | Kaiyu Huang
Findings of the Association for Computational Linguistics: ACL 2026

As Multimodal Large Language Models (MLLMs) become an indispensable assistant in human life, the unsafe content generated by MLLMs poses a danger to human behavior, perpetually overhanging human society like a sword of Damocles. To investigate and evaluate the safety impact of MLLMs responses on human behavior in daily life, we introduce SaLAD, a multimodal satety benchmark which contains 2,013 real-world image–text samples across 10 common categories, with a balanced design covering both unsafe scenarios and cases of oversensitivity. It emphasizes realistic risk exposure, authentic visual inputs, and fine-grained cross-modal reasoning, ensuring that safety risks cannot be inferred from text alone. We further propose a safety-warning-based evaluation framework that encourages models to provide clear and informative safety warnings, rather than generic refusals. Results on 18 MLLMs demonstrate that the top-performing models achieve a safe response rate of only 57.2% on unsafe queries. Morevoer, even popular safety alignment methods limit effectiveness of the models in our scenario, revealing the vulnerabilities of current MLLMs in identifying dangerous behaviors in daily life. Our dataset is available at https://github.com/xinyuelou/SaLAD.

Think Natively: Unlocking Multilingual Reasoning with Consistency-Enhanced Reinforcement Learning
Xue Zhang | Yunlong Liang | Fandong Meng | Songming Zhang | Kaiyu Huang | Yufeng Chen | Xu Jinan | Jie Zhou
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Large Reasoning Models (LRMs) have achieved remarkable performance on complex reasoning tasks by adopting the “think-then-answer” paradigm, which enhances both accuracy and interpretability. However, current LRMs exhibit two critical limitations when processing non-English languages: (1) They often struggle to maintain input-output language consistency; (2) They generally perform poorly with wrong reasoning paths and lower answer accuracy compared to English. These limitations significantly compromise the interpretability of reasoning processes and degrade the user experience for non-English speakers, hindering the global deployment of LRMs. To address these limitations, we propose M-Thinker, which is trained by the GRPO algorithm that involves a Language Consistency (LC) reward and a novel Cross-lingual Thinking Alignment (CTA) reward. Specifically, the LC reward defines a strict constraint on the language consistency between the input, thought, and answer. Besides, the CTA reward compares the model’s non-English reasoning paths with its English reasoning path to transfer its own reasoning capability from English to non-English languages. Through an iterative RL procedure, our M-Thinker-1.5B/4B/7B models not only achieve nearly 100% language consistency and superior performance on two multilingual benchmarks (MMATH and PolyMath), but also exhibit excellent generalization on out-of-domain languages.

2025

Multi-Stage LLM Fine-Tuning with a Continual Learning Setting
Changhao Guan | Chao Huang | Hongliang Li | You Li | Ning Cheng | Zihe Liu | Jinan Xu | Yufeng Chen | Jian Liu
Findings of the Association for Computational Linguistics: NAACL 2025

In recent years, large language models (LLMs) have made significant progress in knowledge-intensive applications. However, when adapting them to specific domains, we may encounter a multi-stage continuous learning scenario, especially in cases where domain knowledge evolves rapidly.This issue severely limits traditional fine-tuning approaches for LLMs.To overcome this limitation, we propose a new learning paradigm designed specifically for multi-stage continuous learning. This paradigm includes a preference-based learning bias to identify potential knowledge conflicts, as well as a self-distillation-based data augmentation strategy to expand and enrich the training corpus, thereby improving the integration of knowledge-compatible information.In the experiments, we show that our proposed method achieves a significant improvement in accuracy after 7 stages of fine-tuning compared to previous methods, while also demonstrating excellent performance in preserving general knowledge.We have released our code and dataset at Multi-Stage-Learning.

SoT: Structured-of-Thought Prompting Guides Multilingual Reasoning in Large Language Models
Rui Qi | Zhibo Man | Yufeng Chen | Fengran Mo | Jinan Xu | Kaiyu Huang
Findings of the Association for Computational Linguistics: EMNLP 2025

Recent developments have enabled Large Language Models (LLMs) to engage in complex reasoning tasks through deep thinking. However, the capacity of reasoning has not been successfully transferred to non-high-resource languages due to resource constraints, which struggles with multilingual reasoning tasks. To this end, we propose Structured-of-Thought (SoT), a training-free method that improves the performance on multilingual reasoning through a multi-step transformation: Language Thinking Transformation and Structured Knowledge Transformation. The SoT method converts language-specific semantic information into language-agnostic structured representations, enabling the models to understand the query in different languages more sophisticated. Besides, SoT effectively guides LLMs toward more concentrated reasoning to maintain consistent underlying reasoning pathways when handling cross-lingual variations in expression. Experimental results demonstrate that SoT outperforms several strong baselines on multiple multilingual reasoning benchmarks when adapting to various backbones of LLMs. It can also be integrated with other training-free strategies for further improvements. Our code is available at https://github.com/Cherry-qwq/SoT.

Multilingual Collaborative Defense for Large Language Models
Hongliang Li | Jinan Xu | Gengping Cui | Changhao Guan | Fengran Mo | Kaiyu Huang
Findings of the Association for Computational Linguistics: EMNLP 2025

The robustness and security of Large Language Models (LLMs) face increasing threats, especially in multilingual settings. A notable vulnerability is “jailbreaking” via translating harmful queries into rare or underrepresented languages, which often bypasses existing safeguards. In this work, we propose Multilingual Collaborative Defense (MCD), a novel learning method that optimizes a continuous soft safety prompt automatically to facilitate multilingual safeguarding of LLMs. MCD organically leverages collaborative signals from multiple languages by rotating each as the training “center,” allowing auxiliary languages to reinforce safety prompt learning and ensuring cross‐lingual consistency. As a result, MCD improves defense performance across all languages, reduces false refusals, and mitigates safety misalignment caused by corpus imbalance. To evaluate MCD, we construct multilingual versions of jailbreak benchmarks such as MaliciousInstruct and AdvBench, including zero-shot languages, to assess language transferability. Experiments show that MCD outperforms prior approaches in multilingual jailbreak defense while exhibiting strong cross-lingual generalization. Our code is available at https://github.com/HLiang-Lee/MCD.

CM-Align: Consistency-based Multilingual Alignment for Large Language Models
Xue Zhang | Yunlong Liang | Fandong Meng | Songming Zhang | Yufeng Chen | Jinan Xu | Jie Zhou
Findings of the Association for Computational Linguistics: EMNLP 2025

Current large language models (LLMs) generally show a significant performance gap in alignment between English and other languages.To bridge this gap, existing research typically leverages the model’s responses in English as a reference to select the best/worst responses in other languages, which are then used for Direct Preference Optimization (DPO) training.However, we argue that there are two limitations in the current methods that result in noisy multilingual preference data and further limited alignment performance: 1) Not all English responses are of high quality, and using a response with low quality may mislead the alignment for other languages. 2) Current methods usually use biased or heuristic approaches to construct multilingual preference pairs.To address these limitations, we design a consistency-based data selection method to construct high-quality multilingual preference data for improving multilingual alignment (CM-Align).Specifically, our method includes two parts: consistency-guided English reference selection and cross-lingual consistency-based multilingual preference data construction.Experimental results on three LLMs and three common tasks demonstrate the effectiveness and superiority of our method, which further indicates the necessity of constructing high-quality preference data.

A Law Reasoning Benchmark for LLM with Tree-Organized Structures including Factum Probandum, Evidence and Experiences
Jiaxin Shen | Jinan Xu | Huiqi Hu | Luyi Lin | Guoyang Ma | Fei Zheng | Fandong Meng | Jie Zhou | Wenjuan Han
Findings of the Association for Computational Linguistics: ACL 2025

While progress has been made in legal applications, law reasoning, crucial for fair adjudication, remains unexplored. We propose a transparent law reasoning schema enriched with hierarchical factum probandum, evidence, and implicit experience, enabling public scrutiny and preventing bias. Inspired by this schema, we introduce the challenging task, which takes a textual case description and outputs a hierarchical structure justifying the final decision. We also create the first crowd-sourced dataset for this task, enabling comprehensive evaluation. Simultaneously, we propose TL agent that employs a comprehensive suite of legal analysis tools to address the challenge task. This benchmark paves the way for transparent and accountable AI-assisted law-reasoning in the “Intelligent Court”.

Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models
You Li | Heyu Huang | Chi Chen | Kaiyu Huang | Chao Huang | Zonghao Guo | Zhiyuan Liu | Jinan Xu | Yuhua Li | Ruixuan Li | Maosong Sun
Findings of the Association for Computational Linguistics: ACL 2025

The recent advancement of Multimodal Large Language Models (MLLMs) has significantly improved their fine-grained perception of single images and general comprehension across multiple images. However, existing MLLMs still face challenges in achieving precise grounding in complex multi-image scenarios. To address this, we first explore a Chain-of-Thought (CoT) framework that integrates single-image grounding with multi-image comprehension. While partially effective, it remains unstable and struggles to capture abstract visual information due to its non-end-to-end nature. Therefore, we introduce Migician, the first multi-image grounding model capable of performing free-form and accurate grounding across multiple images. To support this, we present the MGrounding-630k dataset, which comprises data for several multi-image grounding tasks derived from existing datasets, along with newly generated free-form grounding instruction-following data. Furthermore, we propose MIG-Bench, a comprehensive benchmark specifically designed for evaluating multi-image grounding capabilities. Experimental results demonstrate that our model achieves significantly superior multi-image grounding capabilities, outperforming the best existing MLLMs by 24.94% and even surpassing much larger 70B models. Our code, model, dataset, and benchmark are fully open-sourced at https://migician-vg.github.io/.

Enhancing Cross-Tokenizer Knowledge Distillation with Contextual Dynamical Mapping
Yijie Chen | Yijin Liu | Fandong Meng | Yufeng Chen | Jinan Xu | Jie Zhou
Findings of the Association for Computational Linguistics: ACL 2025

Knowledge Distillation (KD) has emerged as a prominent technique for model compression. However, conventional KD approaches primarily focus on homogeneous architectures with identical tokenizers, constraining their applicability in cross-architecture scenarios. As for the cross-tokenizer KD, the differences in the tokenizers give rise to two fundamental challenges: (1) sequence misalignment caused by divergent tokenization strategies, and (2) mismatched vocabulary size and composition. While existing probability-matching methods attempt to address these issues, their efficacy remains limited due to suboptimal alignment in both the sequence and vocabulary aspects. To overcome these limitations, we propose Contextual Dynamic Mapping (CDM), a novel cross-tokenizer distillation framework that employs contextual information to enhance sequence alignment precision and dynamically improves vocabulary mapping. We evaluated the effectiveness of our approach across five advanced and widely-used model families (i.e,LLama3, Phi3, Gemma2, OPT and Qwen2), which were configured into three distinct teacher-student pairs. Our method shows significant advantages over existing cross-tokenizer distillation baselines across diverse benchmarks, including instruction-following, code generation and math. Notably, our analysis reveals that combining conventional same-tokenizer distillation and cross-tokenizer distillation through CDM yields further performance improvements.

Boosting Data Utilization for Multilingual Dense Retrieval
Chao Huang | Fengran Mo | Yufeng Chen | Changhao Guan | Zhenrui Yue | Xinyu Wang | Jinan Xu | Kaiyu Huang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Multilingual dense retrieval aims to retrieve relevant documents across different languages based on a unified retriever model. The challenge lies in aligning representations of different languages in a shared vector space. The common practice is to fine-tune the dense retriever via contrastive learning, whose effectiveness highly relies on the quality of the negative sample and the efficacy of mini-batch data. Different from the existing studies that focus on developing sophisticated model architecture, we propose a method to boost data utilization for multilingual dense retrieval by obtaining high-quality hard negative samples and effective mini-batch data. The extensive experimental results on a multilingual retrieval benchmark, MIRACL, with 16 languages demonstrate the effectiveness of our method by outperforming several existing strong baselines.

DMDTEval: An Evaluation and Analysis of LLMs on Disambiguation in Multi-domain Translation
Zhibo Man | Yuanmeng Chen | Yujie Zhang | Jinan Xu
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Currently, Large Language Models (LLMs) have achieved remarkable results in machine translation. However, their performance in multi-domain translation (MDT) is less satisfactory, the meanings of words can vary across different domains, highlighting the significant ambiguity inherent in MDT. Therefore, evaluating the disambiguation ability of LLMs in MDT, remains an open problem. To this end, we present an evaluation and analysis of LLMs on disambiguation in multi-domain translation (DMDTEval), our systematic evaluation framework consisting of three aspects: (1) we construct a translation test set with multi-domain ambiguous word annotation, (2) we curate a diverse set of disambiguation prompt strategies, and (3) we design precise disambiguation metrics, and study the efficacy of various prompt strategies on multiple state-of-the-art LLMs. We conduct comprehensive experiments across 4 language pairs and 13 domains, our extensive experiments reveal a number of crucial findings that we believe will pave the way and also facilitate further research in the critical area of improving the disambiguation of LLMs.

Think in Safety: Unveiling and Mitigating Safety Alignment Collapse in Multimodal Large Reasoning Model
Xinyue Lou | You Li | Jinan Xu | Xiangyu Shi | Chi Chen | Kaiyu Huang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

The rapid development of Multimodal Large Reasoning Models (MLRMs) has demonstrated broad application potential, yet their safety and reliability remain critical concerns that require systematic exploration. To address this gap, we conduct a comprehensive and systematic safety evaluation of 13 MLRMs across 5 benchmarks and unveil prevalent safety degradation phenomena in most advanced models. Moreover, our analysis reveals distinct safety patterns across different benchmarks: significant safety degradation is observed across jailbreak robustness benchmarks, whereas safety-awareness benchmarks demonstrate less pronounced degradation. In particular, the long thought process in some scenarios even enhances safety performance. Therefore, it is a potential approach to address safety issues in MLRMs by leveraging the intrinsic reasoning capabilities of the model to detect unsafe intent. To operationalize this insight, we construct a multimodal tuning dataset that incorporates a safety-oriented thought process. Experimental results from fine-tuning existing MLRMs with this dataset effectively enhance the safety on both jailbreak robustness and safety-awareness benchmarks. This study provides a new perspective for developing safe MLRMs.

Multilingual Knowledge Editing with Language-Agnostic Factual Neurons
Xue Zhang | Yunlong Liang | Fandong Meng | Songming Zhang | Yufeng Chen | Jinan Xu | Jie Zhou
Proceedings of the 31st International Conference on Computational Linguistics

Multilingual knowledge editing (MKE) aims to simultaneously update factual knowledge across multiple languages within large language models (LLMs). Previous research indicates that the same knowledge across different languages within LLMs exhibits a degree of shareability. However, most existing MKE methods overlook the connections of the same knowledge between different languages, resulting in knowledge conflicts and limited edit performance. To address this issue, we first investigate how LLMs process multilingual factual knowledge and discover that the same factual knowledge in different languages generally activates a shared set of neurons, which we call language-agnostic factual neurons (LAFNs). These neurons represent the same factual knowledge shared across languages and imply the semantic connections among multilingual knowledge. Inspired by this finding, we propose a new MKE method by Locating and Updating Language-Agnostic Factual Neurons (LU-LAFNs) to edit multilingual knowledge simultaneously, which avoids knowledge conflicts and thus improves edit performance. Experimental results on Bi-ZsRE and MzsRE benchmarks demonstrate that our method achieves the best edit performance, indicating the effectiveness and importance of modeling the semantic connections among multilingual knowledge.

多领域翻译中语义消歧的话题方向盘方法
Zhibo Man | Yujie Zhang | Yuanmeng Chen | Yufeng Chen | Jinan Xu
Proceedings of the 24th China National Conference on Computational Linguistics (CCL 2025)

"近年,大语言模型(Large Language Models, LLMs)在通用文本翻译任务上的翻译质量取得大幅的提升,但是在面对多领域文本时,翻译质量呈现明显下降。如何利用有限的领域双语平行语料增强领域翻译知识成为主要的研究目标,已有方法大多使用人为设置的领域标签学习语义表示,导致其在消歧知识的获取上受到限制,如何构建有效的消歧知识成为一种挑战。为此,本文提出一种多领域翻译中语义消歧的话题方向盘方法,旨在增强大语言模型在多领域上的语义消歧能力,具体包括:(1)基于话题模型的语义表示获取机制:我们首先利用ETM自动聚类算法获取细小颗粒度的话题语义表示用于之后构建消歧知识,这种话题的表示更贴近语义,也更适合作为语义单元来构建语义表示。然后,我们设计TopicModel函数将大模型的表示转换成话题的语义表示。(2)基于话题方向盘的领域消歧知识获取机制:我们设计可学习的变换矩阵,通过建模不同领域下话题分布的投影方向获取多领域上的语义消歧知识。话题的语义表示经过领域方向投影的再次变换后,有效的语义消歧特征得到强化,从而提升大语言模型在不同领域下的语义消歧能力。我们选取Qwen-2.5-1.5B作为基础模型,在英语-汉语以及德语-汉语两个多领域翻译任务上进行实验验证。实验结果表明,该方法在平均BLEU值和COMET均超出基线模型,进一步我们对于翻译质量的提升与消歧效果之间的关系进行了分析,并通过翻译实例给出详细说明。"

AlignDistil: Token-Level Language Model Alignment as Adaptive Policy Distillation
Songming Zhang | Xue Zhang | Tong Zhang | Bojie Hu | Yufeng Chen | Jinan Xu
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

In modern large language models (LLMs), LLM alignment is of crucial importance and is typically achieved through methods such as reinforcement learning from human feedback (RLHF) and direct preference optimization (DPO). However, in most existing methods for LLM alignment, all tokens in the response are optimized using a sparse, response-level reward or preference annotation. The ignorance of token-level rewards may erroneously punish high-quality tokens or encourage low-quality tokens, resulting in suboptimal performance and slow convergence speed. To address this issue, we propose AlignDistil, a RLHF-equivalent distillation method for token-level reward optimization. Specifically, we introduce the reward learned by DPO into the RLHF objective and theoretically prove the equivalence between this objective and a token-level distillation process, where the teacher distribution linearly combines the logits from the DPO model and a reference model. On this basis, we further bridge the accuracy gap between the reward from the DPO model and the pure reward model, by building a contrastive DPO reward with a normal and a reverse DPO model. Moreover, to avoid under- and over-optimization on different tokens, we design a token adaptive logit extrapolation mechanism to construct an appropriate teacher distribution for each token. Experimental results demonstrate the superiority of our AlignDistil over existing methods and showcase fast convergence due to its token-level distributional reward optimization.

Less, but Better: Efficient Multilingual Expansion for LLMs via Layer-wise Mixture-of-Experts
Xue Zhang | Yunlong Liang | Fandong Meng | Songming Zhang | Yufeng Chen | Jinan Xu | Jie Zhou
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Continually expanding new languages for existing large language models (LLMs) is a promising yet challenging approach to building powerful multilingual LLMs.The biggest challenge is to make the model continuously learn new languages while preserving the proficient ability of old languages.To achieve this, recent work utilizes the Mixture-of-Experts (MoE) architecture to expand new languages by adding new experts and avoid catastrophic forgetting of old languages by routing corresponding tokens to the original model backbone (old experts).Although intuitive, this kind of method is parameter-costly when expanding new languages and still inevitably impacts the performance of old languages.To address these limitations, we analyze the language characteristics of different layers in LLMs and propose a layer-wise expert allocation algorithm (LayerMoE) to determine the appropriate number of new experts for each layer.Specifically, we find different layers in LLMs exhibit different representation similarities between languages and then utilize the similarity as the indicator to allocate experts for each layer, i.e., the higher similarity, the fewer experts.Additionally, to further mitigate the forgetting of old languages, we add a classifier in front of the router network on the layers with higher similarity to guide the routing of old language tokens.Experimental results show that our method outperforms the previous state-of-the-art baseline with 60% fewer experts in the single-expansion setting and with 33.3% fewer experts in the lifelong-expansion setting, demonstrating the effectiveness of our method.

2024

SKICSE: Sentence Knowable Information Prompted by LLMs Improves Contrastive Sentence Embeddings
Fangwei Ou | Jinan Xu
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers)

Contrastive learning, which utilizes positive pairs and in-batch negatives to optimize the loss objective, has been proven to be an effective method for learning sentence embeddings. However, we argue that the previous methods of constructing positive pairs only through dropout perturbation or entailment relation are limited. Since there is more sentence knowable information (SKI) to be mined, such as sentence external knowledge, semantic analysis, and grammatical description. In this work, we first hand-craft a simple and effective prompt template that is able to obtain the knowable information of input sentences from LLMs (e.g., LLaMA). Then we combine the original sentence and its knowable information to form a positive pair for contrastive learning. We evaluate our method on standard semantic textual similarity (STS) tasks. Experimental results show that our unsupervised and supervised models using BERTbase achieve an average of 78.65% and 82.45% Spearman’s correlation respectively, a 2.40% and 0.88% improvement compared to SimCSE. Our model outperforms the previous state-of-the-art model PromptBERT in both unsupervised and supervised settings and specifically yields a new state-of-the-art performance in supervised setting.

CollabKG: A Learnable Human-Machine-Cooperative Information Extraction Toolkit for (Event) Knowledge Graph Construction
Xiang Wei | Yufeng Chen | Ning Cheng | Xingyu Cui | Jinan Xu | Wenjuan Han
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

In order to construct or extend entity-centric and event-centric knowledge graphs (KG and EKG), the information extraction (IE) annotation toolkit is essential. However, existing IE toolkits have several non-trivial problems, such as not supporting multi-tasks, and not supporting automatic updates. In this work, we present CollabKG, a learnable human-machine-cooperative IE toolkit for KG and EKG construction. Specifically, for the multi-task issue, CollabKG unifies different IE subtasks, including named entity recognition (NER), entity-relation triple extraction (RE), and event extraction (EE), and supports both KG and EKG. Then, combining advanced prompting-based IE technology, the human-machine-cooperation mechanism with Large Language Models (LLMs) as the assistant machine is presented which can provide a lower cost as well as a higher performance. Lastly, owing to the two-way interaction between the human and machine, CollabKG with learning ability allows self-renewal. Besides, CollabKG has several appealing features (e.g., customization, training-free, and label propagation) that make the system powerful and high-productivity. We holistically compare our toolkit with other existing tools on these features. Human evaluation quantitatively illustrates that CollabKG significantly improves annotation quality, efficiency, and stability simultaneously.

A Reinforcement Learning Approach to Improve Low-Resource Machine Translation Leveraging Domain Monolingual Data
Hongxiao Zhang | Mingtong Liu | Chunyou Li | Yufeng Chen | Jinan Xu | Ming Zhou
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Due to the lack of parallel data, the mainstream fine-tuning-based domain adaptation methods have the overfitting problem in the translation of low-resource domains, and it is difficult for the model to learn the in-domain generalization knowledge. To address the above issue, in this work, we propose a novel Reinforcement Learning Domain Adaptation method for Neural Machine Translation (RLDA-NMT) in the low-resource domain. RLDA-NMT utilizes in-domain source monolingual data to make up for the lack of parallel data, and reinforces domain features learning to make the translation model learn the domain-specific knowledge more fully. Specifically, we first train a ranking-based model with a small-scale in-domain parallel corpus, and then adopt it as the reward model to select higher-quality generated translations for reinforcement when fine-tuning pre-trained NMT model using in-domain source monolingual data. We conduct experiments on Education, Laws, Thesis, and Patent domains of Chinese⇔English translation tasks. Experimental results demonstrate that RLDA-NMT can alleviate overfitting and reinforce the NMT model to learn domain-specific knowledge. Additionally, the results also show that RLDA-NMT and back-translation (BT) are nicely complementary to each other, where combining RLDA-NMT with BT can further improve translation quality.

ICL: Iterative Continual Learning for Multi-domain Neural Machine Translation
Zhibo Man | Kaiyu Huang | Yujie Zhang | Yuanmeng Chen | Yufeng Chen | Jinan Xu
Findings of the Association for Computational Linguistics: EMNLP 2024

In a practical scenario, multi-domain neural machine translation (MDNMT) aims to continuously acquire knowledge from new domain data while retaining old knowledge. Previous work separately learns each new domain knowledge based on parameter isolation methods, which effectively capture the new knowledge. However, task-specific parameters lead to isolation between models, which hinders the mutual transfer of knowledge between new domains. Given the scarcity of domain-specific corpora, we consider making full use of the data from multiple new domains. Therefore, our work aims to leverage previously acquired domain knowledge when modeling subsequent domains. To this end, we propose an Iterative Continual Learning (ICL) framework for multi-domain neural machine translation. Specifically, when each new domain arrives, (1) we first build a pluggable incremental learning model, (2) then we design an iterative updating algorithm to continuously update the original model, which can be used flexibly for constructing subsequent domain models. Furthermore, we design a domain knowledge transfer mechanism to enhance the fine-grained domain-specific representation, thereby solving the word ambiguity caused by mixing domain data. Experimental results on the UM-Corpus and OPUS multi-domain datasets show the superior performance of our proposed model compared to representative baselines.

Towards Multi-Relational Multi-Hop Reasoning over Dense Temporal Knowledge Graphs
Jian Liu | Zihe Liu | Xueqiang Lyu | Peng Jin | Jinan Xu
Findings of the Association for Computational Linguistics: ACL 2024

Temporal knowledge graph reasoning has emerged as a crucial task for answering time-dependent questions within a knowledge graph (KG).Despite tremendous progress, the present research is impeded by the sparsity of a temporal KG and an over-reliance on simple single-relational reasoning patterns. To overcome these challenges, we introduce MulQuestions, a new temporal KG reasoning benchmark featuring over 200k entities and 960k questions designed to facilitate complex, multi-relational and multi-hop reasoning. Additionally, we propose a new model adept at conducting pattern-aware and time-sensitive reasoning across temporal KGs. The model’s efficacy is confirmed through rigorous evaluations, showcasing its effectiveness in sparse data conditions and adeptness at handling questions with long reasoning chains. We have made our benchmark and model publicly accessible at [https://anonymous].

Outdated Issue Aware Decoding for Factual Knowledge Editing
Zengkui Sun | Yijin Liu | Jiaan Wang | Fandong Meng | Jinan Xu | Yufeng Chen | Jie Zhou
Findings of the Association for Computational Linguistics: ACL 2024

Recently, Knowledge Editing has received increasing attention, since it could update the specific knowledge from outdated ones in pretrained models without re-training. However, as pointed out by recent studies, existing related methods tend to merely memorize the superficial word composition of the edited knowledge, rather than truly learning and absorbing it. Consequently, on the reasoning questions, we discover that existing methods struggle to utilize the edited knowledge to reason the new answer, and tend to retain outdated responses, which are generated by the original models utilizing original knowledge. Nevertheless, the outdated responses are unexpected for the correct answers to reasoning questions, which we named as the outdated issue. To alleviate this issue, in this paper, we propose a simple yet effective decoding strategy, i.e., outDated ISsue aware deCOding (DISCO), to enhance the performance of edited models on reasoning questions. Specifically, we capture the difference in the probability distribution between the original and edited models. Further, we amplify the difference of the token prediction in the edited model to alleviate the outdated issue, and thus enhance the model performance w.r.t the edited knowledge. Experimental results suggest that applying DISCO could enhance edited models to reason, e.g., on reasoning questions, DISCO outperforms the prior SOTA method by 12.99 F1 scores, and reduces the ratio of the outdated issue to 5.78% on the zsRE dataset.

LCS: A Language Converter Strategy for Zero-Shot Neural Machine Translation
Zengkui Sun | Yijin Liu | Fandong Meng | Jinan Xu | Yufeng Chen | Jie Zhou
Findings of the Association for Computational Linguistics: ACL 2024

Multilingual neural machine translation models generally distinguish translation directions by the language tag (LT) in front of the source or target sentences. However, current LT strategies cannot indicate the desired target language as expected on zero-shot translation, i.e., the off-target issue. Our analysis reveals that the indication of the target language is sensitive to the placement of the target LT. For example, when placing the target LT on the decoder side, the indication would rapidly degrade along with decoding steps, while placing the target LT on the encoder side would lead to copying or paraphrasing the source input. To address the above issues, we propose a simple yet effective strategy named Language Converter Strategy (LCS). By introducing the target language embedding into the top encoder layers, LCS mitigates confusion in the encoder and ensures stable language indication for the decoder. Experimental results on MultiUN, TED, and OPUS-100 datasets demonstrate that LCS could significantly mitigate the off-target issue, with language accuracy up to 95.28%, 96.21%, and 85.35% meanwhile outperforming the vanilla LT strategy by 3.07, 3,3, and 7.93 BLEU scores on zero-shot translation, respectively.

Comments as Natural Logic Pivots: Improve Code Generation via Comment Perspective
Yijie Chen | Yijin Liu | Fandong Meng | Yufeng Chen | Jinan Xu | Jie Zhou
Findings of the Association for Computational Linguistics: ACL 2024

Code generation aims to understand the problem description and generate corresponding code snippets, where existing works generally decompose such complex tasks into intermediate steps by prompting strategies, such as Chain-of-Thought and its variants. While these studies have achieved some success, their effectiveness is highly dependent on the capabilities of advanced Large Language Models (LLMs) such as GPT-4, particularly in terms of API calls, which significantly limits their practical applicability. Consequently, how to enhance the code generation capabilities of small and medium-scale code LLMs without significantly increasing training costs is an appealing challenge. In this paper, we suggest that code comments are the natural logic pivot between natural language and code language and propose using comments to boost the code generation ability of code LLMs. Concretely, we propose MANGO (comMents As Natural loGic pivOts), including a comment contrastive training strategy and a corresponding logical comment decoding strategy. Experiments are performed on HumanEval and MBPP, utilizing StarCoder and WizardCoder as backbone models, and encompassing model parameter sizes between 3B and 7B. The results indicate that MANGO significantly improves the code pass rate based on the strong baselines. Meanwhile, the robustness of the logical comment decoding strategy is notably higher than the Chain-of-thoughts prompting.

Dual-Space Knowledge Distillation for Large Language Models
Songming Zhang | Xue Zhang | Zengkui Sun | Yufeng Chen | Jinan Xu
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Knowledge distillation (KD) is known as a promising solution to compress large language models (LLMs) via transferring their knowledge to smaller models. During this process, white-box KD methods usually minimize the distance between the output distributions of the two models so that more knowledge can be transferred. However, in the current white-box KD framework, the output distributions are from the respective output spaces of the two models, using their own prediction heads. We argue that the space discrepancy will lead to low similarity between the teacher model and the student model on both representation and distribution levels. Furthermore, this discrepancy also hinders the KD process between models with different vocabularies, which is common for current LLMs. To address these issues, we propose a dual-space knowledge distillation (DSKD) framework that unifies the output spaces of the two models for KD. On the basis of DSKD, we further develop a cross-model attention mechanism, which can automatically align the representations of the two models with different vocabularies. Thus, our framework is not only compatible with various distance functions for KD (e.g., KL divergence) like the current framework, but also supports KD between any two LLMs regardless of their vocabularies. Experiments on task-agnostic instruction-following benchmarks show that DSKD significantly outperforms the current white-box KD framework with various distance functions, and also surpasses existing KD methods for LLMs with different vocabularies.

结合LLM与3D动画技术的手语数字人系统
Yang Yang (杨阳) | Ying Zhang (张颖) | Kaiyu Huang (黄锴宇) | Jinan Xu (徐金安)
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 3: Evaluations)

“手语翻译(Sign Language Translation, SLT)系统作为一种重要的辅助技术,为听障人士提供了与他人沟通的有效途径。然而,传统手语翻译系统在准确性、流畅性差等方面存在问题。本文提出了一种结合大语言模型(Large Language Model, LLM)和3D动画技术(3D Animation Technology)的手语翻译系统,旨在克服这些局限,提高翻译的准确性和流畅性。本文详细介绍了系统的设计与实现过程,包括提示词设计、数据处理方法以及手语数字人翻译系统的实现。实验结果表明,采用LLM方法在手语翻译中能够生成较为自然和准确的结果。在标准评估和人工评估的两种评估方法下,本系统在大多数情况下能够较好地完成手语翻译任务,性能优于传统方法。本文的研究为进一步改进手语翻译系统提供了有益的参考和启示。”

融合确定性因子及区域密度的k-最近邻机器翻译方法(A k-Nearest-Neighbor Machine Translation Method Combining Certainty Factor and Region Density)
Rui Qi (齐睿) | Xiangyu Shi (石响宇) | Zhibo Man (满志博) | Jinan Xu (徐金安) | Yufeng Chen (陈钰枫)
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)

“k-最近邻机器翻译(kNN-MT)是近年来神经机器翻译领域的一个重要研究方向。此类方法可以在不更新机器翻译模型的情况下提高翻译质量,但训练数据中高低频单词的数量不均衡限制了模型效果,且固定的k值无法对处于不同密度分布的数据都产生良好的翻译结果。为此本文提出了一种创新的kNN-MT方法,引入确定性因子(CF)来降低数据不均衡对模型效果的影响,并根据测试点周边数据密度动态选择k值。在多领域德-英翻译数据集上,相比基线实验,本方法在四个领域上翻译效果均有提升,其中三个领域上提升超过1个BLEU,有效提高了神经机器翻译模型的翻译质量。”

MITF:基于图像映射文本特征的跨模态图文检索方法(MITF:Cross-modal Image-text Retrieval Method with Mapping Images to Text Features)
Xinyue Lou (娄馨月) | You Li (李铀) | Rui Qi (齐睿) | Yufeng Chen (陈钰枫) | Jinan Xu (徐金安)
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)

“减小图文信息间的语义鸿沟,促进跨模态信息的对齐与融合一直是解决跨模态图文检索问题的关键。但现有的双流模型因为训练时图像编码器与文本编码器是分开的,导致图文特征的对齐与融合较难。因此,本文提出图像映射文本特征(MITF)网络将不同模态(图像和文本)的信息映射到单一模态(文本),进一步增强跨模态语义的融合和对齐,提高图文检索的性能。具体地,在冻结预训练的中文视觉语言模型Chinese-CLIP参数的情况下,训练一个MITF网络将图像映射为伪语言标记,在此基础上引入提示词自动学习机制提升模型对于伪语言标记的理解能力。同时,在检索时构建Faiss索引提高检索速度。在三个开源数据集的实验结果表明所提方法相比原始Chinese-CLIP模型检索时的Mean Recall指标平均提高了3.7%,检索速度提高了约4倍。同时,图文特征可视化结果进一步表明所提方法提高了图像特征与文本特征的对齐程度。”

DoRA: Enhancing Parameter-Efficient Fine-Tuning with Dynamic Rank Distribution
Yulong Mao | Kaiyu Huang | Changhao Guan | Ganglin Bao | Fengran Mo | Jinan Xu
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Fine-tuning large-scale pre-trained models is inherently a resource-intensive task. While it can enhance the capabilities of the model, it also incurs substantial computational costs, posing challenges to the practical application of downstream tasks. Existing parameter-efficient fine-tuning (PEFT) methods such as Low-Rank Adaptation (LoRA) rely on a bypass framework that ignores the differential parameter budget requirements across weight matrices, which may lead to suboptimal fine-tuning outcomes. To address this issue, we introduce the Dynamic Low-Rank Adaptation (DoRA) method. DoRA decomposes high-rank LoRA layers into structured single-rank components, allowing for dynamic pruning of parameter budget based on their importance to specific tasks during training, which makes the most of the limited parameter budget. Experimental results demonstrate that DoRA can achieve competitive performance compared with LoRA and full model fine-tuning, and outperform various strong baselines with the same storage parameter budget. Our code is available at [github](https://github.com/MIkumikumi0116/DoRA)

Continual Learning with Semi-supervised Contrastive Distillation for Incremental Neural Machine Translation
Yunlong Liang | Fandong Meng | Jiaan Wang | Jinan Xu | Yufeng Chen | Jie Zhou
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Incrementally expanding the capability of an existing translation model to solve new domain tasks over time is a fundamental and practical problem, which usually suffers from catastrophic forgetting. Generally, multi-domain learning can be seen as a good solution. However, there are two drawbacks: 1) it requires having the training data for all domains available at the same time, which may be unrealistic due to storage or privacy concerns; 2) it requires re-training the model on the data of all domains from scratch when adding a new domain and this is time-consuming and computationally expensive. To address these issues, we present a semi-supervised contrastive distillation framework for incremental neural machine translation. Specifically, to avoid catastrophic forgetting, we propose to exploit unlabeled data from the same distributions of the older domains through knowledge distillation. Further, to ensure the distinct domain characteristics in the model as the number of domains increases, we devise a cross-domain contrastive objective to enhance the distilled knowledge. Extensive experiments on domain translation benchmarks show that our approach, without accessing any previous training data or re-training on all domains from scratch, can significantly prevent the model from forgetting previously learned knowledge while obtaining good performance on the incrementally added domains. The code and data with step-by-step instructions will be released upon acceptance.

2023

Is ChatGPT a Good NLG Evaluator? A Preliminary Study
Jiaan Wang | Yunlong Liang | Fandong Meng | Zengkui Sun | Haoxiang Shi | Zhixu Li | Jinan Xu | Jianfeng Qu | Jie Zhou
Proceedings of the 4th New Frontiers in Summarization Workshop

Recently, the emergence of ChatGPT has attracted wide attention from the computational linguistics community. Many prior studies have shown that ChatGPT achieves remarkable performance on various NLP tasks in terms of automatic evaluation metrics. However, the ability of ChatGPT to serve as an evaluation metric is still underexplored. Considering assessing the quality of natural language generation (NLG) models is an arduous task and NLG metrics notoriously show their poor correlation with human judgments, we wonder whether ChatGPT is a good NLG evaluation metric. In this report, we provide a preliminary meta-evaluation on ChatGPT to show its reliability as an NLG metric. In detail, we regard ChatGPT as a human evaluator and give task-specific (e.g., summarization) and aspect-specific (e.g., relevance) instruction to prompt ChatGPT to evaluate the generated results of NLG models. We conduct experiments on five NLG meta-evaluation datasets (including summarization, story generation and data-to-text tasks). Experimental results show that compared with previous automatic metrics, ChatGPT achieves state-of-the-art or competitive correlation with human judgments in most cases. In addition, we find that the effectiveness of the ChatGPT evaluator might be influenced by the creation method of the meta-evaluation datasets. For the meta-evaluation datasets which are created greatly depending on the reference and thus are biased, the ChatGPT evaluator might lose its effectiveness. We hope our preliminary study could prompt the emergence of a general-purposed reliable NLG metric.

Exploring Domain-shared and Domain-specific Knowledge in Multi-Domain Neural Machine Translation
Zhibo Man | Yujie Zhang | Yuanmeng Chen | Yufeng Chen | Jinan Xu
Proceedings of Machine Translation Summit XIX, Vol. 1: Research Track

Currently, multi-domain neural machine translation (NMT) has become a significant research topic in domain adaptation machine translation, which trains a single model by mixing data from multiple domains. Multi-domain NMT aims to improve the performance of the low-resources domain through data augmentation. However, mixed domain data brings more translation ambiguity. Previous work focused on domain-general or domain-context knowledge learning, respectively. Therefore, there is a challenge for acquiring domain-general or domain-context knowledge simultaneously. To this end, we propose a unified framework for learning simultaneously domain-general and domain-specific knowledge, we are the first to apply parameter differentiation in multi-domain NMT. Specifically, we design the differentiation criterion and differentiation granularity to obtain domain-specific parameters. Experimental results on multi-domain UM-corpus English-to-Chinese and OPUS German-to-English datasets show that the average BLEU scores of the proposed method exceed the strong baseline by 1.22 and 1.87, respectively. In addition, we investigate the case study to illustrate the effectiveness of the proposed method in acquiring domain knowledge.

D²TV: Dual Knowledge Distillation and Target-oriented Vision Modeling for Many-to-Many Multimodal Summarization
Yunlong Liang | Fandong Meng | Jiaan Wang | Jinan Xu | Yufeng Chen | Jie Zhou
Findings of the Association for Computational Linguistics: EMNLP 2023

Many-to-many multimodal summarization (M³S) task aims to generate summaries in any language with document inputs in any language and the corresponding image sequence, which essentially comprises of multimodal monolingual summarization (MMS) and multimodal cross-lingual summarization (MXLS) tasks. Although much work has been devoted to either MMS or MXLS, little research pays attention to the M³S task. Besides, existing studies mainly focus on 1) utilizing MMS to enhance MXLS via knowledge distillation without considering the performance of MMS or 2) improving MMS models by filtering summary-unrelated visual features with implicit learning or explicitly complex training objectives. In this paper, we first introduce a general and practical task, i.e., M³S. Further, we propose a dual knowledge distillation and target-oriented vision modeling framework for the M³S task. Specifically, the dual knowledge distillation method guarantees that the knowledge of MMS and MXLS can be transferred to each other and thus mutually prompt both of them. To offer target-oriented visual features, a simple yet effective target-oriented contrastive objective is designed and responsible for discarding needless visual information. Extensive experiments on the many-to-many setting show the effectiveness of the proposed approach. Additionally, we contribute a many-to-many multimodal summarization (lmttM³Sum) dataset with 44 languages to facilitate future research.

Novel Slot Detection With an Incremental Setting
Chen Liang | Hongliang Li | Changhao Guan | Qingbin Liu | Jian Liu | Jinan Xu | Zhe Zhao
Findings of the Association for Computational Linguistics: EMNLP 2023

Current dialogue systems face diverse user requests and rapid change domains, making quickly adapt to scenarios with previous unseen slot types become a major challenge. Recently, researchers have introduced novel slot detection (NSD) to discover potential new types. However, dialogue system with NSD does not bring practical improvements due to the system still cannot handle novel slots in subsequent interactions. In this paper, we define incremental novel slot detection (INSD), which separates the dialogue system to deal with novel types as two major phrases: 1) model discovers unknown slots, 2) training model to possess the capability to handle new classes. We provide an effective model to extract novel slots with set prediction strategy and propose a query-enhanced approach to overcome catastrophic forgetting during the process of INSD. We construct two INSD datasets to evaluate our method and experimental results show that our approach exhibits superior performance.

Structure and Label Constrained Data Augmentation for Cross-domain Few-shot NER
Jingyi Zhang | Ying Zhang | Yufeng Chen | Jinan Xu
Findings of the Association for Computational Linguistics: EMNLP 2023

Cross-domain few-shot named entity recognition (NER) is a challenging task that aims to recognize entities in target domains with limited labeled data by leveraging relevant knowledge from source domains. However, domain gaps limit the effect of knowledge transfer and harm the performance of NER models. In this paper, we analyze those domain gaps from two new perspectives, i.e., entity annotations and entity structures and leverage word-to-tag and word-to-word relations to model them, respectively. Moreover, we propose a novel method called Structure and Label Constrained Data Augmentation (SLC-DA) for Cross-domain Few-shot NER, which novelly design a label constrained pre-train task and a structure constrained optimization objectives in the data augmentation process to generate domain-specific augmented data to help NER models smoothly transition from source to target domains. We evaluate our approach on several standard datasets and achieve state-of-the-art or competitive results, demonstrating the effectiveness of our method in cross-domain few-shot NER.

RC3: Regularized Contrastive Cross-lingual Cross-modal Pre-training
Chulun Zhou | Yunlong Liang | Fandong Meng | Jinan Xu | Jinsong Su | Jie Zhou
Findings of the Association for Computational Linguistics: ACL 2023

Multilingual vision-language (V&L) pre-training has achieved remarkable progress in learning universal representations across different modalities and languages. In spite of recent success, there still remain challenges limiting further improvements of V&L pre-trained models in multilingual settings. Particularly, current V&L pre-training methods rely heavily on strictly-aligned multilingual image-text pairs generated from English-centric datasets through machine translation. However, the cost of collecting and translating such strictly-aligned datasets is usually unbearable. In this paper, we propose Regularized Contrastive Cross-lingual Cross-modal (RC3) pre-training, which further exploits more abundant weakly-aligned multilingual image-text pairs. Specifically, we design a regularized cross-lingual visio-textual contrastive learning objective that constrains the representation proximity of weakly-aligned visio-textual inputs according to textual relevance. Besides, existing V&L pre-training approaches mainly deal with visual inputs by either region-of-interest (ROI) features or patch embeddings. We flexibly integrate the two forms of visual features into our model for pre-training and downstream multi-modal tasks. Extensive experiments on 5 downstream multi-modal tasks across 6 languages demonstrate the effectiveness of our proposed method over competitive contrast models with strong zero-shot capability.

A Multi-modal Debiasing Model with Dynamical Constraint for Robust Visual Question Answering
Yu Li | Bojie Hu | Fengshuo Zhang | Yahan Yu | Jian Liu | Yufeng Chen | Jinan Xu
Findings of the Association for Computational Linguistics: ACL 2023

Recent studies have pointed out that many well-developed Visual Question Answering (VQA) systems suffer from bias problem. Despite the remarkable performance gained on In-Distribution (ID) datasets, the VQA model might merely capture the superficial correlation from question to answer rather than showing real reasoning abilities. Therefore, when switching to Out-of-Distribution (OOD) dataset, whose test distribution is unknown or even reversed with the training set, significant drops might be demonstrated. Although efforts have been devoted to easing the negative bias effect brought by language prior and analysing its inherent cause, they are still limited by the following two aspects. First, most current debiasing methods achieve promising OOD generalization ability with a major sacrifice of the ID performance. Second, existing researches are restricted by exploiting comprehensive biases, since weakening the language bias is mainly focused, while only a few works consider vision bias. In this paper, we investigate a straightforward way to mitigate bias problem for VQA task. Specifically, we reduce bias effect by subtracting bias score from standard VQA base score. Based on such a direct strategy, we design two bias learning branches to detect more bias information, which are combined with a dynamical constraint loss to alleviate the problem of over-correction and insufficient debiasing effect. We evaluate our method on the challenging VQA v2.0 and VQA-CP V2,0 datasets and the proposed method achievessignificant improvement.

Addressing NER Annotation Noises with Uncertainty-Guided Tree-Structured CRFs
Jian Liu | Weichang Liu | Yufeng Chen | Jinan Xu | Zhe Zhao
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Real-world named entity recognition (NER) datasets are notorious for their noisy nature, attributed to annotation errors, inconsistencies, and subjective interpretations. Such noises present a substantial challenge for traditional supervised learning methods. In this paper, we present a new and unified approach to tackle annotation noises for NER. Our method considers NER as a constituency tree parsing problem, utilizing a tree-structured Conditional Random Fields (CRFs) with uncertainty evaluation for integration. Through extensive experiments conducted on four real-world datasets, we demonstrate the effectiveness of our model in addressing both partial and incorrect annotation errors. Remarkably, our model exhibits superb performance even in extreme scenarios with 90% annotation noise.

A Quality-based Syntactic Template Retriever for Syntactically-Controlled Paraphrase Generation
Xue Zhang | Songming Zhang | Yunlong Liang | Yufeng Chen | Jian Liu | Wenjuan Han | Jinan Xu
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Existing syntactically-controlled paraphrase generation (SPG) models perform promisingly with human-annotated or well-chosen syntactic templates. However, the difficulty of obtaining such templates actually hinders the practical application of SPG models. For one thing, the prohibitive cost makes it unfeasible to manually design decent templates for every source sentence. For another, the templates automatically retrieved by current heuristic methods are usually unreliable for SPG models to generate qualified paraphrases. To escape this dilemma, we propose a novel Quality-based Syntactic Template Retriever (QSTR) to retrieve templates based on the quality of the to-be-generated paraphrases. Furthermore, for situations requiring multiple paraphrases for each source sentence, we design a Diverse Templates Search (DTS) algorithm, which can enhance the diversity between paraphrases without sacrificing quality. Experiments demonstrate that QSTR can significantly surpass existing retrieval methods in generating high-quality paraphrases and even perform comparably with human-annotated templates in terms of reference-free metrics. Additionally, human evaluation and the performance on downstream tasks using our generated paraphrases for data augmentation showcase the potential of our QSTR and DTS algorithm in practical scenarios.

MT2: Towards a Multi-Task Machine Translation Model with Translation-Specific In-Context Learning
Chunyou Li | Mingtong Liu | Hongxiao Zhang | Yufeng Chen | Jinan Xu | Ming Zhou
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Sentence-level translation, document-level translation, translation memory, and terminology constrained translation play an important role in machine translation. Most of the previous work uses separate models or methods to solve these tasks, which is not conducive to knowledge transfer of different tasks and increases the complexity of system construction. In this work, we explore the potential of pre-trained language model in machine translation tasks and propose a Multi-Task Machine Translation (MT2) model to integrate these translation tasks. We design a novel translation-specific In-Context Learning (ICL) paradigm for model training, in which all of the translation tasks can be modeled as context-learning tasks that integrate contextual information for performance improvement. Specifically, we propose a retrieval and alignment method to obtain a large scale context-enhancement training data, then we train the model in an in-context learning manner. Furthermore, we adopt two context-dependent training strategies to encourage the model to better understand and utilize contextual information for translation. Extensive experiments on translation memory, terminology constrained translation, document-level translation, and few-shot domain-adaptation tasks demonstrate the superior performance of our model, verifying the effectiveness of our proposed approach.

Improving Zero-shot Cross-lingual Dialogue State Tracking via Contrastive Learning
Yu Xiang | Ting Zhang | Hui Di | Hui Huang | Chunyou Li | Kazushige Ouchi | Yufeng Chen | Jinan Xu
Proceedings of the 22nd Chinese National Conference on Computational Linguistics

“Recent works in dialogue state tracking (DST) focus on a handful of languages, as collectinglarge-scale manually annotated data in different languages is expensive. Existing models addressthis issue by code-switched data augmentation or intermediate fine-tuning of multilingual pre-trained models. However, these models can only perform implicit alignment across languages. In this paper, we propose a novel model named Contrastive Learning for Cross-Lingual DST(CLCL-DST) to enhance zero-shot cross-lingual adaptation. Specifically, we use a self-builtbilingual dictionary for lexical substitution to construct multilingual views of the same utterance. Then our approach leverages fine-grained contrastive learning to encourage representations ofspecific slot tokens in different views to be more similar than negative example pairs. By thismeans, CLCL-DST aligns similar words across languages into a more refined language-invariantspace. In addition, CLCL-DST uses a significance-based keyword extraction approach to selecttask-related words to build the bilingual dictionary for better cross-lingual positive examples. Experiment results on Multilingual WoZ 2.0 and parallel MultiWoZ 2.1 datasets show that ourproposed CLCL-DST outperforms existing state-of-the-art methods by a large margin, demon-strating the effectiveness of CLCL-DST.”

A Holistic Approach to Reference-Free Evaluation of Machine Translation
Hanming Wu | Wenjuan Han | Hui Di | Yufeng Chen | Jinan Xu
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Traditional machine translation evaluation relies on reference written by humans. While reference-free evaluation gets rid of the constraints of labor-intensive annotations, which can pivot easily to new domains and is more scalable. In this paper, we propose a reference-free evaluation approach that characterizes evaluation as two aspects: (1) fluency: how well the translated text conforms to normal human language usage; (2) faithfulness: how well the translated text reflects the source data. We further split the faithfulness into word-level and sentence-level. Extensive experiments spanning WMT18/19/21 Metrics segment-level daRR and MQM datasets demonstrate that our proposed reference-free approach, ReFreeEval, outperforms SOTA reference-fee metrics like YiSi-2.

Document-Level Event Argument Extraction With a Chain Reasoning Paradigm
Jian Liu | Chen Liang | Jinan Xu | Haoyan Liu | Zhe Zhao
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Document-level event argument extraction aims to identify event arguments beyond sentence level, where a significant challenge is to model long-range dependencies. Focusing on this challenge, we present a new chain reasoning paradigm for the task, which can generate decomposable first-order logic rules for reasoning. This paradigm naturally captures long-range interdependence due to the chains’ compositional nature, which also improves interpretability by explicitly modeling the reasoning process. We introduce T-norm fuzzy logic for optimization, which permits end-to-end learning and shows promise for integrating the expressiveness of logical reasoning with the generalization of neural networks. In experiments, we show that our approach outperforms previous methods by a significant margin on two standard benchmarks (over 6 points in F1).Moreover, it is data-efficient in low-resource scenarios and robust enough to defend against adversarial attacks.

Towards Understanding and Improving Knowledge Distillation for Neural Machine Translation
Songming Zhang | Yunlong Liang | Shuaibo Wang | Yufeng Chen | Wenjuan Han | Jian Liu | Jinan Xu
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Knowledge distillation (KD) is a promising technique for model compression in neural machine translation. However, where the knowledge hides in KD is still not clear, which may hinder the development of KD. In this work, we first unravel this mystery from an empirical perspective and show that the knowledge comes from the top-1 predictions of teachers, which also helps us build a potential connection between word- and sequence-level KD. Further, we point out two inherent issues in vanilla word-level KD based on this finding. Firstly, the current objective of KD spreads its focus to whole distributions to learn the knowledge, yet lacks special treatment on the most crucial top-1 information. Secondly, the knowledge is largely covered by the golden information due to the fact that most top-1 predictions of teachers overlap with ground-truth tokens, which further restricts the potential of KD. To address these issues, we propose a new method named Top-1 Information Enhanced Knowledge Distillation (TIE-KD). Specifically, we design a hierarchical ranking loss to enforce the learning of the top-1 information from the teacher. Additionally, we develop an iterative KD procedure to infuse more additional knowledge by distilling on the data without ground-truth targets. Experiments on WMT’14 English-German, WMT’14 English-French and WMT’16 English-Romanian demonstrate that our method can respectively boost Transformer_base students by +1.04, +0.60 and +1.11 BLEU scores and significantly outperforms the vanilla word-level KD baseline. Besides, our method shows higher generalizability on different teacher-student capacity gaps than existing KD techniques.

Summary-Oriented Vision Modeling for Multimodal Abstractive Summarization
Yunlong Liang | Fandong Meng | Jinan Xu | Jiaan Wang | Yufeng Chen | Jie Zhou
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

The goal of multimodal abstractive summarization (MAS) is to produce a concise summary given the multimodal data (text and vision). Existing studies on MAS mainly focus on how to effectively use the extracted visual features, having achieved impressive success on the high-resource English dataset. However, less attention has been paid to the quality of the visual features to the summary, which may limit the model performance, especially in the low- and zero-resource scenarios. In this paper, we propose to improve the summary quality through summary-oriented visual features. To this end, we devise two auxiliary tasks including vision to summary task and masked image modeling task. Together with the main summarization task, we optimize the MAS model via the training objectives of all these tasks. By these means, the MAS model can be enhanced by capturing the summary-oriented visual features, thereby yielding more accurate summaries. Experiments on 44 languages, covering mid-high-, low-, and zero-resource scenarios, verify the effectiveness and superiority of the proposed approach, which achieves state-of-the-art performance under all scenarios. Additionally, we will contribute a large-scale multilingual multimodal abstractive summarization (MM-Sum) dataset to the research community.

2022

BJTU-WeChat’s Systems for the WMT22 Chat Translation Task
Yunlong Liang | Fandong Meng | Jinan Xu | Yufeng Chen | Jie Zhou
Proceedings of the Seventh Conference on Machine Translation (WMT)

This paper introduces the joint submission of the Beijing Jiaotong University and WeChat AI to the WMT’22 chat translation task for English-German. Based on the Transformer, we apply several effective variants. In our experiments, we apply the pre-training-then-fine-tuning paradigm. In the first pre-training stage, we employ data filtering and synthetic data generation (i.e., back-translation, forward-translation, and knowledge distillation). In the second fine-tuning stage, we investigate speaker-aware in-domain data generation, speaker adaptation, prompt-based context modeling, target denoising fine-tuning, and boosted self-COMET-based model ensemble. Our systems achieve 81.0 and 94.6 COMET scores on English-German and German-English, respectively. The COMET scores of English-German and German-English are the highest among all submissions.

BJTU-Toshiba’s Submission to WMT22 Quality Estimation Shared Task
Hui Huang | Hui Di | Chunyou Li | Hanming Wu | Kazushige Ouchi | Yufeng Chen | Jian Liu | Jinan Xu
Proceedings of the Seventh Conference on Machine Translation (WMT)

This paper presents the BJTU-Toshiba joint submission for WMT 2022 quality estimation shared task. We only participate in Task 1 (quality prediction) of the shared task, focusing on the sentence-level MQM prediction. The techniques we experimented with include the integration of monolingual language models and the pre-finetuning of pre-trained representations. We tried two styles of pre-finetuning, namely Translation Language Modeling and Replaced Token Detection. We demonstrate the competitiveness of our system compared to the widely adopted XLM-RoBERTa baseline. Our system is also the top-ranking system on the Sentence-level MQM Prediction for the English-German language pairs.

Improved Data Augmentation for Translation Suggestion
Hongxiao Zhang | Siyu Lai | Songming Zhang | Hui Huang | Yufeng Chen | Jinan Xu | Jian Liu
Proceedings of the Seventh Conference on Machine Translation (WMT)

Translation suggestion (TS) models are used to automatically provide alternative suggestions for incorrect spans in sentences generated by machine translation. This paper introduces the system used in our submission to the WMT’22 Translation Suggestion shared task. Our system is based on the ensemble of different translation architectures, including Transformer, SA-Transformer, and DynamicConv. We use three strategies to construct synthetic data from parallel corpora to compensate for the lack of supervised data. In addition, we introduce a multi-phase pre-training strategy, adding an additional pre-training phase with in-domain data. We rank second and third on the English-German and English-Chinese bidirectional tasks, respectively.

Generating Authentic Adversarial Examples beyond Meaning-preserving with Doubly Round-trip Translation
Siyu Lai | Zhen Yang | Fandong Meng | Xue Zhang | Yufeng Chen | Jinan Xu | Jie Zhou
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Generating adversarial examples for Neural Machine Translation (NMT) with single Round-Trip Translation (RTT) has achieved promising results by releasing the meaning-preserving restriction. However, a potential pitfall for this approach is that we cannot decide whether the generated examples are adversarial to the target NMT model or the auxiliary backward one, as the reconstruction error through the RTT can be related to either. To remedy this problem, we propose a new definition for NMT adversarial examples based on the Doubly Round-Trip Translation (DRTT). Specifically, apart from the source-target-source RTT, we also consider the target-source-target one, which is utilized to pick out the authentic adversarial examples for the target NMT model. Additionally, to enhance the robustness of the NMT model, we introduce the masked language models to construct bilingual adversarial pairs based on DRTT, which are used to train the NMT model directly. Extensive experiments on both the clean and noisy test sets (including the artificial and natural noise) show that our approach substantially improves the robustness of NMT models.

Learning Structural Information for Syntax-Controlled Paraphrase Generation
Erguang Yang | Chenglin Bai | Deyi Xiong | Yujie Zhang | Yao Meng | Jinan Xu | Yufeng Chen
Findings of the Association for Computational Linguistics: NAACL 2022

Syntax-controlled paraphrase generation aims to produce paraphrase conform to given syntactic patterns. To address this task, recent works have started to use parse trees (or syntactic templates) to guide generation.A constituency parse tree contains abundant structural information, such as parent-child relation, sibling relation, and the alignment relation between words and nodes. Previous works have only utilized parent-child and alignment relations, which may affect the generation quality. To address this limitation, we propose a Structural Information-augmented Syntax-Controlled Paraphrasing (SI-SCP) model. Particularly, we design a syntax encoder based on tree-transformer to capture parent-child and sibling relations. To model the alignment relation between words and nodes, we propose an attention regularization objective, which makes the decoder accurately select corresponding syntax nodes to guide the generation of words. Experiments show that SI-SCP achieves state-of-the-art performances in terms of semantic and syntactic quality on two popular benchmark datasets. Additionally, we propose a Syntactic Template Retriever (STR) to retrieve compatible syntactic structures. We validate that STR is capable of retrieving compatible syntactic structures. We further demonstrate the effectiveness of SI-SCP to generate diverse paraphrases with retrieved syntactic structures.

Long Text Generation with Topic-aware Discrete Latent Variable Model
Erguang Yang | Mingtong Liu | Deyi Xiong | Yujie Zhang | Yufeng Chen | Jinan Xu
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Generating coherent long texts is an important yet challenging task, particularly forthe open-ended generation. Prior work based on discrete latent codes focuses on the modeling of discourse relation, resulting in discrete codes only learning shallow semantics (Ji and Huang, 2021). A natural text always revolves around several related topics and the transition across them is natural and smooth.In this work, we investigate whether discrete latent codes can learn information of topics. To this end, we build a topic-aware latent code-guided text generation model. To encourage discrete codes to model information about topics, we propose a span-level bag-of-words training objective for the model. Automatic and manual evaluation experiments show that our method can generate more topic-relevant and coherent texts.

Cross-Align: Modeling Deep Cross-lingual Interactions for Word Alignment
Siyu Lai | Zhen Yang | Fandong Meng | Yufeng Chen | Jinan Xu | Jie Zhou
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Word alignment which aims to extract lexicon translation equivalents between source and target sentences, serves as a fundamental tool for natural language processing. Recent studies in this area have yielded substantial improvements by generating alignments from contextualized embeddings of the pre-trained multilingual language models. However, we find that the existing approaches capture few interactions between the input sentence pairs, which degrades the word alignment quality severely, especially for the ambiguous words in the monolingual context. To remedy this problem, we propose Cross-Align to model deep interactions between the input sentence pairs, in which the source and target sentences are encoded separately with the shared self-attention modules in the shallow layers, while cross-lingual interactions are explicitly constructed by the cross-attention modules in the upper layers. Besides, to train our model effectively, we propose a two-stage training framework, where the model is trained with a simple Translation Language Modeling (TLM) objective in the first stage and then finetuned with a self-supervised alignment objective in the second stage. Experiments show that the proposed Cross-Align achieves the state-of-the-art (SOTA) performance on four out of five language pairs.

Iterative Constrained Back-Translation for Unsupervised Domain Adaptation of Machine Translation
Hongxiao Zhang | Hui Huang | Jiale Gao | Yufeng Chen | Jinan Xu | Jian Liu
Proceedings of the 29th International Conference on Computational Linguistics

Back-translation has been proven to be effective in unsupervised domain adaptation of neural machine translation (NMT). However, the existing back-translation methods mainly improve domain adaptability by generating in-domain pseudo-parallel data that contains sentence-structural knowledge, paying less attention to the in-domain lexical knowledge, which may lead to poor translation of unseen in-domain words. In this paper, we propose an Iterative Constrained Back-Translation (ICBT) method to incorporate in-domain lexical knowledge on the basis of BT for unsupervised domain adaptation of NMT. Specifically, we apply lexical constraints into back-translation to generate pseudo-parallel data with in-domain lexical knowledge, and then perform round-trip iterations to incorporate more lexical knowledge. Based on this, we further explore sampling strategies of constrained words in ICBT to introduce more targeted lexical knowledge, via domain specificity and confidence estimation. Experimental results on four domains show that our approach achieves state-of-the-art results, improving the BLEU score by up to 3.08 compared to the strongest baseline, which demonstrates the effectiveness of our approach.

Supervised Contrastive Learning for Cross-lingual Transfer Learning
Wang Shuaibo | Di Hui | Huang Hui | Lai Siyu | Ouchi Kazushige | Chen Yufeng | Xu Jinan
Proceedings of the 21st Chinese National Conference on Computational Linguistics

“Multilingual pre-trained representations are not well-aligned by nature, which harms their performance on cross-lingual tasks. Previous methods propose to post-align the multilingual pretrained representations by multi-view alignment or contrastive learning. However, we argue that both methods are not suitable for the cross-lingual classification objective, and in this paper we propose a simple yet effective method to better align the pre-trained representations. On the basis of cross-lingual data augmentations, we make a minor modification to the canonical contrastive loss, to remove false-negative examples which should not be contrasted. Augmentations with the same class are brought close to the anchor sample, and augmentations with different class are pushed apart. Experiment results on three cross-lingual tasks from XTREME benchmark show our method could improve the transfer performance by a large margin with no additional resource needed. We also provide in-detail analysis and comparison between different post-alignment strategies.”

Towards Making the Most of Pre-trained Translation Model for Quality Estimation
Li Chunyou | Di Hui | Huang Hui | Ouchi Kazushige | Chen Yufeng | Liu Jian | Xu Jinan
Proceedings of the 21st Chinese National Conference on Computational Linguistics

“Machine translation quality estimation (QE) aims to evaluate the quality of machine translation automatically without relying on any reference. One common practice is applying the translation model as a feature extractor. However, there exist several discrepancies between the translation model and the QE model. The translation model is trained in an autoregressive manner, while the QE model is performed in a non-autoregressive manner. Besides, the translation model only learns to model human-crafted parallel data, while the QE model needs to model machinetranslated noisy data. In order to bridge these discrepancies, we propose two strategies to posttrain the translation model, namely Conditional Masked Language Modeling (CMLM) and Denoising Restoration (DR). Specifically, CMLM learns to predict masked tokens at the target side conditioned on the source sentence. DR firstly introduces noise to the target side of parallel data, and the model is trained to detect and recover the introduced noise. Both strategies can adapt the pre-trained translation model to the QE-style prediction task. Experimental results show that our model achieves impressive results, significantly outperforming the baseline model, verifying the effectiveness of our proposed methods.”

Saliency as Evidence: Event Detection with Trigger Saliency Attribution
Jian Liu | Yufeng Chen | Jinan Xu
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Event detection (ED) is a critical subtask of event extraction that seeks to identify event triggers of certain types in texts. Despite significant advances in ED, existing methods typically follow a “one model fits all types” approach, which sees no differences between event types and often results in a quite skewed performance. Finding the causes of skewed performance is crucial for the robustness of an ED model, but to date there has been little exploration of this problem. This research examines the issue in depth and presents a new concept termed trigger salience attribution, which can explicitly quantify the underlying patterns of events. On this foundation, we develop a new training mechanism for ED, which can distinguish between trigger-dependent and context-dependent types and achieve promising performance on two benchmarks. Finally, by highlighting many distinct characteristics of trigger-dependent and context-dependent types, our work may promote more research into this problem.

Scheduled Multi-task Learning for Neural Chat Translation
Yunlong Liang | Fandong Meng | Jinan Xu | Yufeng Chen | Jie Zhou
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Neural Chat Translation (NCT) aims to translate conversational text into different languages. Existing methods mainly focus on modeling the bilingual dialogue characteristics (e.g., coherence) to improve chat translation via multi-task learning on small-scale chat translation data. Although the NCT models have achieved impressive success, it is still far from satisfactory due to insufficient chat translation data and simple joint training manners. To address the above issues, we propose a scheduled multi-task learning framework for NCT. Specifically, we devise a three-stage training framework to incorporate the large-scale in-domain chat translation data into training by adding a second pre-training stage between the original pre-training and fine-tuning stages. Further, we investigate where and how to schedule the dialogue-related auxiliary tasks in multiple training stages to effectively enhance the main chat translation task. Extensive experiments on four language directions (English-Chinese and English-German) verify the effectiveness and superiority of the proposed approach. Additionally, we will make the large-scale in-domain paired bilingual dialogue dataset publicly available for the research community.

MSCTD: A Multimodal Sentiment Chat Translation Dataset
Yunlong Liang | Fandong Meng | Jinan Xu | Yufeng Chen | Jie Zhou
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Multimodal machine translation and textual chat translation have received considerable attention in recent years. Although the conversation in its natural form is usually multimodal, there still lacks work on multimodal machine translation in conversations. In this work, we introduce a new task named Multimodal Chat Translation (MCT), aiming to generate more accurate translations with the help of the associated dialogue history and visual context. To this end, we firstly construct a Multimodal Sentiment Chat Translation Dataset (MSCTD) containing 142,871 English-Chinese utterance pairs in 14,762 bilingual dialogues. Each utterance pair, corresponding to the visual context that reflects the current conversational scene, is annotated with a sentiment label. Then, we benchmark the task by establishing multiple baseline systems that incorporate multimodal and sentiment features for MCT. Preliminary experiments on two language directions (English-Chinese) verify the potential of contextual and multimodal information fusion and the positive impact of sentiment on the MCT task. Additionally, we provide a new benchmark on multimodal dialogue sentiment analysis with the constructed MSCTD. Our work can facilitate researches on both multimodal chat translation and multimodal dialogue sentiment analysis.

Conditional Bilingual Mutual Information Based Adaptive Training for Neural Machine Translation
Songming Zhang | Yijin Liu | Fandong Meng | Yufeng Chen | Jinan Xu | Jian Liu | Jie Zhou
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Token-level adaptive training approaches can alleviate the token imbalance problem and thus improve neural machine translation, through re-weighting the losses of different target tokens based on specific statistical metrics (e.g., token frequency or mutual information). Given that standard translation models make predictions on the condition of previous target contexts, we argue that the above statistical metrics ignore target context information and may assign inappropriate weights to target tokens. While one possible solution is to directly take target contexts into these statistical metrics, the target-context-aware statistical computing is extremely expensive, and the corresponding storage overhead is unrealistic. To solve the above issues, we propose a target-context-aware metric, named conditional bilingual mutual information (CBMI), which makes it feasible to supplement target context information for statistical metrics. Particularly, our CBMI can be formalized as the log quotient of the translation model probability and language model probability by decomposing the conditional joint distribution. Thus CBMI can be efficiently calculated during model training without any pre-specific statistical calculations and large storage overhead. Furthermore, we propose an effective adaptive training approach based on both the token- and sentence-level CBMI. Experimental results on WMT14 English-German and WMT19 Chinese-English tasks show our approach can significantly outperform the Transformer baseline and other related methods.

A Variational Hierarchical Model for Neural Cross-Lingual Summarization
Yunlong Liang | Fandong Meng | Chulun Zhou | Jinan Xu | Yufeng Chen | Jinsong Su | Jie Zhou
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

The goal of the cross-lingual summarization (CLS) is to convert a document in one language (e.g., English) to a summary in another one (e.g., Chinese). The CLS task is essentially the combination of machine translation (MT) and monolingual summarization (MS), and thus there exists the hierarchical relationship between MT&MS and CLS. Existing studies on CLS mainly focus on utilizing pipeline methods or jointly training an end-to-end model through an auxiliary MT or MS objective. However, it is very challenging for the model to directly conduct CLS as it requires both the abilities to translate and summarize. To address this issue, we propose a hierarchical model for the CLS task, based on the conditional variational auto-encoder. The hierarchical model contains two kinds of latent variables at the local and global levels, respectively. At the local level, there are two latent variables, one for translation and the other for summarization. As for the global level, there is another latent variable for cross-lingual summarization conditioned on the two local-level variables. Experiments on two language directions (English-Chinese) verify the effectiveness and superiority of the proposed approach. In addition, we show that our model is able to generate better cross-lingual summaries than comparison models in the few-shot setting.

Adversarially Improving NMT Robustness to ASR Errors with Confusion Sets
Shuaibo Wang | Yufeng Chen | Songming Zhang | Deyi Xiong | Jinan Xu
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Neural machine translation (NMT) models are known to be fragile to noisy inputs from automatic speech recognition (ASR) systems. Existing methods are usually tailored for robustness against only homophone errors which account for a small portion of realistic ASR errors. In this paper, we propose an adversarial example generation method based on confusion sets that contain words easily confusable with a target word by ASR to conduct adversarial training for NMT models. Specifically, an adversarial example is generated from the perspective of acoustic relations instead of the traditional uniform or unigram sampling from the confusion sets. Experiments on different test sets with hand-crafted and real-world noise demonstrate the effectiveness of our method over previous methods. Moreover, our approach can achieve improvements on the clean test set.

2021

TenTrans Multilingual Low-Resource Translation System for WMT21 Indo-European Languages Task
Han Yang | Bojie Hu | Wanying Xie | Ambyera Han | Pan Liu | Jinan Xu | Qi Ju
Proceedings of the Sixth Conference on Machine Translation

This paper describes TenTrans’ submission to WMT21 Multilingual Low-Resource Translation shared task for the Romance language pairs. This task focuses on improving translation quality from Catalan to Occitan, Romanian and Italian, with the assistance of related high-resource languages. We mainly utilize back-translation, pivot-based methods, multilingual models, pre-trained model fine-tuning, and in-domain knowledge transfer to improve the translation quality. On the test set, our best-submitted system achieves an average of 43.45 case-sensitive BLEU scores across all low-resource pairs. Our data, code, and pre-trained models used in this work are available in TenTrans evaluation examples.

WeChat Neural Machine Translation Systems for WMT21
Xianfeng Zeng | Yijin Liu | Ernan Li | Qiu Ran | Fandong Meng | Peng Li | Jinan Xu | Jie Zhou
Proceedings of the Sixth Conference on Machine Translation

This paper introduces WeChat AI’s participation in WMT 2021 shared news translation task on English->Chinese, English->Japanese, Japanese->English and English->German. Our systems are based on the Transformer (Vaswani et al., 2017) with several novel and effective variants. In our experiments, we employ data filtering, large-scale synthetic data generation (i.e., back-translation, knowledge distillation, forward-translation, iterative in-domain knowledge transfer), advanced finetuning approaches, and boosted Self-BLEU based model ensemble. Our constrained systems achieve 36.9, 46.9, 27.8 and 31.3 case-sensitive BLEU scores on English->Chinese, English->Japanese, Japanese->English and English->German, respectively. The BLEU scores of English->Chinese, English->Japanese and Japanese->English are the highest among all submissions, and that of English->German is the highest among all constrained submissions.

Saliency-based Multi-View Mixed Language Training for Zero-shot Cross-lingual Classification
Siyu Lai | Hui Huang | Dong Jing | Yufeng Chen | Jinan Xu | Jian Liu
Findings of the Association for Computational Linguistics: EMNLP 2021

Recent multilingual pre-trained models, like XLM-RoBERTa (XLM-R), have been demonstrated effective in many cross-lingual tasks. However, there are still gaps between the contextualized representations of similar words in different languages. To solve this problem, we propose a novel framework named Multi-View Mixed Language Training (MVMLT), which leverages code-switched data with multi-view learning to fine-tune XLM-R. MVMLT uses gradient-based saliency to extract keywords which are the most relevant to downstream tasks and replaces them with the corresponding words in the target language dynamically. Furthermore, MVMLT utilizes multi-view learning to encourage contextualized embeddings to align into a more refined language-invariant space. Extensive experiments with four languages show that our model achieves state-of-the-art results on zero-shot cross-lingual sentiment classification and dialogue state tracking tasks, demonstrating the effectiveness of our proposed model.

Syntactically Diverse Adversarial Network for Knowledge-Grounded Conversation Generation
Fuwei Cui | Hui Di | Hongjie Ren | Kazushige Ouchi | Ze Liu | Jinan Xu
Findings of the Association for Computational Linguistics: EMNLP 2021

Generative conversation systems tend to produce meaningless and generic responses, which significantly reduce the user experience. In order to generate informative and diverse responses, recent studies proposed to fuse knowledge to improve informativeness and adopt latent variables to enhance the diversity. However, utilizing latent variables will lead to the inaccuracy of knowledge in the responses, and the dissemination of wrong knowledge will mislead the communicators. To address this problem, we propose a Syntactically Diverse Adversarial Network (SDAN) for knowledge-grounded conversation model. SDAN contains an adversarial hierarchical semantic network to keep the semantic coherence, a knowledge-aware network to attend more related knowledge for improving the informativeness and a syntactic latent variable network to generate syntactically diverse responses. Additionally, in order to increase the controllability of syntax, we adopt adversarial learning to decouple semantic and syntactic representations. Experimental results show that our model can not only generate syntactically diverse and knowledge-accurate responses but also significantly achieve the balance between improving the syntactic diversity and maintaining the knowledge accuracy.

An Iterative Multi-Knowledge Transfer Network for Aspect-Based Sentiment Analysis
Yunlong Liang | Fandong Meng | Jinchao Zhang | Yufeng Chen | Jinan Xu | Jie Zhou
Findings of the Association for Computational Linguistics: EMNLP 2021

Aspect-based sentiment analysis (ABSA) mainly involves three subtasks: aspect term extraction, opinion term extraction, and aspect-level sentiment classification, which are typically handled in a separate or joint manner. However, previous approaches do not well exploit the interactive relations among three subtasks and do not pertinently leverage the easily available document-level labeled domain/sentiment knowledge, which restricts their performances. To address these issues, we propose a novel Iterative Multi-Knowledge Transfer Network (IMKTN) for end-to-end ABSA. For one thing, through the interactive correlations between the ABSA subtasks, our IMKTN transfers the task-specific knowledge from any two of the three subtasks to another one at the token level by utilizing a well-designed routing algorithm, that is, any two of the three subtasks will help the third one. For another, our IMKTN pertinently transfers the document-level knowledge, i.e., domain-specific and sentiment-related knowledge, to the aspect-level subtasks to further enhance the corresponding performance. Experimental results on three benchmark datasets demonstrate the effectiveness and superiority of our approach.

Confidence-Aware Scheduled Sampling for Neural Machine Translation
Yijin Liu | Fandong Meng | Yufeng Chen | Jinan Xu | Jie Zhou
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

Target-oriented Fine-tuning for Zero-Resource Named Entity Recognition
Ying Zhang | Fandong Meng | Yufeng Chen | Jinan Xu | Jie Zhou
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

Towards Making the Most of Dialogue Characteristics for Neural Chat Translation
Yunlong Liang | Chulun Zhou | Fandong Meng | Jinan Xu | Yufeng Chen | Jinsong Su | Jie Zhou
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Neural Chat Translation (NCT) aims to translate conversational text between speakers of different languages. Despite the promising performance of sentence-level and context-aware neural machine translation models, there still remain limitations in current NCT models because the inherent dialogue characteristics of chat, such as dialogue coherence and speaker personality, are neglected. In this paper, we propose to promote the chat translation by introducing the modeling of dialogue characteristics into the NCT model. To this end, we design four auxiliary tasks including monolingual response generation, cross-lingual response generation, next utterance discrimination, and speaker identification. Together with the main chat translation task, we optimize the enhanced NCT model through the training objectives of all these tasks. By this means, the NCT model can be enhanced by capturing the inherent dialogue characteristics, thus generating more coherent and speaker-relevant translations. Comprehensive experiments on four language directions (English<->German and English<->Chinese) verify the effectiveness and superiority of the proposed approach.

Scheduled Sampling Based on Decoding Steps for Neural Machine Translation
Yijin Liu | Fandong Meng | Yufeng Chen | Jinan Xu | Jie Zhou
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Scheduled sampling is widely used to mitigate the exposure bias problem for neural machine translation. Its core motivation is to simulate the inference scene during training by replacing ground-truth tokens with predicted tokens, thus bridging the gap between training and inference. However, vanilla scheduled sampling is merely based on training steps and equally treats all decoding steps. Namely, it simulates an inference scene with uniform error rates, which disobeys the real inference scene, where larger decoding steps usually have higher error rates due to error accumulations. To alleviate the above discrepancy, we propose scheduled sampling methods based on decoding steps, increasing the selection chance of predicted tokens with the growth of decoding steps. Consequently, we can more realistically simulate the inference scene during training, thus better bridging the gap between training and inference. Moreover, we investigate scheduled sampling based on both training steps and decoding steps for further improvements. Experimentally, our approaches significantly outperform the Transformer baseline and vanilla scheduled sampling on three large-scale WMT tasks. Additionally, our approaches also generalize well to the text summarization task on two popular benchmarks.

Machine Reading Comprehension as Data Augmentation: A Case Study on Implicit Event Argument Extraction
Jian Liu | Yufeng Chen | Jinan Xu
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Implicit event argument extraction (EAE) is a crucial document-level information extraction task that aims to identify event arguments beyond the sentence level. Despite many efforts for this task, the lack of enough training data has long impeded the study. In this paper, we take a new perspective to address the data sparsity issue faced by implicit EAE, by bridging the task with machine reading comprehension (MRC). Particularly, we devise two data augmentation regimes via MRC, including: 1) implicit knowledge transfer, which enables knowledge transfer from other tasks, by building a unified training framework in the MRC formulation, and 2) explicit data augmentation, which can explicitly generate new training examples, by treating MRC models as an annotator. The extensive experiments have justified the effectiveness of our approach — it not only obtains state-of-the-art performance on two benchmarks, but also demonstrates superior results in a data-low scenario.

Syntactically-Informed Unsupervised Paraphrasing with Non-Parallel Data
Erguang Yang | Mingtong Liu | Deyi Xiong | Yujie Zhang | Yao Meng | Changjian Hu | Jinan Xu | Yufeng Chen
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Previous works on syntactically controlled paraphrase generation heavily rely on large-scale parallel paraphrase data that is not easily available for many languages and domains. In this paper, we take this research direction to the extreme and investigate whether it is possible to learn syntactically controlled paraphrase generation with nonparallel data. We propose a syntactically-informed unsupervised paraphrasing model based on conditional variational auto-encoder (VAE) which can generate texts in a specified syntactic structure. Particularly, we design a two-stage learning method to effectively train the model using non-parallel data. The conditional VAE is trained to reconstruct the input sentence according to the given input and its syntactic structure. Furthermore, to improve the syntactic controllability and semantic consistency of the pre-trained conditional VAE, we fine-tune it using syntax controlling and cycle reconstruction learning objectives, and employ Gumbel-Softmax to combine these new learning objectives. Experiment results demonstrate that the proposed model trained only on non-parallel data is capable of generating diverse paraphrases with specified syntactic structure. Additionally, we validate the effectiveness of our method for generating syntactically adversarial examples on the sentiment analysis task.

Improving Entity Linking by Encoding Type Information into Entity Embeddings
Li Tianran | Yang Erguang | Zhang Yujie | Chen Yufeng | Xu Jinan
Proceedings of the 20th Chinese National Conference on Computational Linguistics

Entity Linking (EL) refers to the task of linking entity mentions in the text to the correct entities inthe Knowledge Base (KB) in which entity embeddings play a vital and challenging role because of the subtle differences between entities. However existing pre-trained entity embeddings onlylearn the underlying semantic information in texts yet the fine-grained entity type informationis ignored which causes the type of the linked entity is incompatible with the mention context. In order to solve this problem we propose to encode fine-grained type information into entity embeddings. We firstly pre-train word vectors to inject type information by embedding wordsand fine-grained entity types into the same vector space. Then we retrain entity embeddings withword vectors containing fine-grained type information. By applying our entity embeddings to twoexisting EL models our method respectively achieves 0.82% and 0.42% improvement on average F1 score of the test sets. Meanwhile our method is model-irrelevant which means it can helpother EL models.

LRRA:A Transparent Neural-Symbolic Reasoning Framework for Real-World Visual Question Answering
Wan Zhang | Chen Keming | Zhang Yujie | Xu Jinan | Chen Yufeng
Proceedings of the 20th Chinese National Conference on Computational Linguistics

The predominant approach of visual question answering (VQA) relies on encoding the imageand question with a ”black box” neural encoder and decoding a single token into answers suchas ”yes” or ”no”. Despite this approach’s strong quantitative results it struggles to come up withhuman-readable forms of justification for the prediction process. To address this insufficiency we propose LRRA[LookReadReasoningAnswer]a transparent neural-symbolic framework forvisual question answering that solves the complicated problem in the real world step-by-steplike humans and provides human-readable form of justification at each step. Specifically LRRAlearns to first convert an image into a scene graph and parse a question into multiple reasoning instructions. It then executes the reasoning instructions one at a time by traversing the scenegraph using a recurrent neural-symbolic execution module. Finally it generates answers to the given questions and makes corresponding marks on the image. Furthermore we believe that the relations between objects in the question is of great significance for obtaining the correct answerso we create a perturbed GQA test set by removing linguistic cues (attributes and relations) in the questions to analyze which part of the question contributes more to the answer. Our experimentson the GQA dataset show that LRRA is significantly better than the existing representative model(57.12% vs. 56.39%). Our experiments on the perturbed GQA test set show that the relations between objects is more important for answering complicated questions than the attributes ofobjects.Keywords:Visual Question Answering Relations Between Objects Neural-Symbolic Reason-ing.

融合外部知识的开放域复述模板获取方法(An Open Domain Paraphrasing Template Acquisition Method Based on External Knowledge)
Bo Jin (金波) | Mingtong Liu (刘明童) | Yujie Zhang (张玉洁) | Jinan Xu (徐金安) | Yufeng Chen (陈钰枫)
Proceedings of the 20th Chinese National Conference on Computational Linguistics

如何挖掘语言资源中丰富的复述模板,是复述研究中的一项重要任务。已有方法在人工给定种子实体对的基础上,利用实体关系,通过自举迭代方式,从开放域获取复述模板,规避对平行语料或可比语料的依赖,但是该方法需人工给定实体对,实体关系受限;在迭代过程中语义会发生偏移,影响获取质量。针对这些问题,我们考虑知识库中包含描述特定语义关系的实体对(即关系三元组),提出融合外部知识的开放域复述模板自动获取方法。首先,将关系三元组与开放域文本对齐,获取关系对应文本,并将文本中语义丰富部分泛化成变量槽,获取关系模板;接着设计模板表示方法,本文利用预训练语言模型,在模板表示中融合变量槽语义;最后,根据获得的模板表示,设计自动聚类与筛选方法,获取高精度的复述模板。在融合自动评测与人工评测的评价方法下,实验结果表明,本文提出的方法实现了在开放域数据上复述模板的自动泛化与获取,能够获得质量高、语义一致的复述模板。

基于多任务标签一致性机制的中文命名实体识别(Chinese Named Entity Recognition based on Multi-task Label Consistency Mechanism)
Shuning Lv (吕书宁) | Jian Liu (刘健) | Jinan Xu (徐金安) | Yufeng Chen (陈钰枫) | Yujie Zhang (张玉洁)
Proceedings of the 20th Chinese National Conference on Computational Linguistics

实体边界预测对中文命名实体识别至关重要。现有研究为改善边界识别效果提出的多任务学习方法仅考虑与分词任务结合,缺少多任务标签训练数据,无法学到任务的标签一致性关系。本文提出一种新的基于多任务标签一致性机制的中文命名实体识别方法:将分词和词性信息融入命名实体识别模型,使三种任务联合训练;建立基于标签一致性机制的多任务学习模式,来捕获标签一致性关系及学习多任务表示。全样本和小样本实验表明了方法的有效性。

Improving Low-Resource Named Entity Recognition via Label-Aware Data Augmentation and Curriculum Denoising
Zhu Wenjing | Liu Jian | Xu Jinan | Chen Yufeng | Zhang Yujie
Proceedings of the 20th Chinese National Conference on Computational Linguistics

Deep neural networks have achieved state-of-the-art performances on named entity recognition(NER) with sufficient training data while they perform poorly in low-resource scenarios due to data scarcity. To solve this problem we propose a novel data augmentation method based on pre-trained language model (PLM) and curriculum learning strategy. Concretely we use the PLMto generate diverse training instances through predicting different masked words and design atask-specific curriculum learning strategy to alleviate the influence of noises. We evaluate the effectiveness of our approach on three datasets: CoNLL-2003 OntoNotes5.0 and MaScip of which the first two are simulated low-resource scenarios and the last one is a real low-resource dataset in material science domain. Experimental results show that our method consistently outperform the baseline model. Specifically our method achieves an absolute improvement of3.46% F1 score on the 1% CoNLL-2003 2.58% on the 1% OntoNotes5.0 and 0.99% on the full of MaScip.

Bilingual Mutual Information Based Adaptive Training for Neural Machine Translation
Yangyifan Xu | Yijin Liu | Fandong Meng | Jiajun Zhang | Jinan Xu | Jie Zhou
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Recently, token-level adaptive training has achieved promising improvement in machine translation, where the cross-entropy loss function is adjusted by assigning different training weights to different tokens, in order to alleviate the token imbalance problem. However, previous approaches only use static word frequency information in the target language without considering the source language, which is insufficient for bilingual tasks like machine translation. In this paper, we propose a novel bilingual mutual information (BMI) based adaptive objective, which measures the learning difficulty for each target token from the perspective of bilingualism, and assigns an adaptive weight accordingly to improve token-level adaptive training. This method assigns larger training weights to tokens with higher BMI, so that easy tokens are updated with coarse granularity while difficult tokens are updated with fine granularity. Experimental results on WMT14 English-to-German and WMT19 Chinese-to-English demonstrate the superiority of our approach compared with the Transformer baseline and previous token-level adaptive training approaches. Further analyses confirm that our method can improve the lexical diversity.

Modeling Bilingual Conversational Characteristics for Neural Chat Translation
Yunlong Liang | Fandong Meng | Yufeng Chen | Jinan Xu | Jie Zhou
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Neural chat translation aims to translate bilingual conversational text, which has a broad application in international exchanges and cooperation. Despite the impressive performance of sentence-level and context-aware Neural Machine Translation (NMT), there still remain challenges to translate bilingual conversational text due to its inherent characteristics such as role preference, dialogue coherence, and translation consistency. In this paper, we aim to promote the translation quality of conversational text by modeling the above properties. Specifically, we design three latent variational modules to learn the distributions of bilingual conversational characteristics. Through sampling from these learned distributions, the latent variables, tailored for role preference, dialogue coherence, and translation consistency, are incorporated into the NMT model for better translation. We evaluate our approach on the benchmark dataset BConTrasT (English<->German) and a self-collected bilingual dialogue corpus, named BMELD (English<->Chinese). Extensive experiments show that our approach notably boosts the performance over strong baselines by a large margin and significantly surpasses some state-of-the-art context-aware NMT models in terms of BLEU and TER. Additionally, we make the BMELD dataset publicly available for the research community.

2020

A Learning-Exploring Method to Generate Diverse Paraphrases with Multi-Objective Deep Reinforcement Learning
Mingtong Liu | Erguang Yang | Deyi Xiong | Yujie Zhang | Yao Meng | Changjian Hu | Jinan Xu | Yufeng Chen
Proceedings of the 28th International Conference on Computational Linguistics

Paraphrase generation (PG) is of great importance to many downstream tasks in natural language processing. Diversity is an essential nature to PG for enhancing generalization capability and robustness of downstream applications. Recently, neural sequence-to-sequence (Seq2Seq) models have shown promising results in PG. However, traditional model training for PG focuses on optimizing model prediction against single reference and employs cross-entropy loss, which objective is unable to encourage model to generate diverse paraphrases. In this work, we present a novel approach with multi-objective learning to PG. We propose a learning-exploring method to generate sentences as learning objectives from the learned data distribution, and employ reinforcement learning to combine these new learning objectives for model training. We first design a sample-based algorithm to explore diverse sentences. Then we introduce several reward functions to evaluate the sampled sentences as learning signals in terms of expressive diversity and semantic fidelity, aiming to generate diverse and high-quality paraphrases. To effectively optimize model performance satisfying different evaluating aspects, we use a GradNorm-based algorithm that automatically balances these training objectives. Experiments and analyses on Quora and Twitter datasets demonstrate that our proposed method not only gains a significant increase in diversity but also improves generation quality over several state-of-the-art baselines.

A Joint Model for Graph-based Chinese Dependency Parsing
Xingchen Li | Mingtong Liu | Yujie Zhang | Jinan Xu | Yufeng Chen
Proceedings of the 19th Chinese National Conference on Computational Linguistics

In Chinese dependency parsing, the joint model of word segmentation, POS tagging and dependency parsing has become the mainstream framework because it can eliminate error propagation and share knowledge, where the transition-based model with feature templates maintains the best performance. Recently, the graph-based joint model (Yan et al., 2019) on word segmentation and dependency parsing has achieved better performance, demonstrating the advantages of the graph-based models. However, this work can not provide POS information for downstream tasks, and the POS tagging task was proved to be helpful to the dependency parsing according to the research of the transition-based model. Therefore, we propose a graph-based joint model for Chinese word segmentation, POS tagging and dependency parsing. We designed a charater-level POS tagging task, and then train it jointly with the model of Yan et al. (2019). We adopt two methods of joint POS tagging task, one is by sharing parameters, the other is by using tag attention mechanism, which enables the three tasks to better share intermediate information and improve each other’s performance. The experimental results on the Penn Chinese treebank (CTB5) show that our proposed joint model improved by 0.38% on dependency parsing than the model of Yan et al. (2019). Compared with the best transition-based joint model, our model improved by 0.18%, 0.35% and 5.99% respectively in terms of word segmentation, POS tagging and dependency parsing.

联合依存分析的汉语语义组合模型(Chinese Semantic Composition Model with Dependency Parsing)
Yuanmeng Chen (陈圆梦) | Yujie Zhang (张玉洁) | Jinan Xu (徐金安) | Yufeng Chen (陈钰枫)
Proceedings of the 19th Chinese National Conference on Computational Linguistics

在语义组合方法中,结构化方法强调以结构信息指导词义表示的组合方式。现有结构化语义组合方法使用外部分析器获取句法结构信息,导致句法分析与语义组合相互割裂,句法分析的精度严重制约语义组合模型的性能,且训练数据领域不一致等问题会进一步加剧性能的下降。对此,本文提出联合依存分析的语义组合模型,将依存分析与语义组合进行联合,一方面在训练语义组合模型时对依存分析模型进行微调,使其能够更适应语义组合模型使用的训练数据的领域特点;另一方面,在语义组合部分加入依存分析的中间信息表示,获取更丰富的结构信息和语义信息,以此来降低语义组合模型对依存分析错误结果的敏感度,提升模型的鲁棒性。我们以汉语为具体研究对象,将语义组合模型用于复述识别任务,并在CTB5汉语依存分析数据和LCQMC汉语复述识别数据上验证本文提出的模型。实验结果显示,本文所提方法在复述识别任务上的预测正确率和F1值上分别达到76.81%和78.03%;我们进一步设计实验对联合学习和中间信息利用的有效性进行验证,并与相关代表性工作进行了对比分析。

基于图神经网络的汉语依存分析和语义组合计算联合模型(Joint Learning Chinese Dependency Parsing and Semantic Composition based on Graph Neural Network)
Kai Wang (汪凯) | Mingtong Liu (刘明童) | Yuanmeng Chen (陈圆梦) | Yujie Zhang (张玉洁) | Jinan Xu (徐金安) | Yufeng Chen (陈钰枫)
Proceedings of the 19th Chinese National Conference on Computational Linguistics

组合原则表明句子的语义由其构成成分的语义按照一定规则组合而成, 由此基于句法结构的语义组合计算一直是一个重要的探索方向,其中采用树结构的组合计算方法最具有代表性。但是该方法难以应用于大规模数据处理,主要问题是其语义组合的顺序依赖于具体树的结构,无法实现并行处理。本文提出一种基于图的依存句法分析和语义组合计算的联合框架,并借助复述识别任务训练语义组合模型和句法分析模型。一方面图模型可以在训练和预测阶段采用并行处理,极大缩短计算时间;另一方面联合句法分析的语义组合框架不必依赖外部句法分析器,同时两个任务的联合学习可使语义表示同时学习句法结构和语义的上下文信息。我们在公开汉语复述识别数据集LCQMC上进行评测,实验结果显示准确率接近树结构组合方法,达到79.54%,而预测速度提升高达30倍。

Multi-view Classification Model for Knowledge Graph Completion
Wenbin Jiang | Mengfei Guo | Yufeng Chen | Ying Li | Jinan Xu | Yajuan Lyu | Yong Zhu
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing

Most previous work on knowledge graph completion conducted single-view prediction or calculation for candidate triple evaluation, based only on the content information of the candidate triples. This paper describes a novel multi-view classification model for knowledge graph completion, where multiple classification views are performed based on both content and context information for candidate triple evaluation. Each classification view evaluates the validity of a candidate triple from a specific viewpoint, based on the content information inside the candidate triple and the context information nearby the triple. These classification views are implemented by a unified neural network and the classification predictions are weightedly integrated to obtain the final evaluation. Experiments show that, the multi-view model brings very significant improvements over previous methods, and achieves the new state-of-the-art on two representative datasets. We believe that, the flexibility and the scalability of the multi-view classification model facilitates the introduction of additional information and resources for better performance.

2019

GCDT: A Global Context Enhanced Deep Transition Architecture for Sequence Labeling
Yijin Liu | Fandong Meng | Jinchao Zhang | Jinan Xu | Yufeng Chen | Jie Zhou
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Current state-of-the-art systems for sequence labeling are typically based on the family of Recurrent Neural Networks (RNNs). However, the shallow connections between consecutive hidden states of RNNs and insufficient modeling of global information restrict the potential performance of those models. In this paper, we try to address these issues, and thus propose a Global Context enhanced Deep Transition architecture for sequence labeling named GCDT. We deepen the state transition path at each position in a sentence, and further assign every token with a global representation learned from the entire sentence. Experiments on two standard sequence labeling tasks show that, given only training data and the ubiquitous word embeddings (Glove), our GCDT achieves 91.96 F1 on the CoNLL03 NER task and 95.43 F1 on the CoNLL2000 Chunking task, which outperforms the best reported results under the same settings. Furthermore, by leveraging BERT as an additional resource, we establish new state-of-the-art results with 93.47 F1 on NER and 97.30 F1 on Chunking.

A Novel Aspect-Guided Deep Transition Model for Aspect Based Sentiment Analysis
Yunlong Liang | Fandong Meng | Jinchao Zhang | Jinan Xu | Yufeng Chen | Jie Zhou
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Aspect based sentiment analysis (ABSA) aims to identify the sentiment polarity towards the given aspect in a sentence, while previous models typically exploit an aspect-independent (weakly associative) encoder for sentence representation generation. In this paper, we propose a novel Aspect-Guided Deep Transition model, named AGDT, which utilizes the given aspect to guide the sentence encoding from scratch with the specially-designed deep transition architecture. Furthermore, an aspect-oriented objective is designed to enforce AGDT to reconstruct the given aspect with the generated sentence representation. In doing so, our AGDT can accurately generate aspect-specific sentence representation, and thus conduct more accurate sentiment predictions. Experimental results on multiple SemEval datasets demonstrate the effectiveness of our proposed approach, which significantly outperforms the best reported results with the same setting.

Original Semantics-Oriented Attention and Deep Fusion Network for Sentence Matching
Mingtong Liu | Yujie Zhang | Jinan Xu | Yufeng Chen
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Sentence matching is a key issue in natural language inference and paraphrase identification. Despite the recent progress on multi-layered neural network with cross sentence attention, one sentence learns attention to the intermediate representations of another sentence, which are propagated from preceding layers and therefore are uncertain and unstable for matching, particularly at the risk of error propagation. In this paper, we present an original semantics-oriented attention and deep fusion network (OSOA-DFN) for sentence matching. Unlike existing models, each attention layer of OSOA-DFN is oriented to the original semantic representation of another sentence, which captures the relevant information from a fixed matching target. The multiple attention layers allow one sentence to repeatedly read the important information of another sentence for better matching. We then additionally design deep fusion to propagate the attention information at each matching layer. At last, we introduce a self-attention mechanism to capture global context to enhance attention-aware representation within each sentence. Experiment results on three sentence matching benchmark datasets SNLI, SciTail and Quora show that OSOA-DFN has the ability to model sentence matching more precisely.

CM-Net: A Novel Collaborative Memory Network for Spoken Language Understanding
Yijin Liu | Fandong Meng | Jinchao Zhang | Jie Zhou | Yufeng Chen | Jinan Xu
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Spoken Language Understanding (SLU) mainly involves two tasks, intent detection and slot filling, which are generally modeled jointly in existing works. However, most existing models fail to fully utilize cooccurrence relations between slots and intents, which restricts their potential performance. To address this issue, in this paper we propose a novel Collaborative Memory Network (CM-Net) based on the well-designed block, named CM-block. The CM-block firstly captures slot-specific and intent-specific features from memories in a collaborative manner, and then uses these enriched features to enhance local context representations, based on which the sequential information flow leads to more specific (slot and intent) global utterance representations. Through stacking multiple CM-blocks, our CM-Net is able to alternately perform information exchange among specific memories, local contexts and the global utterance, and thus incrementally enriches each other. We evaluate the CM-Net on two standard benchmarks (ATIS and SNIPS) and a self-collected corpus (CAIS). Experimental results show that the CM-Net achieves the state-of-the-art results on the ATIS and SNIPS in most of criteria, and significantly outperforms the baseline models on the CAIS. Additionally, we make the CAIS dataset publicly available for the research community.

2016

System Description of bjtu_nlp Neural Machine Translation System
Shaotong Li | JinAn Xu | Yufeng Chen | Yujie Zhang
Proceedings of the 3rd Workshop on Asian Translation (WAT2016)

This paper presents our machine translation system that developed for the WAT2016 evalua-tion tasks of ja-en, ja-zh, en-ja, zh-ja, JPCja-en, JPCja-zh, JPCen-ja, JPCzh-ja. We build our system based on encoder–decoder framework by integrating recurrent neural network (RNN) and gate recurrent unit (GRU), and we also adopt an attention mechanism for solving the problem of information loss. Additionally, we propose a simple translation-specific approach to resolve the unknown word translation problem. Experimental results show that our system performs better than the baseline statistical machine translation (SMT) systems in each task. Moreover, it shows that our proposed approach of unknown word translation performs effec-tively improvement of translation results.

Automatic Cross-Lingual Similarization of Dependency Grammars for Tree-based Machine Translation
Wenbin Jiang | Wen Zhang | Jinan Xu | Rangjia Cai
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

2015

A Hybrid Transliteration Model for Chinese/English Named Entities —BJTU-NLP Report for the 5th Named Entities Workshop
Dandan Wang | Xiaohui Yang | Jinan Xu | Yufeng Chen | Nan Wang | Bojia Liu | Jian Yang | Yujie Zhang
Proceedings of the Fifth Named Entity Workshop

Integrating Case Frame into Japanese to Chinese Hierarchical Phrase-based Translation Model
Jinan Xu | Jiangming Liu | Yufeng Chen | Yujie Zhang | Fang Ming | Shaotong Li
Proceedings of the 1st Workshop on Semantics-Driven Statistical Machine Translation (S2MT 2015)

2014

System Description: Dependency-based Pre-ordering for Japanese-Chinese Machine Translation
Jingsheng Cai | Yujie Zhang | Hua Shan | Jinan Xu
Proceedings of the 1st Workshop on Asian Translation (WAT2014)

Augment Dependency-to-String Translation with Fixed and Floating Structures
Jun Xie | Jinan Xu | Qun Liu
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

2013

An Approach of Hybrid Hierarchical Structure for Word Similarity Computing by HowNet
Jiangming Liu | Jinan Xu | Yujie Zhang
Proceedings of the Sixth International Joint Conference on Natural Language Processing

Co-authors

Kaiyu Huang (黄锴宇) 11

Songming Zhang 11

Yuanmeng Chen (陈圆梦) 7

Zhibo Man (满志博) 7

Changhao Guan 5

Rui Qi (齐睿) 5

Deyi Xiong (德意熊) 5

You Li (李铀) 4

Hongxiao Zhang 4

Jinchao Zhang 4

Xinyue Lou (娄馨月) 3

Kazushige Ouchi 3

Xiangyu Shi (石响宇) 3

Ouchi Kazushige 2

Jiangming Liu 2

Jingsheng Cai 1

Maosong Sun (孙茂松) 1

Xiaolong Wang 1

Xianfeng Zeng 1

Fengshuo Zhang 1

Wenxuan Zhang 1

Venues