Linqing Chen


2022

生成模型在层次结构极限多标签文本分类中的应用(Generation Model for Hierarchical Extreme Multi-label Text Classification)
Linqing Chen (陈林卿) | Dawang He (何大望) | Yansi Xiao (肖燕思) | Yilin Liu (刘依林) | Jianping Lu (陆剑平) | Weilei Wang (王为磊)
Proceedings of the 21st Chinese National Conference on Computational Linguistics

Hierarchical extreme multi-label text classification is an important and challenging topic in natural language processing. The task involves an enormous, self-contained label set, and labels exhibit dependencies across different levels of the hierarchy as well as correlations within the same level, which further increase its difficulty. This paper casts hierarchical extreme multi-label text classification as a sequence transduction problem, treating the output labels as a sequence, so that the category labels relevant to a text can be generated directly from a label set of hundreds of thousands. A soft constraint mechanism and a composite vocabulary mapping exploit the hierarchical structure of, and correlations among, labels during decoding. Experimental results show that the proposed method achieves meaningful performance gains over baseline models. Further analysis shows that the method not only captures and exploits the hypernym-hyponym relations between labels at different levels, but also has a degree of tolerance to the noise inherent in extreme multi-label taxonomies.
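The soft constraint idea can be illustrated with a toy decoding step. The following minimal Python sketch is not the paper's implementation: the label hierarchy, logits, and bonus weight are hypothetical, and the composite vocabulary mapping is omitted. It shows how candidate labels whose parent has already been generated can be boosted rather than hard-filtered.

```python
import math

# Toy label hierarchy (hypothetical): child label -> parent label.
PARENT = {
    "cs.CL": "cs",
    "cs.LG": "cs",
    "stat.ML": "stat",
}

def softmax(scores):
    """Numerically stable softmax over a dict of label -> logit."""
    m = max(scores.values())
    exps = {k: math.exp(v - m) for k, v in scores.items()}
    z = sum(exps.values())
    return {k: v / z for k, v in exps.items()}

def soft_constrained_step(logits, generated, bonus=2.0):
    """One decoding step: boost labels whose parent is already generated.

    Unlike a hard constraint, labels without a generated parent are only
    down-weighted relative to consistent ones, never removed outright.
    """
    adjusted = {}
    for label, logit in logits.items():
        parent = PARENT.get(label)
        adjusted[label] = logit + (bonus if parent in generated else 0.0)
    return softmax(adjusted)

# Example: after generating the coarse label "cs", fine-grained labels
# under "cs" are preferred over similarly scored labels from other subtrees.
logits = {"cs.CL": 1.0, "cs.LG": 0.8, "stat.ML": 1.0}
probs = soft_constrained_step(logits, generated={"cs"})
print(max(probs, key=probs.get))  # -> "cs.CL"
```

Because the constraint is additive rather than a hard mask, the decoder can still recover when the taxonomy itself is noisy, which is consistent with the tolerance the abstract reports.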

2021

Breaking the Corpus Bottleneck for Context-Aware Neural Machine Translation with Cross-Task Pre-training
Linqing Chen | Junhui Li | Zhengxian Gong | Boxing Chen | Weihua Luo | Min Zhang | Guodong Zhou
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Context-aware neural machine translation (NMT) remains challenging due to the lack of large-scale document-level parallel corpora. To break this corpus bottleneck, in this paper we aim to improve context-aware NMT by taking advantage of both a large-scale sentence-level parallel dataset and source-side monolingual documents. To this end, we propose two pre-training tasks: one learns to translate a sentence from the source language to the target language on the sentence-level parallel dataset, while the other learns to restore a deliberately noised document to its original form on the monolingual documents. Importantly, the two pre-training tasks are learned jointly and simultaneously by the same model, which is thereafter fine-tuned on the scale-limited parallel documents from both sentence-level and document-level perspectives. Experimental results on four translation tasks show that our approach significantly improves translation performance. One nice property of our approach is that the fine-tuned model can be used to translate both sentences and documents.
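A minimal Python sketch of the data side of this setup follows. It is illustrative only: the token-drop and sentence-shuffle noising scheme and the task tags (`<mt>`, `<denoise>`) are assumptions, not the paper's exact recipe. It shows how examples from the two pre-training tasks can be interleaved into one stream so a single seq2seq model learns both jointly.

```python
import random

def noise_document(sentences, drop_prob=0.1, seed=0):
    """Deliberately noise a source-side document: randomly drop tokens
    and shuffle sentence order (one plausible noising scheme)."""
    rng = random.Random(seed)
    noised = []
    for sent in sentences:
        kept = [tok for tok in sent.split() if rng.random() > drop_prob]
        noised.append(" ".join(kept) if kept else sent)
    rng.shuffle(noised)
    return noised

def mixed_pretraining_examples(parallel_pairs, monolingual_docs):
    """Return (task_tag, source, target) examples so that both tasks are
    learned jointly and simultaneously by the same seq2seq model."""
    examples = []
    for src, tgt in parallel_pairs:
        examples.append(("<mt>", src, tgt))  # sentence-level translation
    for i, doc in enumerate(monolingual_docs):
        noised = " ".join(noise_document(doc, seed=i))
        examples.append(("<denoise>", noised, " ".join(doc)))  # doc denoising
    random.Random(42).shuffle(examples)  # interleave the two tasks
    return examples

# Toy usage with made-up data.
pairs = [("ich liebe musik", "i love music")]
docs = [["he opened the door .", "the room was dark ."]]
for tag, src, tgt in mixed_pretraining_examples(pairs, docs):
    print(tag, "|", src, "->", tgt)
```

Since both tasks share one model and one example format, the same checkpoint can later be fine-tuned on the limited parallel documents at either the sentence or the document level, which is what lets the final model translate both sentences and documents.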

2020

层次化结构全局上下文增强的篇章级神经机器翻译(Hierarchical Global Context Augmented Document-level Neural Machine Translation)
Linqing Chen (陈林卿) | Junhui Li (李军辉) | Zhengxian Gong (贡正仙)
Proceedings of the 19th Chinese National Conference on Computational Linguistics

How to exploit document-level context effectively has long been a major challenge in document-level neural machine translation. This paper proposes to improve document-level NMT with a hierarchical global context drawn from the entire document. To this end, the model separately captures the dependencies between each word in the current sentence and all sentences and words in the document, and combines these dependencies at different levels to obtain a global context that carries hierarchical document information. As a result, every word in the current source sentence obtains its own context that integrates both word-level and sentence-level dependencies. To make full use of sentence-level parallel corpora during training, a two-step training strategy is adopted: the model is first trained on sentence-level data and then further trained on data with document information to acquire the ability to capture global context. Experiments on several benchmark datasets show that the proposed model achieves meaningful improvements in translation quality over several strong baselines. Further experiments show that context incorporating hierarchical document information outperforms word-level context alone. In addition, the paper explores different ways of integrating the global context into the translation model and their effect on performance, and makes a preliminary investigation of how the global context is distributed across a document.
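The two-level dependency idea can be sketched in a few lines of numpy. This is not the paper's architecture: the embedding dimension, mean-pooled sentence representations, unscaled dot-product attention, and fusion by simple averaging are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def hierarchical_global_context(query_words, doc_sentences):
    """For each word in the current sentence, build a context vector that
    combines word-level attention over all document words with
    sentence-level attention over sentence representations.

    query_words:   (n, d) embeddings of the current sentence's words.
    doc_sentences: list of (m_i, d) arrays, one per document sentence.
    """
    all_words = np.concatenate(doc_sentences, axis=0)              # (M, d)
    sent_reps = np.stack([s.mean(axis=0) for s in doc_sentences])  # (S, d)

    # Word-level dependencies: current words attend to every document word.
    word_ctx = softmax(query_words @ all_words.T) @ all_words      # (n, d)

    # Sentence-level dependencies: current words attend to sentence vectors.
    sent_ctx = softmax(query_words @ sent_reps.T) @ sent_reps      # (n, d)

    # Combine the two levels; a simple average stands in for whatever
    # gating or fusion the model actually uses.
    return 0.5 * (word_ctx + sent_ctx)

# Toy usage: a 2-word current sentence, a 2-sentence document, d = 4.
rng = np.random.default_rng(0)
doc = [rng.normal(size=(3, 4)), rng.normal(size=(5, 4))]
query = rng.normal(size=(2, 4))
print(hierarchical_global_context(query, doc).shape)  # (2, 4)
```

The key point the sketch preserves is that each query word gets its own context vector, built from dependencies at both the word and sentence level, rather than the whole sentence sharing one document summary.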