Jing Yi


2022

Knowledge Inheritance for Pre-trained Language Models
Yujia Qin | Yankai Lin | Jing Yi | Jiajie Zhang | Xu Han | Zhengyan Zhang | Yusheng Su | Zhiyuan Liu | Peng Li | Maosong Sun | Jie Zhou
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Recent explorations of large-scale pre-trained language models (PLMs) have revealed the power of PLMs with huge numbers of parameters, setting off a wave of training ever-larger PLMs. However, training a large-scale PLM requires tremendous computational resources, which may be practically unaffordable. In addition, existing large-scale PLMs are mainly trained from scratch individually, ignoring that many well-trained PLMs are available. To this end, we explore the question of how existing PLMs could benefit the training of large-scale PLMs in the future. Specifically, we introduce a pre-training framework named “knowledge inheritance” (KI) and explore how knowledge distillation could serve as auxiliary supervision during pre-training to efficiently learn larger PLMs. Experimental results demonstrate the superiority of KI in training efficiency. We also conduct empirical analyses to explore the effects of teacher PLMs’ pre-training settings, including model architecture, pre-training data, etc. Finally, we show that KI could be applied to domain adaptation and knowledge transfer.

QuoteR: A Benchmark of Quote Recommendation for Writing
Fanchao Qi | Yanhui Yang | Jing Yi | Zhili Cheng | Zhiyuan Liu | Maosong Sun
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

It is very common to use quotations (quotes) to make our writing more elegant or convincing. To help people find appropriate quotes efficiently, the task of quote recommendation has been proposed, aiming to recommend quotes that fit the current context of writing. There have been various quote recommendation approaches, but they are evaluated on different unpublished datasets. To facilitate research on this task, we build a large and fully open quote recommendation dataset called QuoteR, which comprises three parts: English, standard Chinese, and classical Chinese. Each part is larger than its previous unpublished counterparts. We conduct an extensive evaluation of existing quote recommendation methods on QuoteR. Furthermore, we propose a new quote recommendation model that significantly outperforms previous methods on all three parts of QuoteR. All the code and data of this paper can be obtained at https://github.com/thunlp/QuoteR.

2014

An Introduction to BLCU Personal Attributes Extraction System
Dong Yu | Cheng Yu | Qin Qu | Gongbo Tang | Chunhua Liu | Yue Tian | Jing Yi
Proceedings of The Third CIPS-SIGHAN Joint Conference on Chinese Language Processing