Yang Song


2021

pdf bib
Extremely Small BERT Models from Mixed-Vocabulary Training
Sanqiang Zhao | Raghav Gupta | Yang Song | Denny Zhou
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Pretrained language models like BERT have achieved good results on NLP tasks, but are impractical on resource-limited devices due to memory footprint. A large fraction of this footprint comes from the input embeddings with large input vocabulary and embedding dimensions. Existing knowledge distillation methods used for model compression cannot be directly applied to train student models with reduced vocabulary sizes. To this end, we propose a distillation method to align the teacher and student embeddings via mixed-vocabulary training. Our method compresses BERT-LARGE to a task-agnostic model with smaller vocabulary and hidden dimensions, which is an order of magnitude smaller than other distilled BERT models and offers a better size-accuracy trade-off on language understanding benchmarks as well as a practical dialogue task.

pdf bib
PhotoChat: A Human-Human Dialogue Dataset With Photo Sharing Behavior For Joint Image-Text Modeling
Xiaoxue Zang | Lijuan Liu | Maria Wang | Yang Song | Hao Zhang | Jindong Chen
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

We present a new human-human dialogue dataset - PhotoChat, the first dataset that casts light on the photo sharing behavior in online messaging. PhotoChat contains 12k dialogues, each of which is paired with a user photo that is shared during the conversation. Based on this dataset, we propose two tasks to facilitate research on image-text modeling: a photo-sharing intent prediction task that predicts whether one intends to share a photo in the next conversation turn, and a photo retrieval task that retrieves the most relevant photo according to the dialogue context. In addition, for both tasks, we provide baseline models using the state-of-the-art models and report their benchmark performances. The best image retrieval model achieves 10.4% recall@1 (out of 1000 candidates) and the best photo intent prediction model achieves 58.1% F1 score, indicating that the dataset presents interesting yet challenging real-world problems. We are releasing PhotoChat to facilitate future research work among the community.

2019

pdf bib
Representation Learning with Ordered Relation Paths for Knowledge Graph Completion
Yao Zhu | Hongzhi Liu | Zhonghai Wu | Yang Song | Tao Zhang
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Incompleteness is a common problem for existing knowledge graphs (KGs), and the completion of KG which aims to predict links between entities is challenging. Most existing KG completion methods only consider the direct relation between nodes and ignore the relation paths which contain useful information for link prediction. Recently, a few methods take relation paths into consideration but pay less attention to the order of relations in paths which is important for reasoning. In addition, these path-based models always ignore nonlinear contributions of path features for link prediction. To solve these problems, we propose a novel KG completion method named OPTransE. Instead of embedding both entities of a relation into the same latent space as in previous methods, we project the head entity and the tail entity of each relation into different spaces to guarantee the order of relations in the path. Meanwhile, we adopt a pooling strategy to extract nonlinear and complex features of different paths to further improve the performance of link prediction. Experimental results on two benchmark datasets show that the proposed model OPTransE performs better than state-of-the-art methods.

pdf bib
Domain Adaptation for Person-Job Fit with Transferable Deep Global Match Network
Shuqing Bian | Wayne Xin Zhao | Yang Song | Tao Zhang | Ji-Rong Wen
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Person-job fit has been an important task which aims to automatically match job positions with suitable candidates. Previous methods mainly focus on solving the match task in single-domain setting, which may not work well when labeled data is limited. We study the domain adaptation problem for person-job fit. We first propose a deep global match network for capturing the global semantic interactions between two sentences from a job posting and a candidate resume respectively. Furthermore, we extend the match network and implement domain adaptation in three levels, sentence-level representation, sentence-level match, and global match. Extensive experiment results on a large real-world dataset consisting of six domains have demonstrated the effectiveness of the proposed model, especially when there is not sufficient labeled data.

pdf bib
Generating Long and Informative Reviews with Aspect-Aware Coarse-to-Fine Decoding
Junyi Li | Wayne Xin Zhao | Ji-Rong Wen | Yang Song
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Generating long and informative review text is a challenging natural language generation task. Previous work focuses on word-level generation, neglecting the importance of topical and syntactic characteristics from natural languages. In this paper, we propose a novel review generation model by characterizing an elaborately designed aspect-aware coarse-to-fine generation process. First, we model the aspect transitions to capture the overall content flow. Then, to generate a sentence, an aspect-aware sketch will be predicted using an aspect-aware decoder. Finally, another decoder fills in the semantic slots by generating corresponding words. Our approach is able to jointly utilize aspect semantics, syntactic sketch, and context information. Extensive experiments results have demonstrated the effectiveness of the proposed model.

2013

pdf bib
Efficient Collective Entity Linking with Stacking
Zhengyan He | Shujie Liu | Yang Song | Mu Li | Ming Zhou | Houfeng Wang
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

2012

pdf bib
Joint Learning for Coreference Resolution with Markov Logic
Yang Song | Jing Jiang | Wayne Xin Zhao | Sujian Li | Houfeng Wang
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

pdf bib
Identifying Event-related Bursts via Social Media Activities
Xin Zhao | Baihan Shu | Jing Jiang | Yang Song | Hongfei Yan | Xiaoming Li
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

2011

pdf bib
Link Type Based Pre-Cluster Pair Model for Coreference Resolution
Yang Song | Houfeng Wang | Jing Jiang
Proceedings of the Fifteenth Conference on Computational Natural Language Learning: Shared Task

pdf bib
Topical Keyphrase Extraction from Twitter
Xin Zhao | Jing Jiang | Jing He | Yang Song | Palakorn Achanauparp | Ee-Peng Lim | Xiaoming Li
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2010

pdf bib
A Pipeline Approach to Chinese Personal Name Disambiguation
Yang Song | Zhengyan He | Chen Chen | Houfeng Wang
CIPS-SIGHAN Joint Conference on Chinese Language Processing

pdf bib
Applying Spectral Clustering for Chinese Word Sense Induction
Zhengyan He | Yang Song | Houfeng Wang
CIPS-SIGHAN Joint Conference on Chinese Language Processing