Jiun-Hung Chen
2020
MIND: A Large-scale Dataset for News Recommendation
Fangzhao Wu | Ying Qiao | Jiun-Hung Chen | Chuhan Wu | Tao Qi | Jianxun Lian | Danyang Liu | Xing Xie | Jianfeng Gao | Winnie Wu | Ming Zhou
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Fangzhao Wu | Ying Qiao | Jiun-Hung Chen | Chuhan Wu | Tao Qi | Jianxun Lian | Danyang Liu | Xing Xie | Jianfeng Gao | Winnie Wu | Ming Zhou
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
News recommendation is an important technique for personalized news service. Compared with product and movie recommendations which have been comprehensively studied, the research on news recommendation is much more limited, mainly due to the lack of a high-quality benchmark dataset. In this paper, we present a large-scale dataset named MIND for news recommendation. Constructed from the user click logs of Microsoft News, MIND contains 1 million users and more than 160k English news articles, each of which has rich textual content such as title, abstract and body. We demonstrate MIND a good testbed for news recommendation through a comparative study of several state-of-the-art news recommendation methods which are originally developed on different proprietary datasets. Our results show the performance of news recommendation highly relies on the quality of news content understanding and user interest modeling. Many natural language processing techniques such as effective text representation methods and pre-trained language models can effectively improve the performance of news recommendation. The MIND dataset will be available at https://msnews.github.io.
XGLUE: A New Benchmark Dataset for Cross-lingual Pre-training, Understanding and Generation
Yaobo Liang | Nan Duan | Yeyun Gong | Ning Wu | Fenfei Guo | Weizhen Qi | Ming Gong | Linjun Shou | Daxin Jiang | Guihong Cao | Xiaodong Fan | Ruofei Zhang | Rahul Agrawal | Edward Cui | Sining Wei | Taroon Bharti | Ying Qiao | Jiun-Hung Chen | Winnie Wu | Shuguang Liu | Fan Yang | Daniel Campos | Rangan Majumder | Ming Zhou
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Yaobo Liang | Nan Duan | Yeyun Gong | Ning Wu | Fenfei Guo | Weizhen Qi | Ming Gong | Linjun Shou | Daxin Jiang | Guihong Cao | Xiaodong Fan | Ruofei Zhang | Rahul Agrawal | Edward Cui | Sining Wei | Taroon Bharti | Ying Qiao | Jiun-Hung Chen | Winnie Wu | Shuguang Liu | Fan Yang | Daniel Campos | Rangan Majumder | Ming Zhou
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
In this paper, we introduce XGLUE, a new benchmark dataset to train large-scale cross-lingual pre-trained models using multilingual and bilingual corpora, and evaluate their performance across a diverse set of cross-lingual tasks. Comparing to GLUE (Wang et al.,2019), which is labeled in English and includes natural language understanding tasks only, XGLUE has three main advantages: (1) it provides two corpora with different sizes for cross-lingual pre-training; (2) it provides 11 diversified tasks that cover both natural language understanding and generation scenarios; (3) for each task, it provides labeled data in multiple languages. We extend a recent cross-lingual pre-trained model Unicoder (Huang et al., 2019) to cover both understanding and generation tasks, which is evaluated on XGLUE as a strong baseline. We also evaluate the base versions (12-layer) of Multilingual BERT, XLM and XLM-R for comparison.
Search
Fix author
Co-authors
- Ying Qiao 2
- Winnie Wu 2
- Ming Zhou 2
- Rahul Agrawal 1
- Taroon Bharti 1
- Daniel Campos 1
- Guihong Cao 1
- Edward Cui 1
- Nan Duan 1
- Xiaodong Fan 1
- Jianfeng Gao 1
- Yeyun Gong 1
- Ming Gong 1
- Fenfei Guo 1
- Daxin Jiang 1
- Jianxun Lian 1
- Yaobo Liang 1
- Danyang Liu 1
- Shuguang Liu 1
- Rangan Majumder 1
- Tao Qi 1
- Weizhen Qi 1
- Linjun Shou 1
- Sining Wei 1
- Fangzhao Wu 1
- Chuhan Wu 1
- Ning Wu 1
- Xing Xie 1
- Fan Yang 1
- Ruofei Zhang 1