Lin Yuan
2024
IEPile: Unearthing Large Scale Schema-Conditioned Information Extraction Corpus
Honghao Gui, Lin Yuan, Hongbin Ye, Ningyu Zhang, Mengshu Sun, Lei Liang, Huajun Chen
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Large Language Models (LLMs) demonstrate remarkable potential across various domains; however, they exhibit a significant performance gap in Information Extraction (IE). High-quality instruction data is key to enhancing specific capabilities of LLMs, yet current IE datasets tend to be small in scale, fragmented, and lacking a standardized schema. To this end, we introduce IEPile, a comprehensive bilingual (English and Chinese) IE instruction corpus containing approximately 0.32B tokens. We construct IEPile by collecting and cleaning 33 existing IE datasets, and we introduce schema-based instruction generation to unearth a large-scale corpus. Experimentally, IEPile enhances the performance of LLMs on IE, with notable improvements in zero-shot generalization. We open-source the resource and pre-trained models, hoping to provide valuable support to the NLP community.
2022
MCS: An In-battle Commentary System for MOBA Games
Xiaofeng Qi, Chao Li, Zhongping Liang, Jigang Liu, Cheng Zhang, Yuanxin Wei, Lin Yuan, Guang Yang, Lanxiao Huang, Min Li
Proceedings of the 29th International Conference on Computational Linguistics
This paper introduces a generative system for real-time, in-battle commentary in mobile MOBA games. Event commentary is important in MOBA battles and applies to a wide range of scenarios, such as live streaming, e-sports commentary, and combat information analysis. The system takes real-time match statistics and events as input. An effective transform method is designed to convert match statistics and utterances into a consistent encoding space. This paper presents the general framework and implementation details of the proposed system and provides experimental results on large-scale real-world match data.