Yilin Liu


2024

TRUE-UIE: Two Universal Relations Unify Information Extraction Tasks
Yucheng Wang | Bowen Yu | Yilin Liu | Shudong Lu
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Information extraction (IE) encounters challenges due to the variety of schemas and objectives that differ across tasks. Recent advancements hint at the potential for universal approaches to model such tasks, referred to as Universal Information Extraction (UIE). Although these approaches handle diverse tasks in one model, their generalization is limited because they still learn task-specific knowledge. In this study, we introduce a paradigm known as TRUE-UIE, wherein all IE tasks are aligned to learn the same goals: extracting mention spans and two universal relations named NEXT and IS. During the decoding process, the NEXT relation is utilized to group related elements, while the IS relation, in conjunction with structured language prompts, undertakes the role of type recognition. Additionally, we consider the sequential dependency of tokens during span extraction, an aspect often overlooked in prevalent models. Our empirical experiments indicate that TRUE-UIE achieves state-of-the-art performance on established benchmarks encompassing 16 datasets, spanning 7 diverse IE tasks. Further evaluations reveal that our approach effectively shares knowledge between different IE tasks, showcasing significant transferability in zero-shot and few-shot scenarios.

2022

生成模型在层次结构极限多标签文本分类中的应用 (Generation Model for Hierarchical Extreme Multi-label Text Classification)
Linqing Chen (陈林卿) | Dawang He (何大望) | Yansi Xiao (肖燕思) | Yilin Liu (刘依林) | Jianping Lu (陆剑平) | Weilei Wang (王为磊)
Proceedings of the 21st Chinese National Conference on Computational Linguistics

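Hierarchical extreme multi-label text classification is an important and challenging topic in natural language processing research. The task involves an enormous number of category labels that form their own system, and the labels exhibit dependencies across levels of the hierarchy or correlations within the same level, which further increases the difficulty of the task. This paper proposes treating hierarchical extreme multi-label text classification as a sequence transduction problem and treating the output labels as a sequence, so that category labels relevant to a text can be generated directly from among hundreds of thousands of labels. A soft constraint mechanism and a composite vocabulary mapping are used during decoding to exploit the hierarchical structure and correlation information among labels. Experimental results show that the proposed method achieves meaningful performance improvements over baseline models. Further analysis shows that the method not only captures and exploits the hypernym-hyponym relations between labels at different levels, but also has a degree of tolerance to the noise inherent in extreme multi-label label systems.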