UTC-IE: A Unified Token-pair Classification Architecture for Information Extraction

Hang Yan, Yu Sun, Xiaonan Li, Yunhua Zhou, Xuanjing Huang, Xipeng Qiu


Abstract
Information Extraction (IE) spans several tasks with different output structures, such as named entity recognition, relation extraction and event extraction. Because these output structures differ, the tasks have previously been solved with different models. By re-examining IE tasks, we find that all of them can be interpreted as extracting spans and span relations. They can further be decomposed into token-pair classification tasks: the start and end tokens of a span pinpoint the span, and the start-to-start and end-to-end token pairs of two spans determine their relation. Based on this reformulation, we propose a Unified Token-pair Classification architecture for Information Extraction (UTC-IE), which introduces Plusformer on top of the token-pair feature matrix. Specifically, Plusformer models axis-aware interaction with plus-shaped self-attention and local interaction with a Convolutional Neural Network (CNN) over token pairs. Experiments show that our approach outperforms task-specific and unified models on all tasks in 10 datasets, and achieves better or comparable results on 2 joint IE datasets. Moreover, UTC-IE runs significantly faster than state-of-the-art models on IE tasks in most datasets, which verifies the effectiveness of our architecture.
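The span and relation decomposition described in the abstract can be illustrated with a short sketch. This is not the authors' implementation: the names `encode` and `decode`, the representation of the token-pair matrix as Python sets of `(row, column, label)` triples, and the choice to keep span labels and relation labels in separate channels (since one cell may carry both) are assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

Span = Tuple[int, int]        # (start token index, end token index), inclusive
Cell = Tuple[int, int, str]   # (row token, column token, label) in the token-pair matrix


def encode(spans: Dict[Span, str],
           relations: List[Tuple[Span, Span, str]]) -> Tuple[Set[Cell], Set[Cell]]:
    """Turn labelled spans and span relations into labelled token pairs.

    Assumption of this sketch: span cells and relation cells live in separate channels.
    """
    span_cells: Set[Cell] = set()
    rel_cells: Set[Cell] = set()
    for (start, end), label in spans.items():
        span_cells.add((start, end, label))        # start-to-end pair pinpoints the span
    for (s1, e1), (s2, e2), rel in relations:
        rel_cells.add((s1, s2, rel))               # start-to-start pair of the two spans
        rel_cells.add((e1, e2, rel))               # end-to-end pair of the two spans
    return span_cells, rel_cells


def decode(span_cells: Set[Cell],
           rel_cells: Set[Cell]) -> Tuple[Dict[Span, str], List[Tuple[Span, Span, str]]]:
    """Recover spans, then keep a relation only if both of its token pairs carry it."""
    spans = {(start, end): label for start, end, label in span_cells}
    rel_labels = sorted({rel for _, _, rel in rel_cells})
    relations = []
    for a in spans:
        for b in spans:
            if a == b:
                continue
            for rel in rel_labels:
                if (a[0], b[0], rel) in rel_cells and (a[1], b[1], rel) in rel_cells:
                    relations.append((a, b, rel))
    return spans, relations
```

For the tokens `Steve Jobs founded Apple`, a PER span (0, 1) is marked by cell (0, 1), and a hypothetical `founder_of` relation to an ORG span (3, 3) is marked by the start-to-start cell (0, 3) and the end-to-end cell (1, 3). In this sketch, requiring both cells to agree keeps the relation from being attached to a nested span that shares only the start token.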
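One reading of plus-shaped self-attention is that the feature of token pair (i, j) is updated from the cells that share its row i or its column j, which is one natural interpretation of the "axis-aware" interaction the abstract mentions. The function below is a pure-Python sketch under that reading only. The name `plus_attention`, the absence of learned query/key/value projections, the single head, and the pooling of row and column cells into one softmax are assumptions for illustration, not details taken from the paper; the CNN branch for local interaction is not shown.

```python
import math
from typing import List

# An L x L grid of d-dimensional token-pair feature vectors.
Grid = List[List[List[float]]]


def plus_attention(h: Grid) -> Grid:
    """Single-head plus-shaped self-attention over a token-pair grid.

    Simplifying assumptions: no learned projections, one head, and the row
    and column cells of (i, j) share a single softmax.
    """
    n = len(h)
    d = len(h[0][0])
    out: Grid = []
    for i in range(n):
        row_out = []
        for j in range(n):
            query = h[i][j]
            # The "plus": every cell in row i and column j, with (i, j) counted once.
            cells = [(i, k) for k in range(n)] + [(k, j) for k in range(n) if k != i]
            scores = [
                sum(q * x for q, x in zip(query, h[r][c])) / math.sqrt(d)
                for r, c in cells
            ]
            top = max(scores)  # subtract the max for a numerically stable softmax
            weights = [math.exp(s - top) for s in scores]
            total = sum(weights)
            # Weighted average of the plus-shaped neighbourhood, per feature dimension.
            row_out.append([
                sum(w * h[r][c][t] for w, (r, c) in zip(weights, cells)) / total
                for t in range(d)
            ])
        out.append(row_out)
    return out
```

Changing a cell outside row i and column j leaves the output at (i, j) unchanged, and each cell attends to 2L - 1 cells rather than all L² cells of the matrix.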
Anthology ID:
2023.acl-long.226
Volume:
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
4096–4122
URL:
https://aclanthology.org/2023.acl-long.226
DOI:
10.18653/v1/2023.acl-long.226
Cite (ACL):
Hang Yan, Yu Sun, Xiaonan Li, Yunhua Zhou, Xuanjing Huang, and Xipeng Qiu. 2023. UTC-IE: A Unified Token-pair Classification Architecture for Information Extraction. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 4096–4122, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
UTC-IE: A Unified Token-pair Classification Architecture for Information Extraction (Yan et al., ACL 2023)
PDF:
https://aclanthology.org/2023.acl-long.226.pdf
Video:
https://aclanthology.org/2023.acl-long.226.mp4