Set Learning for Generative Information Extraction

Jiangnan Li, Yice Zhang, Bin Liang, Kam-Fai Wong, Ruifeng Xu


Abstract
Recent work has applied the sequence-to-sequence (Seq2Seq) model to Information Extraction (IE) because of its potential to tackle multiple IE tasks in a unified manner. Under this formalization, multiple structured objects are concatenated in a predefined order to form the target sequence. However, structured objects, by their nature, constitute an unordered set. Consequently, this formalization introduces a potential order bias, which can impair model learning. To address this issue, this paper proposes a set learning approach that considers multiple permutations of structured objects to approximately optimize the set probability. Notably, our approach does not require any modifications to model structures, so it can be easily integrated into existing generative IE frameworks. Experiments show that our method consistently improves existing frameworks across a wide range of tasks and datasets.
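To make the idea of optimizing set probability over permutations concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not specify the exact objective, so this assumes one plausible form: sample a few orderings of the target objects, score each linearized sequence, and take the negative log of the summed sequence probabilities. The names `sample_permutations`, `linearize`, `set_nll`, the `" ; "` separator, and the `seq_log_prob` scoring callback are all hypothetical and stand in for whatever an existing generative IE framework provides.

```python
import itertools
import math
import random
from typing import Callable, List, Sequence


def sample_permutations(n: int, k: int, rng: random.Random) -> List[List[int]]:
    """Return up to k distinct orderings of the indices 0..n-1."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if math.factorial(n) <= k:
        # Few enough orderings to enumerate all of them.
        return [list(p) for p in itertools.permutations(range(n))]
    seen = set()
    while len(seen) < k:
        seen.add(tuple(rng.sample(range(n), n)))
    return [list(p) for p in sorted(seen)]


def linearize(objects: Sequence[str], sep: str = " ; ") -> str:
    """Concatenate structured objects into one target sequence (assumed format)."""
    return sep.join(objects)


def set_nll(
    objects: Sequence[str],
    seq_log_prob: Callable[[str], float],
    k: int = 4,
    seed: int = 0,
) -> float:
    """Approximate -log p(set) by summing p(sequence) over sampled orderings.

    `seq_log_prob` is a hypothetical callback returning the model's
    log-probability of a target sequence given the (implicit) input text.
    """
    rng = random.Random(seed)
    orders = sample_permutations(len(objects), k, rng)
    log_probs = [
        seq_log_prob(linearize([objects[i] for i in order])) for order in orders
    ]
    # Numerically stable log-sum-exp over the sampled orderings.
    m = max(log_probs)
    log_sum = m + math.log(sum(math.exp(lp - m) for lp in log_probs))
    return -log_sum
```

Because only the training targets and the loss aggregation change, a sketch like this leaves the model architecture untouched, which is consistent with the abstract's claim. When fewer than all orderings are sampled, the summed probability covers only part of the set probability, so the resulting loss is an approximation.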
Anthology ID:
2023.emnlp-main.806
Volume:
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
13043–13052
URL:
https://aclanthology.org/2023.emnlp-main.806
DOI:
10.18653/v1/2023.emnlp-main.806
Cite (ACL):
Jiangnan Li, Yice Zhang, Bin Liang, Kam-Fai Wong, and Ruifeng Xu. 2023. Set Learning for Generative Information Extraction. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 13043–13052, Singapore. Association for Computational Linguistics.
Cite (Informal):
Set Learning for Generative Information Extraction (Li et al., EMNLP 2023)
PDF:
https://aclanthology.org/2023.emnlp-main.806.pdf
Video:
https://aclanthology.org/2023.emnlp-main.806.mp4