Knowledge-Selective Pretraining for Attribute Value Extraction

Hui Liu, Qingyu Yin, Zhengyang Wang, Chenwei Zhang, Haoming Jiang, Yifan Gao, Zheng Li, Xian Li, Chao Zhang, Bing Yin, William Wang, Xiaodan Zhu


Abstract
Attribute Value Extraction (AVE) aims to retrieve attribute values from product profiles. State-of-the-art methods tackle the AVE task through a question-answering (QA) paradigm, where the value is predicted from the context (i.e., the product profile) given a query (i.e., the attribute). Despite substantial advancements, the performance of existing methods on rare attributes remains far from satisfactory, and such methods cannot easily be extended to unseen attributes due to poor generalization. In this work, we propose to leverage pretraining and transfer learning to address these weaknesses. We first collect product information from various E-commerce stores and retrieve a large number of (profile, attribute, value) triples, which we use as the pretraining corpus. To utilize the retrieved corpus more effectively, we further design a Knowledge-Selective Framework (KSelF) based on query expansion that can be closely combined with the pretraining corpus to boost performance. Meanwhile, since the public AE-pub dataset contains considerable noise, we construct and contribute a larger benchmark, EC-AVE, collected from E-commerce websites. We evaluate on both datasets. The experimental results demonstrate that our proposed KSelF achieves new state-of-the-art performance without pretraining; when incorporated with the pretraining corpus, its performance improves further, particularly on attributes with limited training resources.
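
To make the QA formulation of AVE concrete, the sketch below poses the attribute as a question over the product profile and extracts the value as an answer span. This is a minimal illustration only, not the authors' KSelF implementation: the generic extractive-QA model, the query template, and the example profile are all assumptions introduced here.

from transformers import pipeline

# Hypothetical stand-in model; the paper's own pretrained model is not used here.
qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

# A toy (profile, attribute) pair; the value should be a span of the profile.
profile = "Apple iPhone 14 Pro smartphone, 128 GB storage, Deep Purple, 6.1-inch display"
attribute = "color"

# The attribute becomes the query; the product profile serves as the QA context.
result = qa(question=f"What is the {attribute} of the product?", context=profile)
print(result["answer"])  # expected span: "Deep Purple"
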
Anthology ID:
2023.findings-emnlp.542
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2023
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
8062–8074
URL:
https://aclanthology.org/2023.findings-emnlp.542
DOI:
10.18653/v1/2023.findings-emnlp.542
Cite (ACL):
Hui Liu, Qingyu Yin, Zhengyang Wang, Chenwei Zhang, Haoming Jiang, Yifan Gao, Zheng Li, Xian Li, Chao Zhang, Bing Yin, William Wang, and Xiaodan Zhu. 2023. Knowledge-Selective Pretraining for Attribute Value Extraction. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 8062–8074, Singapore. Association for Computational Linguistics.
Cite (Informal):
Knowledge-Selective Pretraining for Attribute Value Extraction (Liu et al., Findings 2023)
PDF:
https://aclanthology.org/2023.findings-emnlp.542.pdf