Product Answer Generation from Heterogeneous Sources: A New Benchmark and Best Practices

Xiaoyu Shen; Gianni Barlacchi; Marco Del Tredici; Weiwei Cheng; Bill Byrne; Adrià de Gispert

doi:10.18653/v1/2022.ecnlp-1.13

Abstract

It is of great value to answer product questions based on heterogeneous information sources available on web product pages, e.g., semi-structured attributes, text descriptions, user-provided contents, etc. However, these sources have different structures and writing styles, which poses challenges for (1) evidence ranking, (2) source selection, and (3) answer generation. In this paper, we build a benchmark with annotations for both evidence selection and answer generation covering 6 information sources. Based on this benchmark, we conduct a comprehensive study and present a set of best practices. We show that all sources are important and contribute to answering questions. Handling all sources within one single model can produce comparable confidence scores across sources and combining multiple sources for training always helps, even for sources with totally different structures. We further propose a novel data augmentation method to iteratively create training samples for answer generation, which achieves close-to-human performance with only a few thousand annotations. Finally, we perform an in-depth error analysis of model predictions and highlight the challenges for future research.

Anthology ID: 2022.ecnlp-1.13
Volume: Proceedings of the Fifth Workshop on e-Commerce and NLP (ECNLP 5)
Month: May
Year: 2022
Address: Dublin, Ireland
Editors: Shervin Malmasi, Oleg Rokhlenko, Nicola Ueffing, Ido Guy, Eugene Agichtein, Surya Kallumadi
Venue: ECNLP
Publisher: Association for Computational Linguistics
Pages: 99–110
URL: https://aclanthology.org/2022.ecnlp-1.13
DOI: 10.18653/v1/2022.ecnlp-1.13
Cite (ACL): Xiaoyu Shen, Gianni Barlacchi, Marco Del Tredici, Weiwei Cheng, Bill Byrne, and Adrià de Gispert. 2022. Product Answer Generation from Heterogeneous Sources: A New Benchmark and Best Practices. In Proceedings of the Fifth Workshop on e-Commerce and NLP (ECNLP 5), pages 99–110, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal): Product Answer Generation from Heterogeneous Sources: A New Benchmark and Best Practices (Shen et al., ECNLP 2022)
PDF: https://aclanthology.org/2022.ecnlp-1.13.pdf
Video: https://aclanthology.org/2022.ecnlp-1.13.mp4
Data: AmazonQA
