A Hybrid Approach to Cross-lingual Product Review Summarization

Saleh Soltan, Victor Soto, Ke Tran, Wael Hamza


Abstract
We present a hybrid approach to product review summarization that consists of two steps: (i) an unsupervised extractive step that selects the most important sentences from all the reviews, and (ii) a supervised abstractive step that condenses the extracted sentences into a short, coherent summary. This approach allows us to develop an efficient cross-lingual abstractive summarizer that can generate summaries in any language, given the sentences extracted from thousands of reviews in a source language. To train and test the abstractive model, we create the Cross-lingual Amazon Reviews Summarization (CARS) dataset, which provides English summaries for training, and English, French, Italian, Arabic, and Hindi summaries for testing, all based on selected English reviews. We show that the summaries generated by our model are as good as human-written summaries in coherence, informativeness, non-redundancy, and fluency.
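The two-step structure described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the extractive step scores sentences, so the frequency-based ranking below is a stand-in assumption and not the authors' method. All function names (`split_sentences`, `tokenize`, `extract_top_sentences`, `hybrid_summarize`) are hypothetical, and the abstractive model is represented only by a caller-supplied function.

```python
import re
from collections import Counter
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    # Naive splitter: break after ., ! or ? when followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence: str) -> List[str]:
    # Lowercased alphabetic tokens; a real system would use a proper tokenizer.
    return re.findall(r"[a-z']+", sentence.lower())


def extract_top_sentences(reviews: List[str], k: int = 3) -> List[str]:
    """Unsupervised extractive step (assumed scoring, not the paper's)."""
    sentences: List[str] = []
    seen = set()
    for review in reviews:
        for sentence in split_sentences(review):
            key = sentence.lower()
            if key not in seen:  # drop verbatim duplicate sentences
                seen.add(key)
                sentences.append(sentence)

    # Count, for each word, how many distinct sentences contain it.
    sentence_freq: Counter = Counter()
    for sentence in sentences:
        sentence_freq.update(set(tokenize(sentence)))

    def score(sentence: str) -> float:
        # Average sentence frequency of the distinct words in the sentence.
        tokens = set(tokenize(sentence))
        if not tokens:
            return 0.0
        return sum(sentence_freq[t] for t in tokens) / len(tokens)

    # sorted() is stable, so ties keep their original review order.
    return sorted(sentences, key=score, reverse=True)[:k]


def hybrid_summarize(
    reviews: List[str],
    abstractive_model: Callable[[List[str]], str],
    k: int = 3,
) -> str:
    """Extract k sentences, then hand them to a supervised abstractive model.

    `abstractive_model` is a placeholder for any trained summarizer; for a
    cross-lingual setup it would emit the summary in the target language.
    """
    return abstractive_model(extract_top_sentences(reviews, k))
```

Because the abstractive step sees only the few extracted sentences rather than thousands of raw reviews, its input stays short regardless of how many reviews a product has, which is the efficiency argument the abstract makes.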
Anthology ID:
2022.emnlp-industry.3
Volume:
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track
Month:
December
Year:
2022
Address:
Abu Dhabi, UAE
Editors:
Yunyao Li, Angeliki Lazaridou
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
18–28
URL:
https://aclanthology.org/2022.emnlp-industry.3
DOI:
10.18653/v1/2022.emnlp-industry.3
Cite (ACL):
Saleh Soltan, Victor Soto, Ke Tran, and Wael Hamza. 2022. A Hybrid Approach to Cross-lingual Product Review Summarization. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 18–28, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
A Hybrid Approach to Cross-lingual Product Review Summarization (Soltan et al., EMNLP 2022)
PDF:
https://aclanthology.org/2022.emnlp-industry.3.pdf