Who wrote this book? A challenge for e-commerce

Béranger Dumont, Simona Maggio, Ghiles Sidi Said, Quoc-Tien Au


Abstract
Modern e-commerce catalogs contain millions of references, each associated with textual and visual information that is of paramount importance for products to be found via search or browsing. The book category is particularly important, and its author name(s) field poses a significant challenge: books written by a given author may be listed under different forms of the author's name because of abbreviations, spelling variants, and mistakes, among other causes. To solve this problem at scale, we design a composite system that combines open data sources for books with deep learning components, such as approximate matching with Siamese networks and name correction with sequence-to-sequence networks. We evaluate this approach on product data from the e-commerce website Rakuten France and find that the system's top proposal is the correct normalized author name with 72% accuracy.
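To make the approximate-match step more concrete, the sketch below ranks candidate canonical author names for a noisy listing such as "V. Hugo". It is a minimal illustration under stated assumptions, not the authors' system: the reference list is hypothetical and stands in for the open data sources, and the learned Siamese network is swapped for a plain character-trigram Jaccard similarity. The helper names (`trigrams`, `similarity`, `rank_candidates`) are likewise hypothetical.

```python
from typing import List, Tuple


def trigrams(name: str) -> set:
    """Character trigrams of a lowercased, space-padded name (hypothetical helper)."""
    padded = f"  {name.lower().strip()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def similarity(a: str, b: str) -> float:
    """Jaccard similarity of trigram sets; a simple stand-in for a learned Siamese distance."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def rank_candidates(noisy: str, canonical: List[str], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k canonical names most similar to the noisy listing, best first."""
    scored = [(c, similarity(noisy, c)) for c in canonical]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical reference list standing in for an open book-metadata source.
    reference = ["Victor Hugo", "Albert Camus", "Marcel Proust"]
    for listing in ["V. Hugo", "Albert Camu", "Proust, Marcel"]:
        print(listing, "->", rank_candidates(listing, reference, k=1))
```

Because trigram sets ignore word order, "Proust, Marcel" still lands closest to "Marcel Proust". A learned model such as the Siamese network mentioned in the abstract can, in principle, learn which differences matter (for example, initials versus full first names) instead of weighting every trigram equally.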
Anthology ID: D19-5516
Volume: Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019)
Month: November
Year: 2019
Address: Hong Kong, China
Editors: Wei Xu, Alan Ritter, Tim Baldwin, Afshin Rahimi
Venue: WNUT
Publisher: Association for Computational Linguistics
Pages: 121–125
URL: https://aclanthology.org/D19-5516
DOI: 10.18653/v1/D19-5516
Cite (ACL): Béranger Dumont, Simona Maggio, Ghiles Sidi Said, and Quoc-Tien Au. 2019. Who wrote this book? A challenge for e-commerce. In Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019), pages 121–125, Hong Kong, China. Association for Computational Linguistics.
Cite (Informal): Who wrote this book? A challenge for e-commerce (Dumont et al., WNUT 2019)
PDF: https://aclanthology.org/D19-5516.pdf