Extreme Multi-Label Classification with Label Masking for Product Attribute Value Extraction

Wei-Te Chen, Yandi Xia, Keiji Shinzato


Abstract
Although most studies have treated attribute value extraction (AVE) as named entity recognition, these approaches are not practical in real-world e-commerce platforms because they perform poorly and require canonicalization of extracted values. Furthermore, since the values needed for actual services are static for many attributes, extraction of new values is not always necessary. Given the above, we formalize AVE as extreme multi-label classification (XMC). A major problem in solving AVE as XMC is that the distribution between positive and negative labels for products is heavily imbalanced. To mitigate the negative impact of such a biased distribution, we propose label masking, a simple and effective method to reduce the number of negative labels in training. We exploit the attribute taxonomy designed for e-commerce platforms to determine which labels are negative for products. Experimental results using a dataset collected from a Japanese e-commerce platform demonstrate that label masking improves micro and macro F1 scores by 3.38 and 23.20 points, respectively.
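The core idea of label masking can be illustrated with a minimal sketch. This is not the authors' implementation; the taxonomy, label space, and function names below are hypothetical. Each label is an (attribute, value) pair, and a product's category determines, via the taxonomy, which attributes apply to it; labels for inapplicable attributes are masked out of the binary cross-entropy loss so they never act as negatives.

```python
import math

# Hypothetical taxonomy: category -> attributes that apply to it.
TAXONOMY = {
    "wine": {"color", "region"},
    "shoes": {"color", "size"},
}

# Hypothetical global label space: (attribute, value) pairs.
LABELS = [
    ("color", "red"), ("color", "white"),
    ("region", "bordeaux"), ("size", "42"),
]

def label_mask(category):
    """Return 1.0 for labels whose attribute applies to the category, else 0.0."""
    attrs = TAXONOMY[category]
    return [1.0 if attr in attrs else 0.0 for attr, _ in LABELS]

def masked_bce(logits, targets, mask):
    """Binary cross-entropy over the label space, skipping masked-out labels."""
    total, n = 0.0, 0
    for z, y, m in zip(logits, targets, mask):
        if m == 0.0:
            continue  # attribute not in this product's taxonomy: not a negative
        p = 1.0 / (1.0 + math.exp(-z))  # sigmoid
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
        n += 1
    return total / max(n, 1)
```

For a "wine" product, the ("size", "42") label is masked, so even a confidently wrong logit on that label contributes no loss; this is how masking shrinks the effective set of negatives per product.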
Anthology ID:
2022.ecnlp-1.16
Volume:
Proceedings of The Fifth Workshop on e-Commerce and NLP (ECNLP 5)
Month:
May
Year:
2022
Address:
Dublin, Ireland
Venues:
ACL | ECNLP
Publisher:
Association for Computational Linguistics
Pages:
134–140
URL:
https://aclanthology.org/2022.ecnlp-1.16
DOI:
10.18653/v1/2022.ecnlp-1.16
Cite (ACL):
Wei-Te Chen, Yandi Xia, and Keiji Shinzato. 2022. Extreme Multi-Label Classification with Label Masking for Product Attribute Value Extraction. In Proceedings of The Fifth Workshop on e-Commerce and NLP (ECNLP 5), pages 134–140, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
Extreme Multi-Label Classification with Label Masking for Product Attribute Value Extraction (Chen et al., ECNLP 2022)
PDF:
https://aclanthology.org/2022.ecnlp-1.16.pdf