%0 Conference Proceedings
%T SMARTAVE: Structured Multimodal Transformer for Product Attribute Value Extraction
%A Wang, Qifan
%A Yang, Li
%A Wang, Jingang
%A Krishnan, Jitin
%A Dai, Bo
%A Wang, Sinong
%A Xu, Zenglin
%A Khabsa, Madian
%A Ma, Hao
%Y Goldberg, Yoav
%Y Kozareva, Zornitsa
%Y Zhang, Yue
%S Findings of the Association for Computational Linguistics: EMNLP 2022
%D 2022
%8 December
%I Association for Computational Linguistics
%C Abu Dhabi, United Arab Emirates
%F wang-etal-2022-smartave
%X Automatic product attribute value extraction refers to the task of identifying values of an attribute from the product information. Product attributes are essential in improving the online shopping experience for customers. Most existing methods focus on extracting attribute values from product title and description. However, in many real-world applications, a product is usually represented by multiple modalities beyond title and description, such as product specifications, text and visual information from the product image, etc. In this paper, we propose SMARTAVE, a Structured Multimodal trAnsformeR for producT Attribute Value Extraction, which jointly encodes the structured product information from multiple modalities. Specifically, in the SMARTAVE encoder, we introduce hyper-tokens to represent the modality-level information, and local-tokens to represent the original text and visual inputs. Structured attention patterns are designed among the hyper-tokens and local-tokens for learning effective product representation. The attribute values are then extracted based on the learned embeddings. We conduct extensive experiments on two multimodal product datasets. Experimental results demonstrate the superior performance of the proposed approach over several state-of-the-art methods. Ablation studies validate the effectiveness of the structured attentions in modeling the multimodal product information.
%R 10.18653/v1/2022.findings-emnlp.20
%U https://aclanthology.org/2022.findings-emnlp.20
%U https://doi.org/10.18653/v1/2022.findings-emnlp.20
%P 263-276