ASTRA: Automatic Schema Matching using Machine Translation

Tarang Chugh, Deepak Zambre


Abstract
Many eCommerce platforms source product information from millions of sellers and manufacturers, each with its own proprietary schema, and employ schema matching solutions to structure that information and enable informative shopping experiences. Meanwhile, state-of-the-art machine translation techniques have demonstrated great success in building context-aware representations that generalize well to new languages with minimal training data. In this work, we propose modeling the schema matching problem as a neural machine translation task: given product context and an attribute-value pair from a source schema, the model predicts the corresponding attribute, if available, in the target schema. We utilize open-source seq2seq models, such as mT5 and mBART, fine-tuned on product attribute mappings to build a scalable schema matching framework. We demonstrate that our proposed approach achieves a significant performance boost (15% precision and 7% recall uplift) compared to the baseline system and can support new attributes with precision ≥ 95% using only five labeled samples per attribute.
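To make the translation framing concrete, here is a minimal sketch of how one attribute mapping might be serialized into a source/target text pair for seq2seq fine-tuning. It is not the authors' implementation: the abstract does not give the input format, so the field layout, the `<no_match>` sentinel, the `MappingExample` and `to_seq2seq_pair` names, and the sample products are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical sentinel emitted when the target schema has no matching attribute.
NO_MATCH = "<no_match>"


@dataclass
class MappingExample:
    """One labeled mapping from a seller's schema to the target schema (illustrative)."""
    product_context: str             # e.g. product title or category
    source_attribute: str            # attribute name in the seller's schema
    source_value: str                # value observed for that attribute
    target_attribute: Optional[str]  # label in the target schema, None if absent


def to_seq2seq_pair(ex: MappingExample) -> Tuple[str, str]:
    """Serialize an example as (source_text, target_text) for a translation-style model."""
    # Assumed layout: context, attribute, and value joined into one input string.
    source_text = (
        f"context: {ex.product_context} | "
        f"attribute: {ex.source_attribute} | "
        f"value: {ex.source_value}"
    )
    # The "translation" is the target attribute name, or the sentinel if none exists.
    target_text = ex.target_attribute if ex.target_attribute is not None else NO_MATCH
    return source_text, target_text


# Made-up examples: one with a match in the target schema, one without.
examples = [
    MappingExample("Men's running shoe", "colour_name", "navy blue", "color"),
    MappingExample("Men's running shoe", "internal_sku_flag", "Y", None),
]
pairs = [to_seq2seq_pair(ex) for ex in examples]
```

Pairs in this shape could then be tokenized and passed to a seq2seq model such as mT5 or mBART for fine-tuning. Keeping an explicit sentinel is one way to represent the "if available" case from the abstract as an ordinary output sequence.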
Anthology ID:
2024.emnlp-industry.92
Volume:
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track
Month:
November
Year:
2024
Address:
Miami, Florida, US
Editors:
Franck Dernoncourt, Daniel Preoţiuc-Pietro, Anastasia Shimorina
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
1237–1244
URL:
https://aclanthology.org/2024.emnlp-industry.92
Cite (ACL):
Tarang Chugh and Deepak Zambre. 2024. ASTRA: Automatic Schema Matching using Machine Translation. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 1237–1244, Miami, Florida, US. Association for Computational Linguistics.
Cite (Informal):
ASTRA: Automatic Schema Matching using Machine Translation (Chugh & Zambre, EMNLP 2024)
PDF:
https://aclanthology.org/2024.emnlp-industry.92.pdf