@inproceedings{hsu-etal-2022-xdbert,
title = "{XDBERT}: {D}istilling Visual Information to {BERT} from Cross-Modal Systems to Improve Language Understanding",
author = "Hsu, Chan-Jan and
Lee, Hung-yi and
Tsao, Yu",
editor = "Muresan, Smaranda and
Nakov, Preslav and
Villavicencio, Aline",
booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)",
month = may,
year = "2022",
address = "Dublin, Ireland",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2022.acl-short.52",
doi = "10.18653/v1/2022.acl-short.52",
pages = "479--489",
abstract = "Transformer-based models are widely used in natural language understanding (NLU) tasks, and multimodal transformers have been effective in visual-language tasks. This study explores distilling visual information from pretrained multimodal transformers to pretrained language encoders. Our framework is inspired by cross-modal encoders{'} success in visual-language tasks while we alter the learning objective to cater to the language-heavy characteristics of NLU. After training with a small number of extra adapting steps and finetuned, the proposed XDBERT (cross-modal distilled BERT) outperforms pretrained-BERT in general language understanding evaluation (GLUE), situations with adversarial generations (SWAG) benchmarks, and readability benchmarks. We analyze the performance of XDBERT on GLUE to show that the improvement is likely visually grounded.",
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="hsu-etal-2022-xdbert">
<titleInfo>
<title>XDBERT: Distilling Visual Information to BERT from Cross-Modal Systems to Improve Language Understanding</title>
</titleInfo>
<name type="personal">
<namePart type="given">Chan-Jan</namePart>
<namePart type="family">Hsu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Hung-yi</namePart>
<namePart type="family">Lee</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Yu</namePart>
<namePart type="family">Tsao</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2022-05</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)</title>
</titleInfo>
<name type="personal">
<namePart type="given">Smaranda</namePart>
<namePart type="family">Muresan</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Preslav</namePart>
<namePart type="family">Nakov</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Aline</namePart>
<namePart type="family">Villavicencio</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Dublin, Ireland</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
</relatedItem>
<abstract>Transformer-based models are widely used in natural language understanding (NLU) tasks, and multimodal transformers have been effective in visual-language tasks. This study explores distilling visual information from pretrained multimodal transformers to pretrained language encoders. Our framework is inspired by cross-modal encoders’ success in visual-language tasks, while we alter the learning objective to cater to the language-heavy characteristics of NLU. After training with a small number of extra adapting steps and finetuning, the proposed XDBERT (cross-modal distilled BERT) outperforms pretrained BERT on the general language understanding evaluation (GLUE) benchmark, the situations with adversarial generations (SWAG) benchmark, and readability benchmarks. We analyze the performance of XDBERT on GLUE to show that the improvement is likely visually grounded.</abstract>
<identifier type="citekey">hsu-etal-2022-xdbert</identifier>
<identifier type="doi">10.18653/v1/2022.acl-short.52</identifier>
<location>
<url>https://aclanthology.org/2022.acl-short.52</url>
</location>
<part>
<date>2022-05</date>
<extent unit="page">
<start>479</start>
<end>489</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T XDBERT: Distilling Visual Information to BERT from Cross-Modal Systems to Improve Language Understanding
%A Hsu, Chan-Jan
%A Lee, Hung-yi
%A Tsao, Yu
%Y Muresan, Smaranda
%Y Nakov, Preslav
%Y Villavicencio, Aline
%S Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
%D 2022
%8 May
%I Association for Computational Linguistics
%C Dublin, Ireland
%F hsu-etal-2022-xdbert
%X Transformer-based models are widely used in natural language understanding (NLU) tasks, and multimodal transformers have been effective in visual-language tasks. This study explores distilling visual information from pretrained multimodal transformers to pretrained language encoders. Our framework is inspired by cross-modal encoders’ success in visual-language tasks, while we alter the learning objective to cater to the language-heavy characteristics of NLU. After training with a small number of extra adapting steps and finetuning, the proposed XDBERT (cross-modal distilled BERT) outperforms pretrained BERT on the general language understanding evaluation (GLUE) benchmark, the situations with adversarial generations (SWAG) benchmark, and readability benchmarks. We analyze the performance of XDBERT on GLUE to show that the improvement is likely visually grounded.
%R 10.18653/v1/2022.acl-short.52
%U https://aclanthology.org/2022.acl-short.52
%U https://doi.org/10.18653/v1/2022.acl-short.52
%P 479-489
Markdown (Informal)
[XDBERT: Distilling Visual Information to BERT from Cross-Modal Systems to Improve Language Understanding](https://aclanthology.org/2022.acl-short.52) (Hsu et al., ACL 2022)
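The abstract describes distilling visual knowledge from a pretrained cross-modal teacher into a BERT-style student through a small number of extra adapting steps before finetuning. The sketch below is only a loose illustration of that general teacher-student setup, assuming a simple MSE alignment loss between student and teacher text representations; the module names, placeholder encoders, and objective are assumptions made for illustration, not the paper's actual XDBERT training recipe.

```python
# Illustrative sketch of aligning a text student with a (frozen) cross-modal
# teacher; all components here are toy stand-ins, not XDBERT's real modules.
import torch
import torch.nn as nn


class DistillationAdapter(nn.Module):
    """Projects student hidden states into the teacher's representation space."""

    def __init__(self, student_dim: int, teacher_dim: int):
        super().__init__()
        self.proj = nn.Linear(student_dim, teacher_dim)

    def forward(self, student_hidden: torch.Tensor) -> torch.Tensor:
        return self.proj(student_hidden)


def adaptation_step(student, teacher, adapter, tokens, optimizer):
    """One 'adapting' step: match the student's text representations to the
    frozen cross-modal teacher's text representations on the same input."""
    teacher.eval()
    with torch.no_grad():
        t_hidden = teacher(tokens)          # [batch, seq, teacher_dim]
    s_hidden = student(tokens)              # [batch, seq, student_dim]
    loss = nn.functional.mse_loss(adapter(s_hidden), t_hidden)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    batch, seq, d_student, d_teacher, vocab = 2, 8, 16, 24, 100
    student = nn.Embedding(vocab, d_student)   # stand-in for a BERT-style encoder
    teacher = nn.Embedding(vocab, d_teacher)   # stand-in for a cross-modal encoder's text branch
    adapter = DistillationAdapter(d_student, d_teacher)
    optimizer = torch.optim.AdamW(
        list(student.parameters()) + list(adapter.parameters()), lr=1e-4
    )
    tokens = torch.randint(0, vocab, (batch, seq))
    print(adaptation_step(student, teacher, adapter, tokens, optimizer))
```

After such adapting steps, the student would be finetuned on downstream NLU tasks (e.g., GLUE or SWAG) in the usual way; only the alignment stage above is specific to the distillation idea summarized in the abstract.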