Transformer-Exclusive Cross-Modal Representation for Vision and Language
Andrew Shin | Takuya Narihira
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021