Revisiting Multimodal Transformers for Tabular Data with Text Fields

Thomas Bonnier


Abstract
Tabular data with text fields can be leveraged in applications such as financial risk assessment or medical diagnosis prediction. When employing multimodal approaches to make predictions based on these modalities, it is crucial to make the most appropriate modeling choices in terms of numerical feature encoding or fusion strategy. In this paper, we focus on multimodal classification tasks based on tabular datasets with text fields. We build on multimodal Transformers to propose the Tabular-Text Transformer (TTT), a tabular/text dual-stream Transformer network. This architecture includes a distance-to-quantile embedding scheme for numerical features and an overall attention module which concurrently considers self-attention and cross-modal attention. Further, we leverage the two well-informed modality streams to estimate whether a prediction is uncertain or not. To explain uncertainty in terms of feature values, we use a sampling-based approximation of Shapley values in a bimodal context, with two options for the value function. To show the efficacy and relevance of this approach, we compare it to six baselines and measure its ability to quantify and explain uncertainty against various methods. Our code is available at https://github.com/thomas-bonnier/TabularTextTransformer.
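As a rough illustration of the distance-to-quantile idea mentioned in the abstract, the sketch below embeds each numerical feature by projecting its signed distances to training-set quantiles into the model dimension. This is an assumption-based sketch, not the authors' implementation (the actual TTT code is in the linked repository); the class name, quantile count, and projection scheme are all hypothetical.

```python
import torch
import torch.nn as nn

class DistanceToQuantileEmbedding(nn.Module):
    """Hypothetical sketch: embed each numerical feature via its signed
    distances to precomputed training-set quantiles, then project each
    distance vector to d_model. The exact TTT scheme may differ."""

    def __init__(self, quantiles: torch.Tensor, d_model: int):
        # quantiles: (num_features, num_quantiles), estimated on training data
        super().__init__()
        self.register_buffer("quantiles", quantiles)
        num_features, num_quantiles = quantiles.shape
        # one learned projection per feature, applied to its distance vector
        self.proj = nn.Parameter(
            torch.randn(num_features, num_quantiles, d_model) * 0.02
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_features) raw numerical values
        # signed distances to each quantile: (batch, num_features, num_quantiles)
        dist = x.unsqueeze(-1) - self.quantiles.unsqueeze(0)
        # project each feature's distance vector to a d_model token embedding
        # result: (batch, num_features, d_model)
        return torch.einsum("bfq,fqd->bfd", dist, self.proj)


# Usage with dummy data: quantiles estimated from a training sample
train = torch.randn(1000, 3)
quantiles = torch.quantile(train, torch.linspace(0.1, 0.9, 5), dim=0).T
emb = DistanceToQuantileEmbedding(quantiles, d_model=32)
tokens = emb(torch.randn(8, 3))  # (8, 3, 32): one token per numerical feature
```

Compared with a single linear projection of the raw scalar, conditioning on quantile distances lets the embedding reflect where a value falls within the feature's empirical distribution.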
Anthology ID: 2024.findings-acl.87
Volume: Findings of the Association for Computational Linguistics: ACL 2024
Month: August
Year: 2024
Address: Bangkok, Thailand
Editors: Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 1481–1500
URL: https://aclanthology.org/2024.findings-acl.87
DOI: 10.18653/v1/2024.findings-acl.87
Cite (ACL): Thomas Bonnier. 2024. Revisiting Multimodal Transformers for Tabular Data with Text Fields. In Findings of the Association for Computational Linguistics: ACL 2024, pages 1481–1500, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal): Revisiting Multimodal Transformers for Tabular Data with Text Fields (Bonnier, Findings 2024)
PDF: https://aclanthology.org/2024.findings-acl.87.pdf