UniTranSeR: A Unified Transformer Semantic Representation Framework for Multimodal Task-Oriented Dialog System

Zhiyuan Ma, Jianjun Li, Guohui Li, Yongjing Cheng


Abstract
As a more natural and intelligent interaction manner, multimodal task-oriented dialog system recently has received great attention and many remarkable progresses have been achieved. Nevertheless, almost all existing studies follow the pipeline to first learn intra-modal features separately and then conduct simple feature concatenation or attention-based feature fusion to generate responses, which hampers them from learning inter-modal interactions and conducting cross-modal feature alignment for generating more intention-aware responses. To address these issues, we propose UniTranSeR, a Unified Transformer Semantic Representation framework with feature alignment and intention reasoning for multimodal dialog systems. Specifically, we first embed the multimodal features into a unified Transformer semantic space to prompt inter-modal interactions, and then devise a feature alignment and intention reasoning (FAIR) layer to perform cross-modal entity alignment and fine-grained key-value reasoning, so as to effectively identify user’s intention for generating more accurate responses. Experimental results verify the effectiveness of UniTranSeR, showing that it significantly outperforms state-of-the-art approaches on the representative MMD dataset.
Anthology ID:
2022.acl-long.9
Volume:
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
May
Year:
2022
Address:
Dublin, Ireland
Venue:
ACL
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
103–114
Language:
URL:
https://aclanthology.org/2022.acl-long.9
DOI:
10.18653/v1/2022.acl-long.9
Bibkey:
Cite (ACL):
Zhiyuan Ma, Jianjun Li, Guohui Li, and Yongjing Cheng. 2022. UniTranSeR: A Unified Transformer Semantic Representation Framework for Multimodal Task-Oriented Dialog System. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 103–114, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
UniTranSeR: A Unified Transformer Semantic Representation Framework for Multimodal Task-Oriented Dialog System (Ma et al., ACL 2022)
Copy Citation:
PDF:
https://aclanthology.org/2022.acl-long.9.pdf
Software:
 2022.acl-long.9.software.zip
Data
MMD