Xinxin Zhang

2025

Adversarial Alignment with Anchor Dragging Drift (A³D²): Multimodal Domain Adaptation with Partially Shifted Modalities
Jun Sun | Xinxin Zhang | Simin Hong | Jian Zhu | Lingfang Zeng
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Multimodal learning has achieved remarkable success across diverse areas, yet faces the challenge of prohibitively expensive data collection and annotation when adapting models to new environments. In this context, domain adaptation has gained popularity as a technique for knowledge transfer, but it remains underexplored in multimodal settings compared with unimodal ones. This paper investigates multimodal domain adaptation, focusing on a practical partially shifted scenario where some modalities (referred to as anchors) remain domain-stable, while others (referred to as drifts) undergo a domain shift. We propose a bi-alignment scheme to simultaneously perform drift-drift and anchor-drift matching. The former is achieved through adversarial learning, aligning the representations of the drifts across source and target domains; the latter corresponds to an “anchor dragging drift” strategy, which matches the distributions of the drifts and anchors within the target domain using the optimal transport (OT) method. The overall design principle features Adversarial Alignment with Anchor Dragging Drift, abbreviated as A³D², for multimodal domain adaptation with partially shifted modalities. Comprehensive empirical results verify the effectiveness of the proposed approach and demonstrate that A³D² achieves superior performance compared with state-of-the-art approaches. The code is available at: https://github.com/sunjunaimer/A3D2.git.
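
As a rough illustration of the anchor-drift matching idea in the abstract above, the sketch below computes an entropy-regularised optimal transport plan between two small feature sets with plain NumPy Sinkhorn iterations. It is not taken from the A³D² repository: the function names `sinkhorn_plan` and `ot_alignment_loss`, the squared Euclidean cost, the uniform sample weights, and the values of `reg` and `n_iters` are all assumptions made for the example.

```python
import numpy as np

def sinkhorn_plan(cost, reg=0.1, n_iters=200):
    """Entropy-regularised OT plan between two uniform empirical distributions.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    n, m = cost.shape
    a = np.full(n, 1.0 / n)          # uniform weights on anchor samples (assumption)
    b = np.full(m, 1.0 / m)          # uniform weights on drift samples (assumption)
    K = np.exp(-cost / reg)          # Gibbs kernel
    u = np.ones(n)
    for _ in range(n_iters):
        v = b / (K.T @ u)            # rescale to match column marginals
        u = a / (K @ v)              # rescale to match row marginals
    return u[:, None] * K * v[None, :]

def ot_alignment_loss(anchor_feats, drift_feats, reg=0.1):
    """Transport cost between anchor and drift features under the Sinkhorn plan."""
    diff = anchor_feats[:, None, :] - drift_feats[None, :, :]
    cost = (diff ** 2).sum(axis=-1)  # squared Euclidean cost (assumption)
    cost = cost / cost.max()         # scale to [0, 1] for numerical stability
    plan = sinkhorn_plan(cost, reg)
    return float((plan * cost).sum()), plan

# Toy target-domain batch: 8 anchor embeddings and 6 drift embeddings of dimension 4.
rng = np.random.default_rng(0)
anchor = rng.normal(size=(8, 4))
drift = rng.normal(loc=0.5, size=(6, 4))
loss, plan = ot_alignment_loss(anchor, drift)
```

In a training loop, a quantity like `loss` would be minimised with respect to the drift encoder so that drift representations move toward the anchors; how the paper combines this with the adversarial term is not specified in the abstract.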

2024

Amanda: Adaptively Modality-Balanced Domain Adaptation for Multimodal Emotion Recognition
Xinxin Zhang | Jun Sun | Simin Hong | Taihao Li
Findings of the Association for Computational Linguistics: ACL 2024

This paper investigates unsupervised multimodal domain adaptation for multimodal emotion recognition, a solution to data scarcity that remains understudied. Because the distribution discrepancy between source and target domains varies across modalities, the primary challenge is to balance domain alignment so that every modality is well aligned. To achieve this, we first build our model on information bottleneck theory to learn an optimal representation for each modality independently. Then, we align the domains by matching the label distributions and the representations. To balance the representation alignment, we propose minimizing a surrogate of the alignment losses, which is equivalent to adaptively adjusting the weights of the modalities throughout training, thus achieving balanced domain alignment across modalities. Overall, the proposed approach features Adaptively modality-balanced domain adaptation, dubbed Amanda, for multimodal emotion recognition. Extensive empirical results on commonly used benchmark datasets demonstrate that Amanda significantly outperforms competing approaches. The code is available at https://github.com/sunjunaimer/Amanda.
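
The abstract above does not say which surrogate Amanda minimises. One common choice with the stated property is a log-sum-exp of the per-modality alignment losses: its gradient with respect to each loss is a softmax weight, so poorly aligned modalities automatically receive more weight. The sketch below illustrates that general idea only; the function name `balanced_surrogate`, the `temperature` parameter, and the example loss values are assumptions, not details from the paper.

```python
import numpy as np

def balanced_surrogate(losses, temperature=1.0):
    """Log-sum-exp surrogate of per-modality alignment losses.

    Returns the surrogate value and the implied modality weights, which are
    the gradient of the surrogate with respect to each individual loss.
    Hypothetical illustration; not the authors' stated objective.
    """
    losses = np.asarray(losses, dtype=float)
    z = losses / temperature
    z_max = z.max()                                   # shift for numerical stability
    exp_z = np.exp(z - z_max)
    surrogate = temperature * (z_max + np.log(exp_z.sum()))
    weights = exp_z / exp_z.sum()                     # softmax(losses / temperature)
    return surrogate, weights

# Made-up alignment losses for three modalities (e.g. text, audio, video).
surrogate, weights = balanced_surrogate([0.2, 1.5, 0.7])
```

Here the second modality has the largest alignment loss and therefore the largest weight; lowering `temperature` concentrates the weight further on the worst-aligned modality.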

2020

WAE_RN: Integrating Wasserstein Autoencoder and Relational Network for Text Sequence
Xinxin Zhang | Xiaoming Liu | Guan Yang | Fangfang Li
Proceedings of the 19th Chinese National Conference on Computational Linguistics

One challenge in Natural Language Processing (NLP) is to learn semantic representations in different contexts. Recent work on pre-trained language models has received considerable attention and has proven to be an effective technique. Despite the success of pre-trained language models in many NLP tasks, the learned text representation only contains the correlation among the words in the sentence itself and ignores the implicit relationship between arbitrary tokens in the sequence. To address this problem, we focus on how to make our model effectively learn word representations that contain the relational information between any tokens of text sequences. In this paper, we propose to integrate the relational network (RN) into a Wasserstein autoencoder (WAE). Specifically, WAE and RN are used to better preserve the semantic structures and capture the relational information, respectively. Extensive experiments demonstrate that our proposed model achieves significant improvements over the traditional Seq2Seq baselines.

Co-authors

Guan Yang 1

Lingfang Zeng 1

Jian Zhu 1
