@inproceedings{liu-etal-2025-dasa,
title = "{DASA}-Trans-{STM}: Adaptive Efficient Transformer for Short Text Matching using Data Augmentation and Semantic Awareness",
author = "Liu, Jiguo and
Liu, Chao and
Li, Meimei and
Li, Nan and
Gao, Shihao and
Zhu, Dali",
editor = "Christodoulopoulos, Christos and
Chakraborty, Tanmoy and
Rose, Carolyn and
Peng, Violet",
booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
month = nov,
year = "2025",
address = "Suzhou, China",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.emnlp-main.228/",
pages = "4594--4610",
ISBN = "979-8-89176-332-6",
abstract = "Rencent advancements in large language models (LLM) have shown impressive versatility across various tasks. Short text matching is one of the fundamental technologies in natural language processing. In previous studies, the common approach to applying them to Chinese is segmenting each sentence into words, and then taking these words as input. However, existing approaches have three limitations: 1) Some Chinese words are polysemous, and semantic information is not fully utilized. 2) Some models suffer potential issues caused by word segmentation and incorrect recognition of negative words affects the semantic understanding of the whole sentence. 3) Fuzzy negation words in ancient Chinese are difficult to recognize and match. In this work, we propose a novel adaptive Transformer for Chinese short text matching using Data Augmentation and Semantic Awareness (DASA), which can fully mine the information expressed in Chinese text to deal with word ambiguity. DASA is based on a Graph Attention Transformer Encoder that takes two word lattice graphs as input and integrates sense information from N-HowNet to moderate word ambiguity. Specially, we use an LLM to generate similar sentences for the optimal text representation. Experimental results show that the augmentation done using DASA can considerably boost the performance of our system and achieve significantly better results than previous state-of-the-art methods on four available datasets, namely MNS, LCQMC, AFQMC, and BQ."
}<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="liu-etal-2025-dasa">
<titleInfo>
<title>DASA-Trans-STM: Adaptive Efficient Transformer for Short Text Matching using Data Augmentation and Semantic Awareness</title>
</titleInfo>
<name type="personal">
<namePart type="given">Jiguo</namePart>
<namePart type="family">Liu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Chao</namePart>
<namePart type="family">Liu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Meimei</namePart>
<namePart type="family">Li</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Nan</namePart>
<namePart type="family">Li</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Shihao</namePart>
<namePart type="family">Gao</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Dali</namePart>
<namePart type="family">Zhu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-11</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing</title>
</titleInfo>
<name type="personal">
<namePart type="given">Christos</namePart>
<namePart type="family">Christodoulopoulos</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Tanmoy</namePart>
<namePart type="family">Chakraborty</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Carolyn</namePart>
<namePart type="family">Rose</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Violet</namePart>
<namePart type="family">Peng</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Suzhou, China</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
<identifier type="isbn">979-8-89176-332-6</identifier>
</relatedItem>
<abstract>Recent advancements in large language models (LLMs) have shown impressive versatility across various tasks. Short text matching is one of the fundamental technologies in natural language processing. In previous studies, the common approach to applying such models to Chinese is to segment each sentence into words and then take these words as input. However, existing approaches have three limitations: 1) Some Chinese words are polysemous, and semantic information is not fully utilized. 2) Some models suffer from potential issues caused by word segmentation, and incorrect recognition of negation words affects the semantic understanding of the whole sentence. 3) Fuzzy negation words in ancient Chinese are difficult to recognize and match. In this work, we propose a novel adaptive Transformer for Chinese short text matching using Data Augmentation and Semantic Awareness (DASA), which can fully mine the information expressed in Chinese text to deal with word ambiguity. DASA is based on a Graph Attention Transformer Encoder that takes two word lattice graphs as input and integrates sense information from N-HowNet to moderate word ambiguity. Specifically, we use an LLM to generate similar sentences for the optimal text representation. Experimental results show that the augmentation done using DASA can considerably boost the performance of our system and achieve significantly better results than previous state-of-the-art methods on four available datasets, namely MNS, LCQMC, AFQMC, and BQ.</abstract>
<identifier type="citekey">liu-etal-2025-dasa</identifier>
<location>
<url>https://aclanthology.org/2025.emnlp-main.228/</url>
</location>
<part>
<date>2025-11</date>
<extent unit="page">
<start>4594</start>
<end>4610</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T DASA-Trans-STM: Adaptive Efficient Transformer for Short Text Matching using Data Augmentation and Semantic Awareness
%A Liu, Jiguo
%A Liu, Chao
%A Li, Meimei
%A Li, Nan
%A Gao, Shihao
%A Zhu, Dali
%Y Christodoulopoulos, Christos
%Y Chakraborty, Tanmoy
%Y Rose, Carolyn
%Y Peng, Violet
%S Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
%D 2025
%8 November
%I Association for Computational Linguistics
%C Suzhou, China
%@ 979-8-89176-332-6
%F liu-etal-2025-dasa
%X Recent advancements in large language models (LLMs) have shown impressive versatility across various tasks. Short text matching is one of the fundamental technologies in natural language processing. In previous studies, the common approach to applying such models to Chinese is to segment each sentence into words and then take these words as input. However, existing approaches have three limitations: 1) Some Chinese words are polysemous, and semantic information is not fully utilized. 2) Some models suffer from potential issues caused by word segmentation, and incorrect recognition of negation words affects the semantic understanding of the whole sentence. 3) Fuzzy negation words in ancient Chinese are difficult to recognize and match. In this work, we propose a novel adaptive Transformer for Chinese short text matching using Data Augmentation and Semantic Awareness (DASA), which can fully mine the information expressed in Chinese text to deal with word ambiguity. DASA is based on a Graph Attention Transformer Encoder that takes two word lattice graphs as input and integrates sense information from N-HowNet to moderate word ambiguity. Specifically, we use an LLM to generate similar sentences for the optimal text representation. Experimental results show that the augmentation done using DASA can considerably boost the performance of our system and achieve significantly better results than previous state-of-the-art methods on four available datasets, namely MNS, LCQMC, AFQMC, and BQ.
%U https://aclanthology.org/2025.emnlp-main.228/
%P 4594-4610
Markdown (Informal)
[DASA-Trans-STM: Adaptive Efficient Transformer for Short Text Matching using Data Augmentation and Semantic Awareness](https://aclanthology.org/2025.emnlp-main.228/) (Liu et al., EMNLP 2025)
ACL
Jiguo Liu, Chao Liu, Meimei Li, Nan Li, Shihao Gao, and Dali Zhu. 2025. DASA-Trans-STM: Adaptive Efficient Transformer for Short Text Matching using Data Augmentation and Semantic Awareness. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 4594–4610, Suzhou, China. Association for Computational Linguistics.