Songlin Wang
2024
Breaking the Hourglass Phenomenon of Residual Quantization: Enhancing the Upper Bound of Generative Retrieval
Zhirui Kuai | Zuxu Chen | Huimu Wang | Mingming Li | Dadong Miao | Wang Binbin | Xusong Chen | Li Kuang | Yuxing Han | Jiaxing Wang | Guoyu Tang | Lin Liu | Songlin Wang | Jingwei Zhuo
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track
Generative retrieval (GR) has emerged as a transformative paradigm in search and recommender systems, leveraging numeric-based identifier representations to enhance efficiency and generalization. Notably, methods like TIGER, which employ Residual Quantization-based Semantic Identifiers (RQ-SID), have shown significant promise in e-commerce scenarios by effectively managing item IDs. However, a critical issue, termed the "Hourglass" phenomenon, occurs in RQ-SID: intermediate codebook tokens become overly concentrated, which hinders the full utilization of generative retrieval methods. This paper analyzes and addresses this problem by identifying data sparsity and long-tailed distribution as the primary causes. Through comprehensive experiments and detailed ablation studies, we analyze the impact of these factors on codebook utilization and data distribution. Our findings reveal that the "Hourglass" phenomenon substantially impacts the performance of RQ-SID in generative retrieval. We propose effective solutions to mitigate this issue, thereby significantly enhancing the effectiveness of generative retrieval in real-world e-commerce applications.
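For readers unfamiliar with residual quantization, the following is a minimal sketch of how an RQ-style semantic identifier can be built and how per-layer codebook utilization can be measured. It is not the paper's implementation: the codebooks and item vectors are hand-picked toy values, and the helper names (`nearest`, `rq_encode`, `layer_utilization`) are hypothetical. In practice the codebooks would be learned from item embeddings.

```python
from collections import Counter

def nearest(vec, codebook):
    """Return the index of the codebook entry closest to vec (squared L2)."""
    dists = [sum((v - c) ** 2 for v, c in zip(vec, code)) for code in codebook]
    return dists.index(min(dists))

def rq_encode(vec, codebooks):
    """Residual quantization: quantize, subtract the chosen code, repeat."""
    residual = list(vec)
    tokens = []
    for codebook in codebooks:
        idx = nearest(residual, codebook)
        tokens.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tokens, residual

def layer_utilization(sids, codebook_sizes):
    """Fraction of each layer's codebook entries that appear in the SIDs."""
    used = [Counter(sid[layer] for sid in sids)
            for layer in range(len(codebook_sizes))]
    return [len(u) / size for u, size in zip(used, codebook_sizes)]

# Toy two-layer codebooks (assumed values, for illustration only).
codebooks = [
    [[0.0, 0.0], [4.0, 4.0]],                          # layer 1: coarse
    [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],  # layer 2: residual
]
items = [[0.1, 0.9], [4.9, 4.1], [0.2, 0.1], [4.1, 5.0]]

sids = [rq_encode(x, codebooks)[0] for x in items]
utilization = layer_utilization(sids, [len(cb) for cb in codebooks])
print(sids)
print(utilization)
```

A low utilization value in a middle layer, compared with the layers before and after it, is one simple way to observe the kind of token concentration the abstract describes.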
2022
ZhichunRoad at SemEval-2022 Task 2: Adversarial Training and Contrastive Learning for Multiword Representations
Xuange Cui | Wei Xiong | Songlin Wang
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)
This paper presents our contribution to SemEval-2022 Task 2: Multilingual Idiomaticity Detection and Sentence Embedding. We explore the impact of three different pre-trained multilingual language models in SubTaskA. To enhance model generalization and robustness, we use the exponential moving average (EMA) method and an adversarial attack strategy. In SubTaskB, we add an effective cross-attention module for modeling the relationship between two sentences. We jointly train the model with a contrastive learning objective and employ momentum contrast to enlarge the number of negative pairs. Additionally, we use the alignment and uniformity properties to measure the quality of sentence embeddings. Our approach obtained competitive results in both subtasks.
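The abstract does not spell out how alignment and uniformity are computed. As a rough sketch, assuming the commonly used definitions from Wang and Isola (2020) on L2-normalized embeddings, they can be written as below. The function names and the defaults `alpha=2` and `t=2` are assumptions for illustration, not details taken from the paper.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def alignment(pairs, alpha=2):
    """Mean distance**alpha between positive-pair embeddings (lower is better)."""
    return sum(sq_dist(a, b) ** (alpha / 2) for a, b in pairs) / len(pairs)

def uniformity(embeddings, t=2):
    """Log of the mean Gaussian potential over distinct pairs (lower is better)."""
    total, count = 0.0, 0
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            total += math.exp(-t * sq_dist(embeddings[i], embeddings[j]))
            count += 1
    return math.log(total / count)

# Embeddings spread around the unit circle vs. embeddings collapsed to a point.
spread = [l2_normalize(v) for v in ([1, 0], [-1, 0], [0, 1], [0, -1])]
collapsed = [l2_normalize(v) for v in ([2, 0], [3, 0], [5, 0])]

print(uniformity(spread), uniformity(collapsed))
print(alignment([(spread[0], spread[0]), (spread[1], spread[1])]))
```

Under these definitions, collapsed embeddings give a uniformity of 0 (the worst case), while embeddings spread over the unit sphere give a negative value; identical positive pairs give an alignment of 0.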