Puzhen Su

Also published as: PuZhen Su


2024

SPZ: A Semantic Perturbation-based Data Augmentation Method with Zonal-Mixing for Alzheimer’s Disease Detection
FangFang Li | Cheng Huang | PuZhen Su | Jie Yin
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Alzheimer’s Disease (AD), characterized by significant cognitive and functional impairment, necessitates the development of early detection techniques. Traditional diagnostic practices, such as cognitive assessments and biomarker analysis, are often invasive and costly. Deep learning-based approaches for non-invasive AD detection have been explored in recent studies, but the lack of accessible data hinders further improvements in detection performance. To address these challenges, we propose a novel semantic perturbation-based data augmentation method that essentially differs from existing techniques, which primarily rely on explicit data engineering. Our approach generates controlled semantic perturbations to enhance textual representations, aiding the model in identifying AD-specific linguistic patterns, particularly in scenarios with limited data availability. It learns contextual information and dynamically adjusts the perturbation degree for different linguistic features. This enhances the model’s sensitivity to AD-specific linguistic features and its robustness against natural language noise. Experimental results on the ADReSS challenge dataset demonstrate that our approach outperforms other strong and competitive deep learning methods.

2023

Towards Better Representations for Multi-Label Text Classification with Multi-granularity Information
Fangfang Li | Puzhen Su | Junwen Duan | Weidong Xiao
Findings of the Association for Computational Linguistics: EMNLP 2023

Multi-label text classification (MLTC) aims to assign multiple labels to a given text. Previous works have focused on text representation learning and label correlation modeling using pre-trained language models (PLMs). However, studies have shown that PLMs generate word frequency-oriented text representations, causing texts with different labels to be closely distributed in a narrow region, which makes them difficult to classify. To address this, we present a novel framework, CL (Contrastive Learning)-MIL (Multi-granularity Information Learning), to refine the text representation for the MLTC task. We first use contrastive learning to generate uniform initial text representations and incorporate label frequency implicitly. Then, we design a multi-task learning module to integrate multi-granularity information (diverse text-label correlations, label-label relations, and label frequency) into text representations, enhancing their discriminative ability. Experimental results demonstrate the complementarity of the modules in CL-MIL, improving the quality of text representations and yielding stable and competitive improvements for MLTC.