Shuang Zhou


2024

pdf bib
Enhancing Explainable Rating Prediction through Annotated Macro Concepts
Huachi Zhou | Shuang Zhou | Hao Chen | Ninghao Liu | Fan Yang | Xiao Huang
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Generating recommendation reasons for recommendation results is a long-standing problem because it is challenging to explain the underlying reasons for recommending an item based on user and item IDs. Existing models usually learn semantic embeddings for each user and item, and generate the reasons according to the embeddings of the user-item pair. However, user and item IDs do not carry inherent semantic meaning, thus the limited number of reviews cannot model users’ preferences and item characteristics effectively, negatively affecting the model generalization for unseen user-item pairs.To tackle the problem, we propose the Concept Enhanced Explainable Recommendation framework (CEER), which utilizes macro concepts as the intermediary to bridge the gap between the user/item embeddings and the recommendation reasons. Specifically, we maximize the information bottleneck to extract macro concepts from user-item reviews. Then, for recommended user-item pairs, we jointly train the concept embeddings with the user and item embeddings, and generate the explanation according to the concepts. Extensive experiments on three datasets verify the superiority of our CEER model.

2020

pdf bib
PHICON: Improving Generalization of Clinical Text De-identification Models via Data Augmentation
Xiang Yue | Shuang Zhou
Proceedings of the 3rd Clinical Natural Language Processing Workshop

De-identification is the task of identifying protected health information (PHI) in the clinical text. Existing neural de-identification models often fail to generalize to a new dataset. We propose a simple yet effective data augmentation method PHICON to alleviate the generalization issue. PHICON consists of PHI augmentation and Context augmentation, which creates augmented training corpora by replacing PHI entities with named-entities sampled from external sources, and by changing background context with synonym replacement or random word insertion, respectively. Experimental results on the i2b2 2006 and 2014 de-identification challenge datasets show that PHICON can help three selected de-identification models boost F1-score (by at most 8.6%) on cross-dataset test setting. We also discuss how much augmentation to use and how each augmentation method influences the performance.