Cen Chen


pdf bib
A Customized Text Sanitization Mechanism with Differential Privacy
Sai Chen | Fengran Mo | Yanhao Wang | Cen Chen | Jian-Yun Nie | Chengyu Wang | Jamie Cui
Findings of the Association for Computational Linguistics: ACL 2023

As privacy issues are receiving increasing attention within the Natural Language Processing (NLP) community, numerous methods have been proposed to sanitize texts subject to differential privacy. However, the state-of-the-art text sanitization mechanisms based on a relaxed notion of metric local differential privacy (MLDP) do not apply to non-metric semantic similarity measures and cannot achieve good privacy-utility trade-offs. To address these limitations, we propose a novel Customized Text sanitization (CusText) mechanism based on the original 𝜖-differential privacy (DP) definition, which is compatible with any similarity measure.Moreover, CusText assigns each input token a customized output set to provide more advanced privacy protection at the token level.Extensive experiments on several benchmark datasets show that CusText achieves a better trade-off between privacy and utility than existing mechanisms.The code is available at https://github.com/sai4july/CusText.

pdf bib
XtremeCLIP: Extremely Parameter-efficient Tuning for Low-resource Vision Language Understanding
Moming Tang | Chengyu Wang | Jianing Wang | Chuanqi Tan | Songfang Huang | Cen Chen | Weining Qian
Findings of the Association for Computational Linguistics: ACL 2023

Recently, Contrastive Visual-Language Pre-training (CLIP) has demonstrated remarkable capability in various Visual Language Understanding (VLU) tasks. Yet, most CLIP-based methods require tasks-specific designs and sufficient training data. In this paper, we introduce a simple yet efficient paradigm for low-resource VLU named XtremeCLIP, which involves very few trainable parameters to improve the generalization ability of the trained models. In our XtremeCLIP framework, we reformulate a series of VLU tasks as a unified open-book affinity-matching problem. Furthermore, to handle the insufficient supervised signals in small datasets, we adopt contrastive learning to utilize the implicit sorting information of ground-truth labels to provide more supervised cues. Extensive experiments over multiple datasets on visual entailment, visual question answering, and image classification show that XtremeCLIP consistently outperforms existing baselines in low-resource settings.

pdf bib
Rapid Diffusion: Building Domain-Specific Text-to-Image Synthesizers with Fast Inference Speed
Bingyan Liu | Weifeng Lin | Zhongjie Duan | Chengyu Wang | Wu Ziheng | Zhang Zipeng | Kui Jia | Lianwen Jin | Cen Chen | Jun Huang
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track)

Text-to-Image Synthesis (TIS) aims to generate images based on textual inputs. Recently, several large pre-trained diffusion models have been released to create high-quality images with pre-trained text encoders and diffusion-based image synthesizers. However, popular diffusion-based models from the open-source community cannot support industrial domain-specific applications due to the lack of entity knowledge and low inference speed. In this paper, we propose Rapid Diffusion, a novel framework for training and deploying super-resolution, text-to-image latent diffusion models with rich entity knowledge injected and optimized networks. Furthermore, we employ BladeDISC, an end-to-end Artificial Intelligence (AI) compiler, and FlashAttention techniques to optimize computational graphs of the generated models for online deployment. Experiments verify the effectiveness of our approach in terms of image quality and inference speed. In addition, we present industrial use cases and integrate Rapid Diffusion to an AI platform to show its practical values.


pdf bib
SueNes: A Weakly Supervised Approach to Evaluating Single-Document Summarization via Negative Sampling
Forrest Bao | Ge Luo | Hebi Li | Minghui Qiu | Yinfei Yang | Youbiao He | Cen Chen
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Canonical automatic summary evaluation metrics, such as ROUGE, focus on lexical similarity which cannot well capture semantics nor linguistic quality and require a reference summary which is costly to obtain. Recently, there have been a growing number of efforts to alleviate either or both of the two drawbacks. In this paper, we present a proof-of-concept study to a weakly supervised summary evaluation approach without the presence of reference summaries. Massive data in existing summarization datasets are transformed for training by pairing documents with corrupted reference summaries. In cross-domain tests, our strategy outperforms baselines with promising improvements, and show a great advantage in gauging linguistic qualities over all metrics.


pdf bib
Cross-Domain Review Helpfulness Prediction Based on Convolutional Neural Networks with Auxiliary Domain Discriminators
Cen Chen | Yinfei Yang | Jun Zhou | Xiaolong Li | Forrest Sheng Bao
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

With the growing amount of reviews in e-commerce websites, it is critical to assess the helpfulness of reviews and recommend them accordingly to consumers. Recent studies on review helpfulness require plenty of labeled samples for each domain/category of interests. However, such an approach based on close-world assumption is not always practical, especially for domains with limited reviews or the “out-of-vocabulary” problem. Therefore, we propose a convolutional neural network (CNN) based model which leverages both word-level and character-based representations. To transfer knowledge between domains, we further extend our model to jointly model different domains with auxiliary domain discriminators. On the Amazon product review dataset, our approach significantly outperforms the state of the art in terms of both accuracy and cross-domain robustness.


pdf bib
Aspect Extraction from Product Reviews Using Category Hierarchy Information
Yinfei Yang | Cen Chen | Minghui Qiu | Forrest Bao
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

Aspect extraction abstracts the common properties of objects from corpora discussing them, such as reviews of products. Recent work on aspect extraction is leveraging the hierarchical relationship between products and their categories. However, such effort focuses on the aspects of child categories but ignores those from parent categories. Hence, we propose an LDA-based generative topic model inducing the two-layer categorical information (CAT-LDA), to balance the aspects of both a parent category and its child categories. Our hypothesis is that child categories inherit aspects from parent categories, controlled by the hierarchy between them. Experimental results on 5 categories of Amazon.com products show that both common aspects of parent category and the individual aspects of sub-categories can be extracted to align well with the common sense. We further evaluate the manually extracted aspects of 16 products, resulting in an average hit rate of 79.10%.