Shou-De Lin

Also published as: Shou-de Lin


2024

Transferable Embedding Inversion Attack: Uncovering Privacy Risks in Text Embeddings without Model Queries
Yu-Hsiang Huang | Yuche Tsai | Hsiang Hsiao | Hong-Yi Lin | Shou-De Lin
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

This study investigates the privacy risks associated with text embeddings, focusing on the scenario where attackers cannot access the original embedding model. Unlike previous research, which requires direct model access, we explore a more realistic threat model by developing a transfer attack method. This approach uses a surrogate model to mimic the victim model’s behavior, allowing the attacker to infer sensitive information from text embeddings without direct access. Our experiments across various embedding models and a clinical dataset demonstrate that our transfer attack significantly outperforms traditional methods, revealing potential privacy vulnerabilities in embedding technologies and underscoring the need for stronger security measures.
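A central difficulty in such a transfer setting is bridging the surrogate and victim embedding spaces. As a toy illustration only (not the paper's actual method), one can fit a linear map between the two spaces from a small set of paired embeddings of public texts, so that an inverter trained on the surrogate can be reused on mapped victim embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 16  # number of paired public texts, embedding dimension (toy values)

# Toy stand-ins: victim embeddings are an unknown rotation of surrogate ones.
surrogate = rng.normal(size=(n, d))
rotation, _ = np.linalg.qr(rng.normal(size=(d, d)))
victim = surrogate @ rotation

# Fit a linear map from victim space into surrogate space by least squares,
# so an inversion model trained on surrogate embeddings can be applied.
W, *_ = np.linalg.lstsq(victim, surrogate, rcond=None)

# Unseen victim embeddings land close to their surrogate counterparts.
new_surrogate = rng.normal(size=(5, d))
new_victim = new_surrogate @ rotation
recovered = new_victim @ W
```

Here `rotation`, and the assumption that the two spaces are linearly related, are purely illustrative; real embedding spaces require the learned surrogate described in the abstract.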

2020

Explaining Word Embeddings via Disentangled Representation
Keng-Te Liao | Cheng-Syuan Lee | Zhong-Yu Huang | Shou-de Lin
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing

Disentangled representations have attracted increasing attention recently. However, it is unclear how to transfer the desired properties of disentanglement to word representations. In this work, we propose to transform typical dense word vectors into disentangled embeddings with improved interpretability, achieved by encoding polysemous senses separately. We also find that the modular structure of our disentangled word embeddings helps generate more efficient and effective features for natural language processing tasks.

Explainable and Sparse Representations of Academic Articles for Knowledge Exploration
Keng-Te Liao | Zhihong Shen | Chiyuan Huang | Chieh-Han Wu | PoChun Chen | Kuansan Wang | Shou-de Lin
Proceedings of the 28th International Conference on Computational Linguistics

We focus on a recently deployed system built for summarizing academic articles by concept tagging. The system has shown broad coverage and high accuracy of concept identification, which can be attributed to the knowledge acquired from millions of publications. Given the interpretable concepts and the knowledge encoded in a pre-trained neural model, we investigate whether the tagged concepts can be applied to a broader class of applications. We propose transforming the tagged concepts into sparse vectors that serve as representations of academic documents. The effectiveness of these representations is analyzed theoretically within a proposed framework. We also show empirically that the representations offer advantages in academic topic discovery and paper recommendation. In these applications, we find that the knowledge encoded in the tagging system can be utilized effectively and helps infer additional features from data with limited information.
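The sparse concept representations described above can be pictured with a minimal sketch; the concept tags and weights here are hypothetical, not taken from the deployed system:

```python
# Hypothetical concept tags with confidence weights for three articles.
doc_a = {"machine learning": 0.9, "natural language processing": 0.6}
doc_b = {"natural language processing": 0.7, "information retrieval": 0.8}
doc_c = {"organic chemistry": 1.0}

def cosine(u, v):
    """Cosine similarity between two sparse concept vectors in dict form."""
    dot = sum(w * v[c] for c, w in u.items() if c in v)
    norm = lambda x: sum(w * w for w in x.values()) ** 0.5
    return dot / (norm(u) * norm(v))

sim_related = cosine(doc_a, doc_b)    # articles sharing a concept
sim_unrelated = cosine(doc_a, doc_c)  # no shared concepts, similarity 0
```

Because each document touches only a handful of concepts, such vectors stay sparse and each dimension remains human-readable, which is the interpretability advantage the abstract points to.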

Glyph2Vec: Learning Chinese Out-of-Vocabulary Word Embedding from Glyphs
Hong-You Chen | Sz-Han Yu | Shou-de Lin
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Chinese NLP applications that rely on large text corpora often involve huge vocabularies whose words are sparse in the corpus. We show that the written forms of characters, glyphs, in ideographic languages can carry rich semantics. We present a multi-modal model, Glyph2Vec, to tackle the Chinese out-of-vocabulary (OOV) word embedding problem. Glyph2Vec extracts visual features from word glyphs to expand the current word embedding space for OOV words, without the need to access any corpus, which is useful for improving Chinese NLP systems, especially in low-resource scenarios. Experiments across different applications show the significant effectiveness of our model.

2019

Self-Discriminative Learning for Unsupervised Document Embedding
Hong-You Chen | Chin-Hua Hu | Leila Wehbe | Shou-De Lin
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Unsupervised document representation learning is an important task that provides pre-trained features for NLP applications. Unlike most previous work, which learns embeddings by self-prediction over the surface form of text, we explicitly exploit inter-document information and directly model the relations of documents in embedding space with a discriminative network and a novel objective. Extensive experiments on both small and large public datasets show the competitiveness of the proposed method. In evaluations on standard document classification, our model achieves errors 5 to 13% lower than state-of-the-art unsupervised embedding models. The reduction in error is even more pronounced in the scarce-label setting.

Multiple Text Style Transfer by using Word-level Conditional Generative Adversarial Network with Two-Phase Training
Chih-Te Lai | Yi-Te Hong | Hong-You Chen | Chi-Jen Lu | Shou-De Lin
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

The objective of non-parallel text style transfer, or controllable text generation, is to alter specific attributes (e.g., sentiment, mood, tense, or politeness) of a given text while preserving its remaining attributes and content. Generative adversarial networks (GANs) are popular models for ensuring that transferred sentences are realistic and have the desired target styles. However, GAN training often suffers from the mode collapse problem, which causes the transferred text to bear little relation to the original. In this paper, we propose a new GAN model with a word-level conditional architecture and a two-phase training procedure. By applying a style-related condition before generating each word, our model is able to retain style-unrelated words while changing the others. By separating the training procedure into reconstruction and transfer phases, our model learns a proper text generation process, which further improves content preservation. We test our model on polarity sentiment transfer and multiple-attribute transfer tasks. The empirical results show that our model achieves comparable evaluation scores in both transfer accuracy and fluency, but significantly outperforms other state-of-the-art models in content compatibility on three real-world datasets.

Controlling Sequence-to-Sequence Models - A Demonstration on Neural-based Acrostic Generator
Liang-Hsin Shen | Pei-Lun Tai | Chao-Chung Wu | Shou-De Lin
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations

An acrostic is a form of writing in which the first token of each line (or another recurring feature of the text) forms a meaningful sequence. In this paper we present a generalized acrostic generation system that can hide a given message in a flexible pattern specified by the user. Unlike previous work that focuses on rule-based solutions, we adopt a neural-based sequence-to-sequence model to achieve this goal. Besides the acrostic itself, users can also specify the rhyme and length of the output sequences. To the best of our knowledge, this is the first neural-based natural language generation system to demonstrate micro-level control over output sentences.
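The micro-level control described above can be sketched as constrained decoding: at the start of each line, the generator is forced to emit the next token of the hidden message. A toy version with a hypothetical line-completion function standing in for the seq2seq decoder:

```python
def generate_acrostic(message, complete_line):
    """Emit one line per message token, forcing each line's first token."""
    return "\n".join(complete_line(token) for token in message.split())

# Hypothetical stand-in for a trained decoder: completes a line
# given its forced first word.
completions = {
    "hello": "hello morning light arrives",
    "world": "world awakens slowly today",
}
poem = generate_acrostic("hello world", completions.__getitem__)
first_tokens = [line.split()[0] for line in poem.split("\n")]
# first_tokens spells out the hidden message, line by line
```

A real system would replace the lookup table with a decoder whose first output token per line is constrained, and could apply the same trick to rhyme and length constraints.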

2018

Word Relation Autoencoder for Unseen Hypernym Extraction Using Word Embeddings
Hong-You Chen | Cheng-Syuan Lee | Keng-Te Liao | Shou-De Lin
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Lexicon relation extraction from distributional representations of words is an important topic in NLP. We observe that state-of-the-art projection-based methods do not generalize to unseen hypernyms. We analyze this problem from the perspective of pollution and construct a corresponding indicator to measure it. We then propose a word relation autoencoder (WRAE) to address the challenge. Experiments on several hypernym-like lexicon datasets show that our model significantly outperforms the competitors.

2016

Enriching Cold Start Personalized Language Model Using Social Network Information
Yu-Yang Huang | Rui Yan | Tsung-Ting Kuo | Shou-De Lin
International Journal of Computational Linguistics & Chinese Language Processing, Volume 21, Number 1, June 2016

Transferring User Interests Across Websites with Unstructured Text for Cold-Start Recommendation
Yu-Yang Huang | Shou-De Lin
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

2015

Proceedings of the third International Workshop on Natural Language Processing for Social Media
Shou-de Lin | Lun-Wei Ku | Cheng-Te Li | Erik Cambria
Proceedings of the third International Workshop on Natural Language Processing for Social Media

2014

Enriching Cold Start Personalized Language Model Using Social Network Information
Yu-Yang Huang | Rui Yan | Tsung-Ting Kuo | Shou-De Lin
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Proceedings of the Second Workshop on Natural Language Processing for Social Media (SocialNLP)
Shou-de Lin | Lun-Wei Ku | Erik Cambria | Tsung-Ting Kuo
Proceedings of the Second Workshop on Natural Language Processing for Social Media (SocialNLP)

2013

Proceedings of the IJCNLP 2013 Workshop on Natural Language Processing for Social Media (SocialNLP)
Shou-de Lin | Lun-Wei Ku | Tsung-Ting Kuo
Proceedings of the IJCNLP 2013 Workshop on Natural Language Processing for Social Media (SocialNLP)

Semantic v.s. Positions: Utilizing Balanced Proximity in Language Model Smoothing for Information Retrieval
Rui Yan | Han Jiang | Mirella Lapata | Shou-De Lin | Xueqiang Lv | Xiaoming Li
Proceedings of the Sixth International Joint Conference on Natural Language Processing

2012

利用機器學習於中文法律文件之標記、案件分類及量刑預測 (Exploiting Machine Learning Models for Chinese Legal Documents Labeling, Case Classification, and Sentencing Prediction) [In Chinese]
Wan-Chen Lin | Tsung-Ting Kuo | Tung-Jia Chang | Chueh-An Yen | Chao-Ju Chen | Shou-de Lin
Proceedings of the 24th Conference on Computational Linguistics and Speech Processing (ROCLING 2012)

利用機器學習於中文法律文件之標記、案件分類及量刑預測 (Exploiting Machine Learning Models for Chinese Legal Documents Labeling, Case Classification, and Sentencing Prediction) [In Chinese]
Wan-Chen Lin | Tsung-Ting Kuo | Tung-Jia Chang | Chueh-An Yen | Chao-Ju Chen | Shou-de Lin
International Journal of Computational Linguistics & Chinese Language Processing, Volume 17, Number 4, December 2012-Special Issue on Selected Papers from ROCLING XXIV

Exploiting Latent Information to Predict Diffusions of Novel Topics on Social Networks
Tsung-Ting Kuo | San-Chuan Hung | Wei-Shih Lin | Nanyun Peng | Shou-De Lin | Wei-Fen Lin
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Online Plagiarized Detection Through Exploiting Lexical, Syntax, and Semantic Information
Wan-Yu Lin | Nanyun Peng | Chun-Chao Yen | Shou-de Lin
Proceedings of the ACL 2012 System Demonstrations

2011

MemeTube: A Sentiment-based Audiovisual System for Analyzing and Displaying Microblog Messages
Cheng-Te Li | Chien-Yuan Wang | Chien-Lin Tseng | Shou-De Lin
Proceedings of the ACL-HLT 2011 System Demonstrations

IMASS: An Intelligent Microblog Analysis and Summarization System
Jui-Yu Weng | Cheng-Lun Yang | Bo-Nian Chen | Yen-Kai Wang | Shou-De Lin
Proceedings of the ACL-HLT 2011 System Demonstrations

2010

Identifying Correction Rules for Auto Editing
Anta Huang | Tsung-Ting Kuo | Ying-Chun Lai | Shou-de Lin
ROCLING 2010 Poster Papers

Discovering Correction Rules for Auto Editing
An-Ta Huang | Tsung-Ting Kuo | Ying-Chun Lai | Shou-De Lin
International Journal of Computational Linguistics & Chinese Language Processing, Volume 15, Number 3-4, September/December 2010