Chu-Cheng Lin


2024

Low-Rank Adaptation for Multilingual Summarization: An Empirical Study
Chenxi Whitehouse | Fantine Huot | Jasmijn Bastings | Mostafa Dehghani | Chu-Cheng Lin | Mirella Lapata
Findings of the Association for Computational Linguistics: NAACL 2024

Although the advancements of pre-trained Large Language Models have significantly accelerated recent progress in NLP, their ever-increasing size poses significant challenges for conventional fine-tuning, especially in memory-intensive tasks. We investigate the potential of Parameter-Efficient Fine-Tuning, focusing on Low-Rank Adaptation (LoRA), in the domain of multilingual summarization, a task that is both challenging (due to typically long inputs) and relatively unexplored. We conduct an extensive study across different data availability scenarios, including high- and low-data settings, and cross-lingual transfer, leveraging models of different sizes. Our findings reveal that LoRA is competitive with full fine-tuning when trained with high quantities of data, and excels in low-data scenarios and cross-lingual transfer. We also study different strategies for few-shot cross-lingual transfer, finding that continued LoRA tuning outperforms full fine-tuning and the dynamic composition of language-specific LoRA modules.
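
For readers unfamiliar with the technique named in this abstract, the following is a minimal sketch of the general LoRA idea, not the paper's code: a frozen weight matrix W receives a trainable low-rank update B @ A scaled by alpha / r. All names, shapes, and the initialization scheme here are illustrative assumptions.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha):
    """Apply a frozen weight W plus a scaled low-rank update B @ A."""
    r = A.shape[0]  # rank of the low-rank update
    return x @ (W + (alpha / r) * (B @ A)).T

rng = np.random.default_rng(0)
d_out, d_in, r = 8, 16, 2                 # hypothetical layer sizes and rank

W = rng.normal(size=(d_out, d_in))        # frozen pretrained weight (not trained)
A = rng.normal(size=(r, d_in)) * 0.01     # trainable, small random init (assumed)
B = np.zeros((d_out, r))                  # trainable, zero init so the update starts at 0
x = rng.normal(size=(4, d_in))            # a batch of 4 input vectors

y = lora_forward(x, W, A, B, alpha=4.0)

trainable = A.size + B.size               # parameters updated under LoRA
full = W.size                             # parameters updated under full fine-tuning
print(y.shape, trainable, full)
```

Because B starts at zero, the adapted layer initially reproduces the frozen layer exactly; only A and B (48 values in this toy setting, versus 128 in W) would receive gradients during tuning.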

2021

Limitations of Autoregressive Models and Their Alternatives
Chu-Cheng Lin | Aaron Jaech | Xin Li | Matthew R. Gormley | Jason Eisner
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Standard autoregressive language models perform only polynomial-time computation to compute the probability of the next symbol. While this is attractive, it means they cannot model distributions whose next-symbol probability is hard to compute. Indeed, they cannot even model them well enough to solve associated easy decision problems for which an engineer might want to consult a language model. These limitations apply no matter how much computation and data are used to train the model, unless the model is given access to oracle parameters that grow superpolynomially in sequence length. Thus, simply training larger autoregressive language models is not a panacea for NLP. Alternatives include energy-based models (which give up efficient sampling) and latent-variable autoregressive models (which give up efficient scoring of a given string). Both are powerful enough to escape the above limitations.
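
As an illustration of what "autoregressive" scoring means in this abstract, here is a minimal sketch, not taken from the paper: the probability of a string is the product of next-symbol probabilities, each computed cheaply from the prefix. The toy bigram table and the `<s>` / `</s>` boundary symbols are hypothetical stand-ins for a real model.

```python
import math

# Hypothetical toy model: P(next symbol | previous symbol).
# "<s>" marks the start of a string and "</s>" marks its end.
BIGRAM = {
    "<s>": {"a": 0.5, "b": 0.5},
    "a": {"a": 0.25, "b": 0.25, "</s>": 0.5},
    "b": {"a": 0.5, "</s>": 0.5},
}

def log_prob(symbols):
    """Sum log P(x_t | x_{t-1}) over the string plus the end marker."""
    prev = "<s>"
    total = 0.0
    for sym in list(symbols) + ["</s>"]:
        p = BIGRAM[prev].get(sym, 0.0)
        if p == 0.0:
            return float("-inf")  # the toy model assigns this string zero probability
        total += math.log(p)
        prev = sym
    return total

print(math.exp(log_prob("ab")))  # 0.5 * 0.25 * 0.5
```

Each step here is a constant-time table lookup; the abstract's argument concerns distributions whose next-symbol probabilities cannot be computed by any such efficient per-step procedure.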

2019

Neural Finite-State Transducers: Beyond Rational Relations
Chu-Cheng Lin | Hao Zhu | Matthew R. Gormley | Jason Eisner
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

We introduce neural finite-state transducers (NFSTs), a family of string transduction models defining joint and conditional probability distributions over pairs of strings. The probability of a string pair is obtained by marginalizing over all its accepting paths in a finite-state transducer. In contrast to ordinary weighted FSTs, however, each path is scored using an arbitrary function such as a recurrent neural network, which breaks the usual conditional independence assumption (Markov property). NFSTs are more powerful than previous finite-state models with neural features (Rastogi et al., 2016). We present training and inference algorithms for locally and globally normalized variants of NFSTs. In experiments on different transduction tasks, they compete favorably against seq2seq models while offering interpretable paths that correspond to hard monotonic alignments.

2018

Neural Particle Smoothing for Sampling from Conditional Sequence Models
Chu-Cheng Lin | Jason Eisner
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

We introduce neural particle smoothing, a sequential Monte Carlo method for sampling annotations of an input string from a given probability model. In contrast to conventional particle filtering algorithms, we train a proposal distribution that looks ahead to the end of the input string by means of a right-to-left LSTM. We demonstrate that this innovation can improve the quality of the sample. To motivate our formal choices, we explain how neural transduction models and our sampler can be viewed as low-dimensional but nonlinear approximations to working with HMMs over very large state spaces.

2015

Unsupervised POS Induction with Word Embeddings
Chu-Cheng Lin | Waleed Ammar | Chris Dyer | Lori Levin
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Not All Contexts Are Created Equal: Better Word Representations with Variable Attention
Wang Ling | Yulia Tsvetkov | Silvio Amir | Ramón Fernandez | Chris Dyer | Alan W Black | Isabel Trancoso | Chu-Cheng Lin
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

2014

The CMU Submission for the Shared Task on Language Identification in Code-Switched Data
Chu-Cheng Lin | Waleed Ammar | Lori Levin | Chris Dyer
Proceedings of the First Workshop on Computational Approaches to Code Switching

Automatic Classification of Communicative Functions of Definiteness
Archna Bhatia | Chu-Cheng Lin | Nathan Schneider | Yulia Tsvetkov | Fatima Talib Al-Raisi | Laleh Roostapour | Jordan Bender | Abhimanu Kumar | Lori Levin | Mandy Simons | Chris Dyer
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

2009

Modeling the Relationship among Linguistic Typological Features with Hierarchical Dirichlet Process
Chu-Cheng Lin | Yu-Chun Wang | Richard Tzong-Han Tsai
Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation, Volume 2

2007

Korean-Chinese Person Name Translation for Cross Language Information Retrieval
Yu-Chun Wang | Yi-Hsun Lee | Chu-Cheng Lin | Tzong-Han Richard Tsai | Wen-Lian Hsu
Proceedings of the 21st Pacific Asia Conference on Language, Information and Computation