Jason Kuen
2023
A Critical Analysis of Document Out-of-Distribution Detection
Jiuxiang Gu | Yifei Ming | Yi Zhou | Jason Kuen | Vlad Morariu | Handong Zhao | Ruiyi Zhang | Nikolaos Barmpalios | Anqi Liu | Yixuan Li | Tong Sun | Ani Nenkova
Findings of the Association for Computational Linguistics: EMNLP 2023
Large-scale pre-training is widely used in recent document understanding tasks. During deployment, one may expect that models should trigger a conservative fallback policy when encountering out-of-distribution (OOD) samples, which highlights the importance of OOD detection. However, most existing OOD detection methods focus on single-modal inputs such as images or texts. While documents are multi-modal in nature, whether and how multi-modal information in documents can be exploited for OOD detection remains underexplored. In this work, we first provide a systematic and in-depth analysis of OOD detection for document understanding models. We study the effects of model modality, pre-training, and fine-tuning across various types of OOD inputs. In particular, we find that spatial information is critical for document OOD detection. To better exploit spatial information, we propose a spatial-aware adapter, which serves as a parameter-efficient add-on module to adapt transformer-based language models to the document domain. Extensive experiments show that adding the spatial-aware adapter significantly improves OOD detection performance compared to directly using the language model and achieves superior performance compared to competitive baselines.
2022
Learning Adaptive Axis Attentions in Fine-tuning: Beyond Fixed Sparse Attention Patterns
Zihan Wang | Jiuxiang Gu | Jason Kuen | Handong Zhao | Vlad Morariu | Ruiyi Zhang | Ani Nenkova | Tong Sun | Jingbo Shang
Findings of the Association for Computational Linguistics: ACL 2022
We present a comprehensive study of sparse attention patterns in Transformer models. We first question the need for pre-training with sparse attention and present experiments showing that an efficient fine-tuning-only approach yields a slightly worse but still competitive model. Then we compare the widely used local attention pattern and the less-well-studied global attention pattern, demonstrating that global patterns have several unique advantages. We also demonstrate that a flexible approach to attention, with different patterns across different layers of the model, is beneficial for some tasks. Drawing on this insight, we propose a novel Adaptive Axis Attention method, which learns, during fine-tuning, different attention patterns for each Transformer layer depending on the downstream task. Rather than choosing a fixed attention pattern, the adaptive axis attention method identifies important tokens for each task and model layer and focuses attention on those. It does not require pre-training to accommodate the sparse patterns and demonstrates competitive and sometimes better performance against fixed sparse attention patterns that require resource-intensive pre-training.