Seungwon Kim
2021
Revisiting Pretraining with Adapters
Seungwon Kim
|
Alex Shum
|
Nathan Susanj
|
Jonathan Hilgart
Proceedings of the 6th Workshop on Representation Learning for NLP (RepL4NLP-2021)
Pretrained language models have served as the backbone for many state-of-the-art NLP results. These models are large and expensive to train. Recent work suggests that continued pretraining on task-specific data is worth the effort as pretraining leads to improved performance on downstream tasks. We explore alternatives to full-scale task-specific pretraining of language models through the use of adapter modules, a parameter-efficient approach to transfer learning. We find that adapter-based pretraining is able to achieve comparable results to task-specific pretraining while using a fraction of the overall trainable parameters. We further explore direct use of adapters without pretraining and find that the direct fine-tuning performs mostly on par with pretrained adapter models, contradicting previously proposed benefits of continual pretraining in full pretraining fine-tuning strategies. Lastly, we perform an ablation study on task-adaptive pretraining to investigate how different hyperparameter settings can change the effectiveness of the pretraining.
2020
Using Pre-Trained Transformer for Better Lay Summarization
Seungwon Kim
Proceedings of the First Workshop on Scholarly Document Processing
In this paper, we tack lay summarization tasks, which aim to automatically produce lay summaries for scientific papers, to participate in the first CL-LaySumm 2020 in SDP workshop at EMNLP 2020. We present our approach of using Pre-training with Extracted Gap-sentences for Abstractive Summarization (PEGASUS; Zhang et al., 2019b) to produce the lay summary and combining those with the extractive summarization model using Bidirectional Encoder Representations from Transformers (BERT; Devlin et al., 2018) and readability metrics that measure the readability of the sentence to further improve the quality of the summary. Our model achieves a remarkable performance on ROUGE metrics, demonstrating the produced summary is more readable while it summarizes the main points of the document.