Efficient Structured Prediction with Transformer Encoders

Ali Basirat


Abstract
Finetuning is a useful method for adapting Transformer-based text encoders to new tasks but can be computationally expensive for structured prediction tasks that require tuning at the token level. Furthermore, finetuning is inherently inefficient in updating all base model parameters, which prevents parameter sharing across tasks. To address these issues, we propose a method for efficient task adaptation of frozen Transformer encoders based on the local contribution of their intermediate layers to token representations. Our adapter uses a novel attention mechanism to aggregate intermediate layers and tailor the resulting representations to a target task. Experiments on several structured prediction tasks demonstrate that our method outperforms previous approaches, retaining over 99% of the finetuning performance at a fraction of the training cost. Our proposed method offers an efficient solution for adapting frozen Transformer encoders to new tasks, improving performance and enabling parameter sharing across different tasks.
Anthology ID:
2024.nejlt-1.1
Volume:
Northern European Journal of Language Technology, Volume 10
Month:
December
Year:
2024
Address:
Linköping, Sweden
Editor:
Marcel Bollmann
Venue:
NEJLT
Publisher:
Linköping University Electronic Press
Pages:
1–13
URL:
https://aclanthology.org/2024.nejlt-1.1/
DOI:
10.3384/nejlt.2000-1533.2024.4932
Cite (ACL):
Ali Basirat. 2024. Efficient Structured Prediction with Transformer Encoders. Northern European Journal of Language Technology, 10:1–13.
Cite (Informal):
Efficient Structured Prediction with Transformer Encoders (Basirat, NEJLT 2024)
PDF:
https://aclanthology.org/2024.nejlt-1.1.pdf