ETAS: Zero-Shot Transformer Architecture Search via Network Trainability and Expressivity

Jiechao Yang; Yong Liu

doi:10.18653/v1/2024.findings-acl.405

Abstract

Transformer Architecture Search (TAS) methods aim to automate the search for optimal Transformer architecture configurations for a given task. However, they are impeded by the prohibitive cost of evaluating Transformer architectures. Recently, several Zero-Shot TAS methods have been proposed to mitigate this problem by using zero-cost proxies to evaluate Transformer architectures without training. Unfortunately, they are limited to specific computer vision or natural language processing tasks. Moreover, most of them are developed based on empirical observations and lack theoretical guarantees. To address these limitations, we develop a new zero-cost proxy called NTSR that combines two theoretically inspired indicators to measure the trainability and expressivity of Transformer networks separately. We then integrate it into an effective regularized evolution framework called ETAS to demonstrate its efficacy on various tasks. The results show that our proposed NTSR proxy consistently achieves a higher correlation with the true performance of Transformer networks on both computer vision and natural language processing tasks. Further, it significantly accelerates the search for the best-performing Transformer architecture configurations.

Anthology ID: 2024.findings-acl.405
Volume: Findings of the Association for Computational Linguistics: ACL 2024
Month: August
Year: 2024
Address: Bangkok, Thailand
Editors: Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 6780–6795
URL: https://aclanthology.org/2024.findings-acl.405
DOI: 10.18653/v1/2024.findings-acl.405
Cite (ACL): Jiechao Yang and Yong Liu. 2024. ETAS: Zero-Shot Transformer Architecture Search via Network Trainability and Expressivity. In Findings of the Association for Computational Linguistics: ACL 2024, pages 6780–6795, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal): ETAS: Zero-Shot Transformer Architecture Search via Network Trainability and Expressivity (Yang & Liu, Findings 2024)
PDF: https://aclanthology.org/2024.findings-acl.405.pdf
