Nefeli Gkouti, Prodromos Malakasiotis, Stavros Toumpis, and Ion Androutsopoulos. 2024. Should I try multiple optimizers when fine-tuning a pre-trained Transformer for NLP tasks? Should I tune their hyperparameters? In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), Yvette Graham and Matthew Purver (eds.), pages 2555–2574, St. Julian’s, Malta, March 2024. Association for Computational Linguistics. Anthology ID: gkouti-etal-2024-try. DOI: 10.18653/v1/2024.eacl-long.157. URL: https://aclanthology.org/2024.eacl-long.157/