@inproceedings{kowsher-etal-2025-predicting,
title = "Predicting Through Generation: Why Generation Is Better for Prediction",
author = "Kowsher, Md and
Prottasha, Nusrat Jahan and
Bhat, Prakash and
Yu, Chun-Nam and
Soltanalian, Mojtaba and
Garibay, Ivan and
Garibay, Ozlem and
Chen, Chen and
Yousefi, Niloofar",
editor = "Che, Wanxiang and
Nabende, Joyce and
Shutova, Ekaterina and
Pilehvar, Mohammad Taher",
booktitle = "Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = jul,
year = "2025",
address = "Vienna, Austria",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.acl-long.1303/",
doi = "10.18653/v1/2025.acl-long.1303",
pages = "26845--26871",
ISBN = "979-8-89176-251-0",
abstract = "This paper argues that generating output tokens is more effective than using pooled representations for prediction tasks because token-level generation retains more mutual information. Since LLMs are trained on massive text corpora using next-token prediction, generation aligns naturally with their learned behavior. Using the Data Processing Inequality (DPI), we provide both theoretical and empirical evidence supporting this claim. However, autoregressive models face two key challenges when used for prediction: (1) exposure bias, where the model sees ground-truth tokens during training but relies on its own predictions during inference, leading to errors, and (2) format mismatch, where discrete tokens do not always align with the task{'}s required output structure. To address these challenges, we introduce PredGen (Predicting Through Generating), an end-to-end framework that (i) uses scheduled sampling to reduce exposure bias, and (ii) introduces a task adapter to convert the generated tokens into structured outputs. Additionally, we introduce Writer-Director Alignment Loss (WDAL), which ensures consistency between token generation and final task predictions, improving both text coherence and numerical accuracy. We evaluate PredGen on multiple classification and regression benchmarks. Our results show that PredGen consistently outperforms standard baselines, demonstrating its effectiveness in structured prediction tasks."
}

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="kowsher-etal-2025-predicting">
<titleInfo>
<title>Predicting Through Generation: Why Generation Is Better for Prediction</title>
</titleInfo>
<name type="personal">
<namePart type="given">Md</namePart>
<namePart type="family">Kowsher</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Nusrat</namePart>
<namePart type="given">Jahan</namePart>
<namePart type="family">Prottasha</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Prakash</namePart>
<namePart type="family">Bhat</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Chun-Nam</namePart>
<namePart type="family">Yu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Mojtaba</namePart>
<namePart type="family">Soltanalian</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ivan</namePart>
<namePart type="family">Garibay</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ozlem</namePart>
<namePart type="family">Garibay</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Chen</namePart>
<namePart type="family">Chen</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Niloofar</namePart>
<namePart type="family">Yousefi</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-07</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</title>
</titleInfo>
<name type="personal">
<namePart type="given">Wanxiang</namePart>
<namePart type="family">Che</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Joyce</namePart>
<namePart type="family">Nabende</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ekaterina</namePart>
<namePart type="family">Shutova</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Mohammad</namePart>
<namePart type="given">Taher</namePart>
<namePart type="family">Pilehvar</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Vienna, Austria</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
<identifier type="isbn">979-8-89176-251-0</identifier>
</relatedItem>
<abstract>This paper argues that generating output tokens is more effective than using pooled representations for prediction tasks because token-level generation retains more mutual information. Since LLMs are trained on massive text corpora using next-token prediction, generation aligns naturally with their learned behavior. Using the Data Processing Inequality (DPI), we provide both theoretical and empirical evidence supporting this claim. However, autoregressive models face two key challenges when used for prediction: (1) exposure bias, where the model sees ground-truth tokens during training but relies on its own predictions during inference, leading to errors, and (2) format mismatch, where discrete tokens do not always align with the task’s required output structure. To address these challenges, we introduce PredGen (Predicting Through Generating), an end-to-end framework that (i) uses scheduled sampling to reduce exposure bias, and (ii) introduces a task adapter to convert the generated tokens into structured outputs. Additionally, we introduce Writer-Director Alignment Loss (WDAL), which ensures consistency between token generation and final task predictions, improving both text coherence and numerical accuracy. We evaluate PredGen on multiple classification and regression benchmarks. Our results show that PredGen consistently outperforms standard baselines, demonstrating its effectiveness in structured prediction tasks.</abstract>
<identifier type="citekey">kowsher-etal-2025-predicting</identifier>
<identifier type="doi">10.18653/v1/2025.acl-long.1303</identifier>
<location>
<url>https://aclanthology.org/2025.acl-long.1303/</url>
</location>
<part>
<date>2025-07</date>
<extent unit="page">
<start>26845</start>
<end>26871</end>
</extent>
</part>
</mods>
</modsCollection>

%0 Conference Proceedings
%T Predicting Through Generation: Why Generation Is Better for Prediction
%A Kowsher, Md
%A Prottasha, Nusrat Jahan
%A Bhat, Prakash
%A Yu, Chun-Nam
%A Soltanalian, Mojtaba
%A Garibay, Ivan
%A Garibay, Ozlem
%A Chen, Chen
%A Yousefi, Niloofar
%Y Che, Wanxiang
%Y Nabende, Joyce
%Y Shutova, Ekaterina
%Y Pilehvar, Mohammad Taher
%S Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
%D 2025
%8 July
%I Association for Computational Linguistics
%C Vienna, Austria
%@ 979-8-89176-251-0
%F kowsher-etal-2025-predicting
%X This paper argues that generating output tokens is more effective than using pooled representations for prediction tasks because token-level generation retains more mutual information. Since LLMs are trained on massive text corpora using next-token prediction, generation aligns naturally with their learned behavior. Using the Data Processing Inequality (DPI), we provide both theoretical and empirical evidence supporting this claim. However, autoregressive models face two key challenges when used for prediction: (1) exposure bias, where the model sees ground-truth tokens during training but relies on its own predictions during inference, leading to errors, and (2) format mismatch, where discrete tokens do not always align with the task’s required output structure. To address these challenges, we introduce PredGen (Predicting Through Generating), an end-to-end framework that (i) uses scheduled sampling to reduce exposure bias, and (ii) introduces a task adapter to convert the generated tokens into structured outputs. Additionally, we introduce Writer-Director Alignment Loss (WDAL), which ensures consistency between token generation and final task predictions, improving both text coherence and numerical accuracy. We evaluate PredGen on multiple classification and regression benchmarks. Our results show that PredGen consistently outperforms standard baselines, demonstrating its effectiveness in structured prediction tasks.
%R 10.18653/v1/2025.acl-long.1303
%U https://aclanthology.org/2025.acl-long.1303/
%U https://doi.org/10.18653/v1/2025.acl-long.1303
%P 26845-26871

Markdown (Informal)
[Predicting Through Generation: Why Generation Is Better for Prediction](https://aclanthology.org/2025.acl-long.1303/) (Kowsher et al., ACL 2025)

ACL
Md Kowsher, Nusrat Jahan Prottasha, Prakash Bhat, Chun-Nam Yu, Mojtaba Soltanalian, Ivan Garibay, Ozlem Garibay, Chen Chen, and Niloofar Yousefi. 2025. Predicting Through Generation: Why Generation Is Better for Prediction. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 26845–26871, Vienna, Austria. Association for Computational Linguistics.