@inproceedings{wang-etal-2025-rethinking-large,
title = "Rethinking Large Language Model Architectures for Sequential Recommendations",
author = "Wang, Hanbing and
Liu, Xiaorui and
Fan, Wenqi and
Zhao, Xiangyu and
Kini, Venkataramana and
Yadav, Devendra Pratap and
Wang, Fei and
Wen, Zhen and
Liu, Hui",
editor = "Inui, Kentaro and
Sakti, Sakriani and
Wang, Haofen and
Wong, Derek F. and
Bhattacharyya, Pushpak and
Banerjee, Biplab and
Ekbal, Asif and
Chakraborty, Tanmoy and
Singh, Dhirendra Pratap",
booktitle = "Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics",
month = dec,
year = "2025",
address = "Mumbai, India",
publisher = "The Asian Federation of Natural Language Processing and The Association for Computational Linguistics",
url = "https://aclanthology.org/2025.ijcnlp-long.180/",
pages = "3376--3391",
ISBN = "979-8-89176-298-5",
abstract = "In recent times, there has been a shift towards adapting sequential recommendation to LLM paradigm to harness the capabilities of LLMs. These methods typically formulate recommendation data into natural language and train the model to forecast the subsequent item in an auto-regressive manner. Despite their notable success, the significant computational burden during inference poses a major challenge to their practical implementation. In this study, we aim to streamline current LLM-based recommendation models and introduce a straightforward yet highly effective model Lite-LLM4Rec. The primary objective of Lite-LLM4Rec is to ensure efficient inference for the sequential recommendation task. Lite-LLM4Rec circumvents the step-by-step beam search decoding by employing a direct item projection head to produce ranking scores in one step. This design arises from our empirical finding that beam search decoding is ultimately unnecessary for sequential recommendations. Additionally, Lite-LLM4Rec introduces a hierarchical LLM structure crafted to efficiently handle the extensive contextual information of items and redundant computation issue, thus diminishing computational overhead while enjoying the power of LLMs. Experiments on four publicly available datasets validate the efficacy of Lite-LLM4Rec in enhancing both performance and inference efficiency (notably 46.8{\%} performance improvement and 99.48{\%} efficiency improvement on ML-1m) compared to existing LLM-based methods. Our implementations are available at: \url{https://github.com/HanbingWang2001/Lite-LLM4Rec-PyTorch}."
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="wang-etal-2025-rethinking-large">
<titleInfo>
<title>Rethinking Large Language Model Architectures for Sequential Recommendations</title>
</titleInfo>
<name type="personal">
<namePart type="given">Hanbing</namePart>
<namePart type="family">Wang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Xiaorui</namePart>
<namePart type="family">Liu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Wenqi</namePart>
<namePart type="family">Fan</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Xiangyu</namePart>
<namePart type="family">Zhao</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Venkataramana</namePart>
<namePart type="family">Kini</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Devendra</namePart>
<namePart type="given">Pratap</namePart>
<namePart type="family">Yadav</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Fei</namePart>
<namePart type="family">Wang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Zhen</namePart>
<namePart type="family">Wen</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Hui</namePart>
<namePart type="family">Liu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-12</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics</title>
</titleInfo>
<name type="personal">
<namePart type="given">Kentaro</namePart>
<namePart type="family">Inui</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Sakriani</namePart>
<namePart type="family">Sakti</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Haofen</namePart>
<namePart type="family">Wang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Derek</namePart>
<namePart type="given">F</namePart>
<namePart type="family">Wong</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Pushpak</namePart>
<namePart type="family">Bhattacharyya</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Biplab</namePart>
<namePart type="family">Banerjee</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Asif</namePart>
<namePart type="family">Ekbal</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Tanmoy</namePart>
<namePart type="family">Chakraborty</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Dhirendra</namePart>
<namePart type="given">Pratap</namePart>
<namePart type="family">Singh</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>The Asian Federation of Natural Language Processing and The Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Mumbai, India</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
<identifier type="isbn">979-8-89176-298-5</identifier>
</relatedItem>
<abstract>In recent times, there has been a shift towards adapting sequential recommendation to the LLM paradigm to harness the capabilities of LLMs. These methods typically formulate recommendation data into natural language and train the model to forecast the subsequent item in an auto-regressive manner. Despite their notable success, the significant computational burden during inference poses a major challenge to their practical implementation. In this study, we aim to streamline current LLM-based recommendation models and introduce a straightforward yet highly effective model, Lite-LLM4Rec. The primary objective of Lite-LLM4Rec is to ensure efficient inference for the sequential recommendation task. Lite-LLM4Rec circumvents the step-by-step beam search decoding by employing a direct item projection head to produce ranking scores in one step. This design arises from our empirical finding that beam search decoding is ultimately unnecessary for sequential recommendations. Additionally, Lite-LLM4Rec introduces a hierarchical LLM structure crafted to efficiently handle the extensive contextual information of items and the redundant computation issue, thus diminishing computational overhead while enjoying the power of LLMs. Experiments on four publicly available datasets validate the efficacy of Lite-LLM4Rec in enhancing both performance and inference efficiency (notably 46.8% performance improvement and 99.48% efficiency improvement on ML-1m) compared to existing LLM-based methods. Our implementations are available at: https://github.com/HanbingWang2001/Lite-LLM4Rec-PyTorch.</abstract>
<identifier type="citekey">wang-etal-2025-rethinking-large</identifier>
<location>
<url>https://aclanthology.org/2025.ijcnlp-long.180/</url>
</location>
<part>
<date>2025-12</date>
<extent unit="page">
<start>3376</start>
<end>3391</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Rethinking Large Language Model Architectures for Sequential Recommendations
%A Wang, Hanbing
%A Liu, Xiaorui
%A Fan, Wenqi
%A Zhao, Xiangyu
%A Kini, Venkataramana
%A Yadav, Devendra Pratap
%A Wang, Fei
%A Wen, Zhen
%A Liu, Hui
%Y Inui, Kentaro
%Y Sakti, Sakriani
%Y Wang, Haofen
%Y Wong, Derek F.
%Y Bhattacharyya, Pushpak
%Y Banerjee, Biplab
%Y Ekbal, Asif
%Y Chakraborty, Tanmoy
%Y Singh, Dhirendra Pratap
%S Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics
%D 2025
%8 December
%I The Asian Federation of Natural Language Processing and The Association for Computational Linguistics
%C Mumbai, India
%@ 979-8-89176-298-5
%F wang-etal-2025-rethinking-large
%X In recent times, there has been a shift towards adapting sequential recommendation to the LLM paradigm to harness the capabilities of LLMs. These methods typically formulate recommendation data into natural language and train the model to forecast the subsequent item in an auto-regressive manner. Despite their notable success, the significant computational burden during inference poses a major challenge to their practical implementation. In this study, we aim to streamline current LLM-based recommendation models and introduce a straightforward yet highly effective model, Lite-LLM4Rec. The primary objective of Lite-LLM4Rec is to ensure efficient inference for the sequential recommendation task. Lite-LLM4Rec circumvents the step-by-step beam search decoding by employing a direct item projection head to produce ranking scores in one step. This design arises from our empirical finding that beam search decoding is ultimately unnecessary for sequential recommendations. Additionally, Lite-LLM4Rec introduces a hierarchical LLM structure crafted to efficiently handle the extensive contextual information of items and the redundant computation issue, thus diminishing computational overhead while enjoying the power of LLMs. Experiments on four publicly available datasets validate the efficacy of Lite-LLM4Rec in enhancing both performance and inference efficiency (notably 46.8% performance improvement and 99.48% efficiency improvement on ML-1m) compared to existing LLM-based methods. Our implementations are available at: https://github.com/HanbingWang2001/Lite-LLM4Rec-PyTorch.
%U https://aclanthology.org/2025.ijcnlp-long.180/
%P 3376-3391
Markdown (Informal)
[Rethinking Large Language Model Architectures for Sequential Recommendations](https://aclanthology.org/2025.ijcnlp-long.180/) (Wang et al., IJCNLP-AACL 2025)
ACL
Hanbing Wang, Xiaorui Liu, Wenqi Fan, Xiangyu Zhao, Venkataramana Kini, Devendra Pratap Yadav, Fei Wang, Zhen Wen, and Hui Liu. 2025. Rethinking Large Language Model Architectures for Sequential Recommendations. In Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, pages 3376–3391, Mumbai, India. The Asian Federation of Natural Language Processing and The Association for Computational Linguistics.
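
The abstract above describes replacing step-by-step beam-search decoding with a direct item projection head that maps the LLM's sequence representation to ranking scores over the whole item catalog in a single forward pass. The snippet below is a minimal sketch of that general idea only; the class name, dimensions, pooling choice, and backbone are illustrative assumptions, not the authors' implementation (see the linked repository for the actual code).

```python
# Hypothetical sketch: score every catalog item from one pooled LLM state,
# instead of generating item text autoregressively with beam search.
# All names and sizes here are illustrative assumptions.
import torch
import torch.nn as nn


class ItemProjectionHead(nn.Module):
    """Maps a single sequence representation to ranking scores over all items."""

    def __init__(self, hidden_dim: int, num_items: int):
        super().__init__()
        # One linear layer: LLM hidden state -> one logit per catalog item.
        self.proj = nn.Linear(hidden_dim, num_items)

    def forward(self, seq_repr: torch.Tensor) -> torch.Tensor:
        # seq_repr: (batch, hidden_dim), e.g. the last-token or pooled LLM state.
        # Returns (batch, num_items) ranking scores produced in one step.
        return self.proj(seq_repr)


if __name__ == "__main__":
    batch, hidden_dim, num_items = 4, 768, 10_000        # toy sizes, purely illustrative
    head = ItemProjectionHead(hidden_dim, num_items)
    seq_repr = torch.randn(batch, hidden_dim)             # stand-in for LLM output states
    scores = head(seq_repr)                                # (4, 10000) ranking scores
    top_items = scores.topk(k=10, dim=-1).indices          # top-10 recommended item ids
    print(scores.shape, top_items.shape)
```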