Sk Shahid
2025
LLM Compression: How Far Can We Go in Balancing Size and Performance?
Sahil Sk | Debashish Dhal | Sonal Khosla | Akash Dhaka | Shantipriya Parida | Sk Shahid | Sambit Shekhar | Dilip Prasad | Ondrej Bojar
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era
Quantization is an essential and popular technique for improving the accessibility of large language models (LLMs) by reducing memory usage and computational costs while maintaining performance. In this study, we apply 4-bit Group Scaling Quantization (GSQ) and Generative Pretrained Transformer Quantization (GPTQ) to LLaMA 1B, Qwen 0.5B, and PHI 1.5B, evaluating their impact across multiple NLP tasks. We benchmark these models on the MS MARCO (Information Retrieval), BoolQ (Boolean Question Answering), and GSM8K (Mathematical Reasoning) datasets, assessing both accuracy and efficiency across tasks. The study measures the trade-offs between model compression and task performance, analyzing key evaluation metrics, namely accuracy, inference latency, and throughput. The results provide insights into the suitability of low-bit quantization for real-world deployment and highlight the trade-offs between memory, compute, and latency in such settings, helping users make informed deployment decisions.
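As a rough illustration of the 4-bit GPTQ setup the abstract describes, the sketch below quantizes a small causal LM with Hugging Face Transformers' GPTQConfig. The checkpoint name, calibration dataset, and group size are illustrative assumptions, not the paper's exact configuration (the GSQ variant is not shown); running it requires the optimum package and a GPTQ backend such as auto-gptq.

# Minimal 4-bit GPTQ sketch; checkpoint and calibration settings are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "meta-llama/Llama-3.2-1B"  # assumed ~1B LLaMA checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)

# bits=4 stores weights in 4 bits; group_size sets the granularity of the
# per-group scaling factors fitted during calibration on the "c4" dataset.
quant_config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer, group_size=128)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",
)
model.save_pretrained("llama-1b-gptq-4bit")
tokenizer.save_pretrained("llama-1b-gptq-4bit")

The saved quantized checkpoint can then be reloaded and timed on the benchmark tasks to measure the accuracy, latency, and throughput trade-offs discussed above.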
2023
OdiaGenAI’s Participation at WAT2023
Sk Shahid | Guneet Singh Kohli | Sambit Sekhar | Debasish Dhal | Adit Sharma | Shubhendra Kushwaha | Shantipriya Parida | Stig-Arne Grönroos | Satya Ranjan Dash
Proceedings of the 10th Workshop on Asian Translation
This paper offers an in-depth overview of team “ODIAGEN”’s translation system submitted to the Workshop on Asian Translation (WAT2023). Our focus lies in the domain of Indic multimodal tasks, specifically English-to-Hindi, English-to-Malayalam, and English-to-Bengali translation. The system uses a state-of-the-art Transformer-based architecture, specifically the NLLB-200 model, fine-tuned on language-specific Visual Genome datasets. With this system, we handle both text-to-text and multimodal translation, demonstrating versatility across translation modes. Our results show strong performance across the board, with particularly promising results on the Hindi and Bengali translation tasks. Notably, in the English-to-Hindi, English-to-Bengali, and English-to-Malayalam text-to-text tasks, our system claimed the top positions on both the evaluation and challenge sets. This work not only advances our understanding of the challenges and nuances of Indic language translation but also opens avenues for future research to enhance translation accuracy and performance.
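For readers unfamiliar with NLLB-200, the sketch below shows what text-to-text inference with such a checkpoint typically looks like in Hugging Face Transformers. The public distilled checkpoint and the example sentence are assumptions for illustration; the paper's fine-tuning on Visual Genome data and its multimodal pipeline are not reproduced here.

# Minimal English-to-Hindi inference sketch with a public NLLB-200 checkpoint.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "facebook/nllb-200-distilled-600M"  # assumed public checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id, src_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

inputs = tokenizer("A man rides a bicycle down the street.", return_tensors="pt")
generated = model.generate(
    **inputs,
    # NLLB selects the target language by forcing its code as the first token.
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("hin_Deva"),
    max_new_tokens=64,
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])

Swapping the forced BOS token to "ben_Beng" or "mal_Mlym" targets Bengali or Malayalam instead, mirroring the three language pairs covered by the submission.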