Sai Koneru
2026
Diet-KIT: Post-Training Quantization for Speech LLMs
Danni Liu | Sai Koneru | Jan Niehues
Proceedings of the 23rd International Conference on Spoken Language Translation (IWSLT 2026)
We present Diet-KIT, a system for the IWSLT speech translation compression task under a strict 4 GB on-disk storage constraint, starting from the 16 GB Qwen2-Audio-7B base model. Compression is achieved with a sequential pipeline based on Half-Quadratic Quantization (HQQ). Based on systematic ablations, we find that 4-bit quantization preserves translation quality well, whereas 3-bit quantization induces a sharp performance cliff, precluding aggressive compression across the whole model. We further show that the embedding table tolerates 2-bit quantization with negligible loss, while the LM head requires higher precision. To satisfy the storage constraint, we propose a sensitivity-guided layer selection method that identifies MLP sublayers tolerant to 3-bit compression via a per-layer sensitivity analysis, which consistently outperforms manual and random layer selection. Finally, AWQ calibration is applied as a data-driven refinement stage. The final system achieves 3.98 GB on disk with COMET scores of 74.4 on en→de and 77.1 on en→zh, compared to 75.6 and 79.5 for the uncompressed fine-tuned model.
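The sensitivity-guided selection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a simple greedy rule (demote the least sensitive MLP sublayers from 4-bit to 3-bit until the storage budget is met), it ignores quantization metadata such as scales and zero points, and the names `Sublayer`, `select_3bit_layers`, and all numbers in the toy example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Sublayer:
    name: str
    num_params: int
    # Hypothetical score: quality drop measured when only this sublayer is set to 3-bit.
    sensitivity: float


def size_bytes(num_params, bits):
    """Approximate weight storage, ignoring scales and zero points (assumption)."""
    return num_params * bits / 8


def select_3bit_layers(layers, fixed_bytes, budget_bytes, hi_bits=4, lo_bits=3):
    """Greedily demote the least sensitive sublayers to lo_bits until the budget is met.

    fixed_bytes covers everything that is not an MLP sublayer candidate
    (for example embeddings, attention weights, and the LM head).
    """
    total = fixed_bytes + sum(size_bytes(l.num_params, hi_bits) for l in layers)
    chosen = []
    for layer in sorted(layers, key=lambda l: l.sensitivity):
        if total <= budget_bytes:
            break
        saved = size_bytes(layer.num_params, hi_bits) - size_bytes(layer.num_params, lo_bits)
        total -= saved
        chosen.append(layer.name)
    if total > budget_bytes:
        raise ValueError("budget cannot be met even with all candidates at lo_bits")
    return chosen, total


# Toy example with made-up sizes and sensitivities.
layers = [
    Sublayer("mlp.0", 8_000_000, 0.50),
    Sublayer("mlp.1", 8_000_000, 0.10),
    Sublayer("mlp.2", 8_000_000, 0.30),
    Sublayer("mlp.3", 8_000_000, 0.05),
]
chosen, total = select_3bit_layers(layers, fixed_bytes=1_000_000, budget_bytes=15_500_000)
print(chosen, total)
```

In this toy setting each demotion saves 1,000,000 bytes, so the two least sensitive sublayers are moved to 3-bit and the remaining ones stay at 4-bit. A real pipeline would measure sensitivity on held-out translation data and account for per-group quantization overhead when computing sizes.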