EPI Lab
News

LLM Quantization Research Accepted by ACM ASPLOS 2025

The laboratory's paper "COMET: Towards Practical W4A4KV4 LLMs Serving", written in collaboration with the Institute of Computing Technology, Chinese Academy of Sciences, has been accepted to ACM ASPLOS 2025, a top conference in computer architecture.

The paper proposes a fine-grained mixed-precision quantization method that enables efficient W4A4KV4 inference for large language models, that is, inference with 4-bit weights, 4-bit activations, and a 4-bit KV cache. By exploiting the INT4 tensor cores of modern GPUs, easing the memory bottleneck of the KV cache, and developing optimized kernels and data layouts, the COMET framework achieves significant inference speedups on LLaMA-series models.
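For readers unfamiliar with 4-bit quantization, the following is a minimal sketch of what storing values in INT4 can look like. It is not the COMET algorithm and does not reproduce the paper's mixed-precision scheme or GPU kernels; it only illustrates generic symmetric per-group quantization and nibble packing in plain Python. The function names (`quantize_int4`, `dequantize_int4`, `pack_int4`) and the group size are hypothetical choices made for this illustration.

```python
# Illustrative only: generic symmetric per-group INT4 quantization.
# This is NOT the method from the COMET paper; all names are hypothetical.

def quantize_int4(values, group_size=4):
    """Quantize floats to signed 4-bit codes in [-8, 7], one scale per group."""
    codes, scales = [], []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        max_abs = max(abs(v) for v in group)
        # Map the largest magnitude in the group to 7; avoid dividing by zero.
        scale = max_abs / 7.0 if max_abs > 0 else 1.0
        scales.append(scale)
        for v in group:
            q = round(v / scale)
            codes.append(max(-8, min(7, q)))  # clamp to the INT4 range
    return codes, scales


def dequantize_int4(codes, scales, group_size=4):
    """Recover approximate floats from codes and per-group scales."""
    return [c * scales[i // group_size] for i, c in enumerate(codes)]


def pack_int4(codes):
    """Pack pairs of signed 4-bit codes into bytes, low nibble first."""
    if len(codes) % 2:
        codes = codes + [0]  # pad to an even count
    packed = bytearray()
    for lo, hi in zip(codes[0::2], codes[1::2]):
        packed.append((lo & 0xF) | ((hi & 0xF) << 4))
    return bytes(packed)


values = [7.0, -3.0, 1.0, 0.0, 14.0, -8.0, 2.0, 0.5]
codes, scales = quantize_int4(values, group_size=4)
print(codes)                               # [7, -3, 1, 0, 7, -4, 1, 0]
print(scales)                              # [1.0, 2.0]
print(dequantize_int4(codes, scales, 4))   # last value 0.5 is lost: 0.0
print(len(pack_int4(codes)))               # 4 bytes for 8 values
```

Eight values fit in four bytes, plus one scale per group, which is the kind of memory saving that makes 4-bit weights and KV caches attractive. The rounding of 0.5 to 0.0 in the second group shows the accuracy cost that a practical scheme has to manage.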

The paper information is: Lian Liu, Long Cheng, Haimeng Ren, Zhaohui Xu, Yudong Pan, Mengdi Wang, Xiaowei Li, Yinhe Han, Ying Wang. "COMET: Towards Practical W4A4KV4 LLMs Serving." Proc. ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Rotterdam, The Netherlands, April 2025.