[ USENIX NSDI ’26 ]

ZipLLM

Efficient LLM Storage via Model-Aware
Synergistic Data Deduplication and Compression

Zirui Wang · Tingfeng Lan · Zhaoyuan Su · Juncheng Yang · Yue Cheng
Artifacts Available · Artifacts Functional · Results Reproduced

54.1% data reduction ratio · 5.8 GB/s ingestion throughput · 7.7 GB/s retrieval throughput · lossless (zero accuracy loss)
[ LIVE DEMO ]

Compression Live Demo

ZipNN (SOTA)
BitX (Ours)
[ PIPELINE ]

Multi-Level Deduplication & Compression

3,048 LLMs (43.19 TB) share massive redundancy at file, tensor, and bit levels. ZipLLM exploits all three, achieving a 54.1% data reduction ratio.

00 Input · 43.19 TB
01 File Dedup · xxHash per file, skip identical files · 41.80 TB (-3.2%)
02 Tensor Dedup · xxHash per tensor, skip identical tensors · 39.61 TB (-8.3%)
03 BitX · BitX for fine-tune pairs, ZipNN for solo tensors · 19.82 TB (-54.1%)
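The two dedup stages can be viewed as a content-addressed store: hash each file or tensor, keep the first copy, and record only a reference for every later duplicate. A minimal, runnable sketch (the real pipeline uses xxHash, which lives in an external crate; the stdlib `DefaultHasher` stands in here, and `DedupStore` is an illustrative name, not ZipLLM's API):

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::Hasher;

// Hash a blob (a whole file or a single tensor's bytes).
// Stand-in for xxHash so the example needs no external crates.
fn digest(bytes: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    h.write(bytes);
    h.finish()
}

// Content-addressed store: each unique blob is kept once;
// identical files/tensors become references to the stored copy.
struct DedupStore {
    blobs: HashMap<u64, Vec<u8>>,
}

impl DedupStore {
    fn new() -> Self {
        Self { blobs: HashMap::new() }
    }

    // Returns (hash, was_duplicate). Duplicates are skipped, not stored.
    fn ingest(&mut self, bytes: &[u8]) -> (u64, bool) {
        let h = digest(bytes);
        let dup = self.blobs.contains_key(&h);
        if !dup {
            self.blobs.insert(h, bytes.to_vec());
        }
        (h, dup)
    }
}

fn main() {
    let mut store = DedupStore::new();
    let (_, dup1) = store.ingest(b"tensor-A");
    let (_, dup2) = store.ingest(b"tensor-A"); // identical tensor: skipped
    println!("first ingest duplicate? {dup1}, second? {dup2}");
}
```

The same mechanism applies at both granularities: stage 01 hashes whole files, stage 02 hashes individual tensors inside the remaining files.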
[ BF16 EXPLORER ]

Interactive Bit Explorer

Adjust the delta to see how BF16 floating-point bits change.
Small weight updates flip very few bits — the foundation of BitX.

[Interactive widget: a delta slider (-1.0 to +1.0) shows the original, modified, and XOR bit patterns, with live counts of total bits flipped and of flipped sign, exponent, and mantissa bits.]
BF16 layout: [S] 1-bit sign · [EEEEEEEE] 8-bit exponent · [MMMMMMM] 7-bit mantissa
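The explorer's computation is simple: a BF16 value is the high 16 bits of the corresponding f32 encoding, so XOR-ing two encodings and masking out the sign, exponent, and mantissa fields gives the flipped-bit counts. A sketch (field masks follow the layout above; function and struct names are illustrative):

```rust
// BF16 is the top 16 bits of the IEEE 754 f32 encoding.
fn to_bf16_bits(x: f32) -> u16 {
    (x.to_bits() >> 16) as u16
}

// Per-field counts of bits that differ between two BF16 weights.
struct FlipCounts {
    sign: u32,     // bit 15
    exponent: u32, // bits 14..=7
    mantissa: u32, // bits 6..=0
}

fn count_flips(a: f32, b: f32) -> FlipCounts {
    let x = to_bf16_bits(a) ^ to_bf16_bits(b);
    FlipCounts {
        sign: ((x >> 15) & 1) as u32,
        exponent: ((x >> 7) & 0xFF).count_ones(),
        mantissa: (x & 0x7F).count_ones(),
    }
}

fn main() {
    // 1.0 vs 1.5 differ in a single mantissa bit;
    // 1.0 vs -1.0 differ only in the sign bit.
    let f = count_flips(1.0, 1.5);
    println!("sign {} exp {} man {}", f.sign, f.exponent, f.mantissa);
}
```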
[ BITX ]

Bit-Level XOR Transform

Fine-tuned models differ from their base by only a few bits per weight.
BitX exploits this by XOR-ing corresponding weights, splitting the XOR output into byte streams, then compressing each stream.

Phase 1 — Two weight tensors (16-bit BF16 each)
Base model
Fine-tuned
[ ALGORITHMS ]

Core Algorithms

Pseudocode for the ZipLLM deduplication pipeline and the BitX compression kernel.

zipllm_pipeline.rs

    
bitx.rs
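As a rough illustration of the BitX transform (a sketch under stated assumptions, not the paper's implementation; the exact stream split may differ): XOR corresponding BF16 weights, then split the XOR output into high-byte and low-byte planes. Because few bits flip, both planes are mostly zeros and compress well with a general-purpose codec; the final compression step is omitted here to keep the example dependency-free.

```rust
// XOR fine-tuned BF16 weights against the base model, then split the
// 16-bit XOR values into two byte planes. Illustrative sketch only.
fn bitx_transform(base: &[u16], finetuned: &[u16]) -> (Vec<u8>, Vec<u8>) {
    assert_eq!(base.len(), finetuned.len());
    let mut hi = Vec::with_capacity(base.len());
    let mut lo = Vec::with_capacity(base.len());
    for (&b, &f) in base.iter().zip(finetuned) {
        let x = b ^ f;
        hi.push((x >> 8) as u8);  // sign + top 7 exponent bits
        lo.push((x & 0xFF) as u8); // low exponent bit + 7 mantissa bits
    }
    (hi, lo)
}

// Lossless inverse: XOR the planes back against the base weights.
fn bitx_restore(base: &[u16], hi: &[u8], lo: &[u8]) -> Vec<u16> {
    base.iter()
        .zip(hi.iter().zip(lo))
        .map(|(&b, (&h, &l))| b ^ (((h as u16) << 8) | l as u16))
        .collect()
}

fn main() {
    let base = [0x3F80u16, 0x3FC0, 0xBF00]; // BF16 bit patterns
    let tuned = [0x3F81u16, 0x3FC0, 0xBF01]; // only low bits changed
    let (hi, lo) = bitx_transform(&base, &tuned);
    assert_eq!(bitx_restore(&base, &hi, &lo), tuned); // round-trips losslessly
    println!("hi plane {:?}, lo plane {:?}", hi, lo);
}
```

Restoration needs only the base tensor and the two (compressed) planes, which is why BitX applies to fine-tune pairs while solo tensors fall back to ZipNN.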