[ USENIX NSDI ’26 ]
ZipLLM
Efficient LLM Storage via Model-Aware
Synergistic Data Deduplication and Compression
Zirui Wang · Tingfeng Lan · Zhaoyuan Su · Juncheng Yang · Yue Cheng
Artifacts Available
Artifacts Functional
Results Reproduced
54.1%
data reduction ratio
5.8 GB/s
ingestion throughput
7.7 GB/s
retrieval throughput
lossless
zero accuracy loss
[ PIPELINE ]
Multi-Level Deduplication & Compression
A corpus of 3,048 LLMs (43.19 TB) exhibits massive redundancy at the file, tensor, and bit levels. ZipLLM exploits all three, achieving a 54.1% data reduction ratio.
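File-level deduplication, the first of the three levels, amounts to content-hash indexing: identical files are stored once and duplicates become references. The sketch below is a minimal illustration with hypothetical names (`dedup_index` is not ZipLLM's API), assuming whole-file SHA-256 hashing; the real pipeline additionally deduplicates at tensor and bit granularity.

```python
import hashlib
from pathlib import Path

def dedup_index(files):
    """File-level dedup sketch: map content hash -> first file seen.

    Hypothetical helper for illustration, not ZipLLM's actual API.
    Returns (index, bytes_saved).
    """
    index, saved = {}, 0
    for path in files:
        data = Path(path).read_bytes()
        digest = hashlib.sha256(data).hexdigest()
        if digest in index:
            saved += len(data)      # duplicate content: store a reference only
        else:
            index[digest] = path    # unique content: keep the payload
    return index, saved
```

Two model repos that ship byte-identical tokenizer or weight files would thus contribute their size to `bytes_saved` on the second occurrence.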
[ BF16 EXPLORER ]
Interactive Bit Explorer
Small changes to a weight's value flip very few of its BF16 bits. This observation is the foundation of BitX.
BF16 layout: [S] 1-bit sign · [EEEEEEEE] 8-bit exponent · [MMMMMMM] 7-bit mantissa
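The layout above can be inspected in a few lines of Python. This sketch obtains the BF16 encoding by truncating a float32 to its top 16 bits; the helper names are ours, not ZipLLM's.

```python
import struct

def bf16_bits(x: float) -> int:
    # Top 16 bits of the IEEE-754 float32 encoding:
    # 1 sign bit, 8 exponent bits, 7 mantissa bits (BF16 by truncation).
    return struct.unpack(">I", struct.pack(">f", x))[0] >> 16

def show(x: float) -> str:
    b = f"{bf16_bits(x):016b}"
    return f"{b[0]} {b[1:9]} {b[9:]}"  # S EEEEEEEE MMMMMMM

base, tuned = 0.7265625, 0.73046875   # two nearby BF16-representable weights
print(show(base))                     # 0 01111110 0111010
print(show(tuned))                    # 0 01111110 0111011
diff = bf16_bits(base) ^ bf16_bits(tuned)
print("bits flipped:", bin(diff).count("1"))  # bits flipped: 1
```

A weight delta of about 0.004 flips a single mantissa bit here, while the sign and exponent bits are unchanged.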
[ BITX ]
Bit-Level XOR Transform
Fine-tuned models differ from their base model by only a few bits per weight.
BitX exploits this by XOR-ing base and fine-tuned weights, splitting the result into separate byte streams, and compressing each stream.
Phase 1 — Two weight tensors (16-bit BF16 each)
↓ XOR (base ⊕ fine) ↓
Phase 2 — XOR result (yellow = bit flipped)
↓ Split byte streams ↓
Phase 3 — Separate into EXP and Sign+Mantissa streams
EXP stream (bits 14..8)
S+M stream (bit 15 + bits 6..0)
↓ Zstd compress each ↓
Phase 4 — Independent compression
EXP → Zstd
Zstd compresses this stream independently
S+M → Zstd
Zstd compresses this stream independently
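The four phases can be sketched in pure Python. This is a minimal illustration, assuming big-endian BF16 words and substituting zlib for Zstd (the paper's pipeline uses Zstd); `bitx_compress` is our name for the sketch, not the paper's kernel.

```python
import struct
import zlib

def bitx_compress(base: bytes, tuned: bytes) -> tuple[bytes, bytes]:
    """Sketch of the BitX transform on two equal-length BF16 byte buffers.

    Phases: XOR -> split into EXP and S+M byte streams -> compress each.
    zlib stands in for Zstd here for a stdlib-only example.
    """
    assert len(base) == len(tuned) and len(base) % 2 == 0
    exp, sm = bytearray(), bytearray()
    for i in range(0, len(base), 2):
        # Big-endian BF16 word: [S][EEEEEEEE][MMMMMMM]
        b = struct.unpack_from(">H", base, i)[0]
        t = struct.unpack_from(">H", tuned, i)[0]
        x = b ^ t                                   # Phase 1+2: XOR
        exp.append((x >> 7) & 0xFF)                 # Phase 3: 8 exponent bits
        sm.append(((x >> 8) & 0x80) | (x & 0x7F))   # Phase 3: sign + 7 mantissa bits
    # Phase 4: compress each stream independently
    return zlib.compress(bytes(exp)), zlib.compress(bytes(sm))
```

Because fine-tuning flips few bits, both streams are mostly zeros and compress far better than the raw weights would.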
[ ALGORITHMS ]
Core Algorithms
Pseudocode for the ZipLLM deduplication pipeline and the BitX compression kernel.
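The pseudocode itself is not reproduced in this page capture. A hedged outline, reconstructed only from the descriptions above (control flow and names are illustrative, not the paper's exact algorithms):

```
# Sketch of ingestion across the three redundancy levels; illustrative only.
ZipLLM_Ingest(model M, store S):
    for each file F in M:
        if hash(F) in S.file_index:                 # file-level dedup
            record reference to existing copy
        else:
            for each tensor T in F:
                if hash(T) in S.tensor_index:       # tensor-level dedup
                    record reference to existing tensor
                else if a base tensor T_b matching T exists in S:
                    store BitX(T_b, T)              # bit-level compression
                else:
                    store Zstd(T)

BitX(base tensor T_b, fine-tuned tensor T):
    X   = T_b xor T                   # bitwise XOR over BF16 words
    EXP = exponent bytes of X         # bits 14..8 of each word
    SM  = sign+mantissa bytes of X    # bit 15 + bits 6..0 of each word
    return Zstd(EXP), Zstd(SM)
```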