[ USENIX NSDI ’26 ]

ZipLLM

Efficient LLM Storage via Model-Aware
Synergistic Data Deduplication and Compression

Zirui Wang · Tingfeng Lan · Zhaoyuan Su · Juncheng Yang · Yue Cheng
Artifacts Available · Artifacts Functional · Results Reproduced

54.1% data reduction ratio · 5.8 GB/s ingestion throughput · 7.7 GB/s retrieval throughput · lossless (zero accuracy loss)
[ LIVE DEMO ]

Compression Live Demo

ZipNN (SOTA)
BitX (Ours)
[ PIPELINE ]

Multi-Level Deduplication & Compression

3,048 LLMs (43.19 TB) share massive redundancy at file, tensor, and bit levels. ZipLLM exploits all three, achieving a 54.1% data reduction ratio.

00 Input · 43.19 TB
01 File Dedup · xxHash per file, skip identical files · 41.80 TB (-3.2%)
02 Tensor Dedup · xxHash per tensor, skip identical tensors · 39.61 TB (-8.3%)
03 BitX · BitX for fine-tune pairs, ZipNN for solo tensors · 19.82 TB (-54.1%)
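The two dedup stages can be viewed as a content-addressed store: hash each file or tensor, keep the first copy, and record only a reference for every later duplicate. A minimal, runnable sketch (the real pipeline uses xxHash, which lives in an external crate; the stdlib `DefaultHasher` stands in here, and `DedupStore` is an illustrative name, not ZipLLM's API):

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::Hasher;

// Hash a blob (a whole file or a single tensor's bytes).
// Stand-in for xxHash so the example needs no external crates.
fn digest(bytes: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    h.write(bytes);
    h.finish()
}

// Content-addressed store: each unique blob is kept once;
// identical files/tensors become references to the stored copy.
struct DedupStore {
    blobs: HashMap<u64, Vec<u8>>,
}

impl DedupStore {
    fn new() -> Self {
        Self { blobs: HashMap::new() }
    }

    // Returns (hash, was_duplicate). Duplicates are skipped, not stored.
    fn ingest(&mut self, bytes: &[u8]) -> (u64, bool) {
        let h = digest(bytes);
        let dup = self.blobs.contains_key(&h);
        if !dup {
            self.blobs.insert(h, bytes.to_vec());
        }
        (h, dup)
    }
}

fn main() {
    let mut store = DedupStore::new();
    let (_, dup1) = store.ingest(b"tensor-A");
    let (_, dup2) = store.ingest(b"tensor-A"); // identical tensor: skipped
    println!("first ingest duplicate? {dup1}, second? {dup2}");
}
```

The same mechanism applies at both granularities: stage 01 hashes whole files, stage 02 hashes individual tensors inside the remaining files.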
[ BF16 EXPLORER ]

Interactive Bit Explorer

Adjust the delta to see how BF16 floating-point bits change.
Small weight updates flip very few bits — the foundation of BitX.

[Interactive widget: a delta slider (-1.0 to +1.0) shows the original, modified, and XOR bit patterns, with live counts of total bits flipped and of flipped sign, exponent, and mantissa bits.]
BF16 layout: [S] 1-bit sign · [EEEEEEEE] 8-bit exponent · [MMMMMMM] 7-bit mantissa
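The explorer's computation is simple: a BF16 value is the high 16 bits of the corresponding f32 encoding, so XOR-ing two encodings and masking out the sign, exponent, and mantissa fields gives the flipped-bit counts. A sketch (field masks follow the layout above; function and struct names are illustrative):

```rust
// BF16 is the top 16 bits of the IEEE 754 f32 encoding.
fn to_bf16_bits(x: f32) -> u16 {
    (x.to_bits() >> 16) as u16
}

// Per-field counts of bits that differ between two BF16 weights.
struct FlipCounts {
    sign: u32,     // bit 15
    exponent: u32, // bits 14..=7
    mantissa: u32, // bits 6..=0
}

fn count_flips(a: f32, b: f32) -> FlipCounts {
    let x = to_bf16_bits(a) ^ to_bf16_bits(b);
    FlipCounts {
        sign: ((x >> 15) & 1) as u32,
        exponent: ((x >> 7) & 0xFF).count_ones(),
        mantissa: (x & 0x7F).count_ones(),
    }
}

fn main() {
    // 1.0 vs 1.5 differ in a single mantissa bit;
    // 1.0 vs -1.0 differ only in the sign bit.
    let f = count_flips(1.0, 1.5);
    println!("sign {} exp {} man {}", f.sign, f.exponent, f.mantissa);
}
```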
[ BITX ]

Bit-Level XOR Transform

Fine-tuned models differ from their base by only a few bits per weight.
BitX exploits this by XOR-ing corresponding weights, splitting the XOR output into byte streams, then compressing each stream.

Phase 1 — Two weight tensors (16-bit BF16 each)
Base model
Fine-tuned
[ ALGORITHMS ]

Core Algorithms

Pseudocode for the ZipLLM deduplication pipeline and the BitX compression kernel.

zipllm_pipeline.rs

    
bitx.rs
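As a rough illustration of the BitX transform (a sketch under stated assumptions, not the paper's implementation; the exact stream split may differ): XOR corresponding BF16 weights, then split the XOR output into high-byte and low-byte planes. Because few bits flip, both planes are mostly zeros and compress well with a general-purpose codec; the final compression step is omitted here to keep the example dependency-free.

```rust
// XOR fine-tuned BF16 weights against the base model, then split the
// 16-bit XOR values into two byte planes. Illustrative sketch only.
fn bitx_transform(base: &[u16], finetuned: &[u16]) -> (Vec<u8>, Vec<u8>) {
    assert_eq!(base.len(), finetuned.len());
    let mut hi = Vec::with_capacity(base.len());
    let mut lo = Vec::with_capacity(base.len());
    for (&b, &f) in base.iter().zip(finetuned) {
        let x = b ^ f;
        hi.push((x >> 8) as u8);  // sign + top 7 exponent bits
        lo.push((x & 0xFF) as u8); // low exponent bit + 7 mantissa bits
    }
    (hi, lo)
}

// Lossless inverse: XOR the planes back against the base weights.
fn bitx_restore(base: &[u16], hi: &[u8], lo: &[u8]) -> Vec<u16> {
    base.iter()
        .zip(hi.iter().zip(lo))
        .map(|(&b, (&h, &l))| b ^ (((h as u16) << 8) | l as u16))
        .collect()
}

fn main() {
    let base = [0x3F80u16, 0x3FC0, 0xBF00]; // BF16 bit patterns
    let tuned = [0x3F81u16, 0x3FC0, 0xBF01]; // only low bits changed
    let (hi, lo) = bitx_transform(&base, &tuned);
    assert_eq!(bitx_restore(&base, &hi, &lo), tuned); // round-trips losslessly
    println!("hi plane {:?}, lo plane {:?}", hi, lo);
}
```

Restoration needs only the base tensor and the two (compressed) planes, which is why BitX applies to fine-tune pairs while solo tensors fall back to ZipNN.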