LLM-GPU Sizing Calculator

Model configuration

Model preset

Layers

KV heads

Head dim

Params (B)

Weight precision

KV cache precision

Concurrent users

Context window (tokens)

128K

Avg context utilization

60%

Framework overhead (GB)

Memory breakdown

Avg VRAM required

—

at avg utilization

Worst-case VRAM

—

all users at full context

Model weights—

KV cache (avg utilization)—

KV cache (worst case)—

Serving framework overhead—

Total (avg)—

Click a GPU row below to see headroom

KV cache math — step by step

Adjust inputs above to see the calculation.

GPU recommendation matrix

Click any GPU row to show headroom in the breakdown panel above. Fit rating is based on average VRAM required.

GPU	VRAM	Mem BW	MIG	NVLink	1×	2×	4×	6×	8×	16×