LLM-GPU Sizing Calculator
Estimate VRAM requirements for LLM inference workloads
v1.0
Model configuration
Model preset
Layers
KV heads
Head dim
Params (B)

Weight precision
KV cache precision

Concurrent users: 5
Context window (tokens): 128K
Avg context utilization: 60%
Framework overhead (GB)
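The model-weight line of the breakdown follows directly from the parameter count and weight precision. A minimal sketch; the 70B/FP16 figures below are illustrative, not taken from any preset on this page:

```python
def weights_gib(params_billions: float, bytes_per_param: float) -> float:
    """Raw weight footprint: parameter count times storage width, in GiB."""
    return params_billions * 1e9 * bytes_per_param / 2**30

# Illustrative: a 70B-parameter model stored in FP16 (2 bytes/param)
print(round(weights_gib(70, 2), 1))  # ~130.4 GiB
```

Quantizing the weights scales this line directly: the same model at INT8 (1 byte/param) needs half the VRAM, at 4-bit roughly a quarter.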
Memory breakdown
Avg VRAM required (at avg utilization)
Worst-case VRAM (all users at full context)
Model weights
KV cache (avg utilization)
KV cache (worst case)
Serving framework overhead
Total (avg)
Click a GPU row below to see headroom
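The two totals are simple sums over the rows above: weights plus KV cache plus framework overhead, where the average total uses the utilization-scaled KV cache and the worst case uses full context for every user. A sketch with hypothetical input values:

```python
def totals_gib(weights: float, kv_avg: float, kv_worst: float,
               overhead: float) -> tuple[float, float]:
    """Return (average total, worst-case total) in GiB."""
    return weights + kv_avg + overhead, weights + kv_worst + overhead

# Hypothetical breakdown: 130.4 GiB weights, 120/200 GiB KV cache, 6 GiB overhead
avg_total, worst_total = totals_gib(130.4, kv_avg=120.0, kv_worst=200.0, overhead=6.0)
```

Sizing to the average total maximizes utilization but risks eviction or OOM under load spikes; sizing to the worst case guarantees headroom at the cost of idle VRAM.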
KV cache math — step by step
Adjust inputs above to see the calculation.
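The step-by-step KV cache calculation can be sketched as follows. The model shape used in the example (80 layers, 8 GQA KV heads, head dim 128, roughly Llama-3-70B-like) is an assumption for illustration, not a preset from this page:

```python
def kv_cache_gib(layers: int, kv_heads: int, head_dim: int, tokens: int,
                 users: int, bytes_per_elem: int, utilization: float = 1.0) -> float:
    # Step 1: bytes cached per token = K and V (factor 2) x layers x KV heads x head dim
    per_token_bytes = 2 * layers * kv_heads * head_dim * bytes_per_elem
    # Step 2: scale by context length, average utilization, and concurrent users
    total_bytes = per_token_bytes * tokens * utilization * users
    return total_bytes / 2**30

shape = dict(layers=80, kv_heads=8, head_dim=128, tokens=128 * 1024,
             users=5, bytes_per_elem=2)          # FP16 KV cache, 128K context
print(kv_cache_gib(**shape, utilization=0.6))    # avg: ~120 GiB
print(kv_cache_gib(**shape, utilization=1.0))    # worst case: ~200 GiB
```

Note that grouped-query attention is what keeps this tractable: with full multi-head attention (64 KV heads instead of 8 in this example) the same workload would need 8x the KV cache.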
GPU recommendation matrix
Click any GPU row to show headroom in the breakdown panel above. Fit rating is based on average VRAM required.
GPU | VRAM | Mem BW | MIG | NVLink | 16×
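The headroom shown when a GPU row is clicked reduces to aggregate VRAM minus the average requirement, and the fit rating is a threshold on that headroom. A sketch; the GPU sizes and totals below are hypothetical examples, not rows from the matrix:

```python
def rate_fit(gpu_vram_gib: float, gpu_count: int, avg_gib: float) -> tuple[str, float]:
    """Fit rating against the average VRAM requirement, plus remaining headroom in GiB."""
    capacity = gpu_vram_gib * gpu_count
    headroom = capacity - avg_gib
    return ("fits" if headroom >= 0 else "insufficient"), headroom

# Hypothetical: 4x 80 GiB GPUs against a 256.4 GiB average requirement
rating, headroom = rate_fit(80, 4, 256.4)  # fits, ~63.6 GiB headroom
```

Positive headroom against the average total does not guarantee the worst case fits, which is why the breakdown panel reports both numbers.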