AI Calculator
Local LLM GPU fit checker
Estimate whether a model fits in your local GPU's memory, given its quantization, context length, and CPU offload.
Best for: Ollama, vLLM, private knowledge bases, and local model deployment
This calculator is a rough fit check, not a benchmark. It estimates model weights, runtime overhead, KV cache, and CPU offload separately.
Estimated total: 9.2 GB
Weights: 8.1 GB
KV cache: 0.4 GB
Usable GPU: 21.6 GB
Fits in GPU memory
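At 4-bit / Q4, this 14B model needs about 9.2 GB: 8.1 GB of weights, 0.4 GB of KV cache, and the remainder (roughly 0.7 GB) presumably runtime overhead. That is well within the 21.6 GB of usable GPU memory.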
For real deployment, leave extra headroom for long prompts, concurrency, and framework differences.
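
The breakdown above (weights, KV cache, runtime overhead, CPU offload) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the calculator's actual formula, which the page does not show. The function name `estimate_fit`, the 4.625 bits per weight, the 0.7 GB fixed overhead, the 90% usable fraction of a 24 GB card, and the layer and head counts are all hypothetical values chosen to land near the figures shown here.

```python
from dataclasses import dataclass

GB = 1e9  # assumption: decimal gigabytes; the page does not say GB vs GiB


@dataclass
class FitEstimate:
    weights_gb: float
    kv_cache_gb: float
    overhead_gb: float
    offloaded_gb: float
    usable_gpu_gb: float

    @property
    def total_gb(self) -> float:
        # GPU-resident total: weights kept on the GPU, plus KV cache and overhead.
        return (self.weights_gb - self.offloaded_gb) + self.kv_cache_gb + self.overhead_gb

    @property
    def fits(self) -> bool:
        return self.total_gb <= self.usable_gpu_gb


def estimate_fit(
    params_billion: float,
    bits_per_weight: float,
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    context_tokens: int,
    gpu_memory_gb: float,
    kv_bytes_per_value: int = 2,        # assumption: FP16 KV cache
    overhead_gb: float = 0.7,           # assumption: fixed runtime overhead
    usable_fraction: float = 0.9,       # assumption: share of VRAM usable by the model
    cpu_offload_fraction: float = 0.0,  # share of weights moved to CPU RAM
) -> FitEstimate:
    # Weights: parameter count times bits per weight, converted to bytes.
    weights_gb = params_billion * 1e9 * bits_per_weight / 8 / GB
    # KV cache: one key and one value vector per layer, per KV head, per token.
    kv_cache_gb = (
        2 * n_layers * n_kv_heads * head_dim * context_tokens * kv_bytes_per_value / GB
    )
    return FitEstimate(
        weights_gb=weights_gb,
        kv_cache_gb=kv_cache_gb,
        overhead_gb=overhead_gb,
        offloaded_gb=weights_gb * cpu_offload_fraction,
        usable_gpu_gb=gpu_memory_gb * usable_fraction,
    )


# Hypothetical 14B model: 48 layers, 8 KV heads of dimension 128, 2048-token context.
example = estimate_fit(
    params_billion=14, bits_per_weight=4.625,
    n_layers=48, n_kv_heads=8, head_dim=128,
    context_tokens=2048, gpu_memory_gb=24,
)
print(f"total {example.total_gb:.1f} GB, usable {example.usable_gpu_gb:.1f} GB, fits: {example.fits}")
```

With these assumed inputs the sketch gives roughly 8.1 GB of weights, 0.4 GB of KV cache, and a 9.2 GB total against 21.6 GB usable. The KV cache term grows linearly with context length, which is why long prompts and concurrent requests call for extra headroom. Raising `cpu_offload_fraction` lowers the GPU-resident total at the cost of slower inference.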