Guozhen AI
Global AI field notes and model intelligence

AI Calculator

Local LLM GPU fit checker

Estimate whether a model fits in your local GPU memory, given its quantization, context length, and CPU offload.

Best for: Ollama, vLLM, private knowledge bases, and local model deployment

This calculator is still a rough fit check, but it now separates model weights, runtime overhead, KV cache, and CPU offload.

Estimated total: 9.2 GB
Weights: 8.1 GB
KV cache: 0.4 GB
Usable GPU: 21.6 GB

Fits in GPU memory

At 4-bit / Q4, this 14B model needs about 9.2 GB.
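
The breakdown above can be sketched as simple arithmetic. This is a minimal illustration, not the calculator's actual formula, which the page does not show. The function name `estimate_fit`, the 24 GB card, and the 0.9 usable fraction are assumptions chosen for illustration. The 0.7 GB overhead is the remainder implied by the figures above (9.2 - 8.1 - 0.4).

```python
def estimate_fit(params_billion, bits_per_weight, kv_cache_gb, overhead_gb,
                 gpu_total_gb, usable_fraction=0.9, cpu_offload_gb=0.0):
    """Rough GPU fit check (hypothetical helper, decimal GB throughout)."""
    # Weights: parameter count x bits per weight, converted to bytes.
    weights_gb = params_billion * bits_per_weight / 8
    # Weights offloaded to CPU RAM no longer occupy GPU memory.
    weights_on_gpu_gb = max(weights_gb - cpu_offload_gb, 0.0)
    # Total GPU demand: resident weights + KV cache + runtime overhead.
    total_gb = weights_on_gpu_gb + kv_cache_gb + overhead_gb
    # Assumption: only a fraction of the card is usable by the model.
    usable_gb = gpu_total_gb * usable_fraction
    return {
        "weights_gb": weights_gb,
        "total_gb": total_gb,
        "usable_gb": usable_gb,
        "fits": total_gb <= usable_gb,
    }


# 14B parameters at a nominal 4.0 bits per weight on an assumed 24 GB card.
result = estimate_fit(14, 4.0, kv_cache_gb=0.4, overhead_gb=0.7, gpu_total_gb=24)
print(result)
```

At a nominal 4.0 bits per weight this sketch gives 7.0 GB of weights, below the 8.1 GB shown above, so the calculator evidently assumes a larger effective size per weight. Treat `bits_per_weight` as a tunable input rather than a fixed constant.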

For real deployment, leave extra headroom for long prompts, concurrency, and framework differences.
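
To see why long prompts and concurrency need headroom, the sketch below uses the standard KV cache size formula for a decoder-only transformer (keys plus values, per layer, per KV head, per token). The model shape used here (48 layers, 8 KV heads, head dimension 128, 16-bit cache) is a hypothetical configuration, not taken from this page, and `kv_cache_gb` is an illustrative helper name.

```python
def kv_cache_gb(n_layers, n_kv_heads, head_dim, context_tokens,
                batch_size=1, bytes_per_value=2):
    """Approximate KV cache size in decimal GB (hypothetical helper)."""
    # Factor of 2: one tensor for keys and one for values.
    total_bytes = (2 * n_layers * n_kv_heads * head_dim
                   * context_tokens * batch_size * bytes_per_value)
    return total_bytes / 1e9


# Hypothetical model shape; the cache grows linearly with context and batch.
for context, batch in [(2048, 1), (32768, 1), (32768, 4)]:
    size = kv_cache_gb(48, 8, 128, context, batch_size=batch)
    print(f"context={context:>6} batch={batch} -> {size:.2f} GB")
```

Because the cache scales linearly with both context length and the number of concurrent sequences, a configuration that fits comfortably at a short context can exceed usable GPU memory once prompts get long or several requests run in parallel.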