AI Calculator

Local open LLM compatibility checker

Enter RAM, VRAM, quantization, and use case to see which common open-weight LLMs can run locally.

Best for: Choosing between Llama, Qwen, DeepSeek, Gemma, Phi, Mistral, Yi, GLM, and code models

Calculator Input Checklist

Gather traffic, token, retry, privacy, and pricing assumptions before trusting the estimate.

Small prototype numbers often undercount production cost. Use the checklist before comparing plans, setting a monthly budget, or choosing an AI software vendor.

Real traffic pattern

Use expected users, requests per user, peak hours, batch jobs, background tasks, and seasonal growth instead of a single demo call.

Prompt and output mix

Estimate input tokens, output tokens, context windows, attachments, retrieved chunks, and system prompts separately.

Retries, fallbacks, and evaluations

Include failed calls, retries, safety checks, quality evaluations, cache misses, and fallback models before setting a budget.

Privacy and retention constraints

Check whether the workflow can send prompts, files, logs, embeddings, or customer data to the model provider.

Fresh vendor pricing

Treat the calculator as a planning layer, then verify live pricing, quotas, terms, and region availability on vendor pages.
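
The checklist items above can be combined into a rough monthly token budget. The sketch below is a minimal illustration, not the calculator's own method: the function names, the 30-day month, and the way retries are applied as a flat multiplier are all assumptions, and the prices must be supplied from live vendor pages.

```python
def monthly_tokens(users, requests_per_user_per_day, input_tokens, output_tokens,
                   retry_rate=0.0, days=30):
    """Estimate monthly call and token volume.

    retry_rate is the assumed fraction of calls that are repeated
    (retries, fallbacks, evaluation re-runs), applied as a flat multiplier.
    """
    calls = users * requests_per_user_per_day * days * (1 + retry_rate)
    return {
        "calls": calls,
        "input_tokens": calls * input_tokens,    # prompts, system prompt, retrieved chunks
        "output_tokens": calls * output_tokens,  # generated text
    }


def monthly_cost(tokens, price_in_per_million, price_out_per_million):
    """Cost from token totals; both prices are caller-supplied placeholders."""
    return (tokens["input_tokens"] / 1e6 * price_in_per_million
            + tokens["output_tokens"] / 1e6 * price_out_per_million)


# Hypothetical workload: 100 users, 10 requests each per day, 10% retries.
tokens = monthly_tokens(100, 10, input_tokens=1000, output_tokens=200, retry_rate=0.1)
print(tokens)
```

Keeping input and output tokens separate matters because providers commonly price them differently, so a single blended token count can hide most of the cost.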

Model names and parameter sizes come from official model cards or project docs. Memory fit is estimated from parameter count, quantization, context, and runtime overhead; it is not a vendor guarantee.
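
A memory estimate of this kind can be sketched in a few lines. The code below is an illustration under stated assumptions, not the checker's actual formula. The Q4 weight factor of 0.58 GB per billion parameters is consistent with the W column in the table on this page (for example 8B gives 4.6 GB), but the Q8 factor, the KV-cache formula, and the overhead rule are assumptions, so the totals will not match the page's numbers exactly.

```python
# GB of weight memory per billion parameters, by quantization.
GB_PER_BILLION_PARAMS = {
    "q4": 0.58,   # consistent with the table: 8B -> 4.6 GB, 70B -> 40.6 GB
    "q8": 1.06,   # assumption: roughly 8.5 bits per weight
    "fp16": 2.0,  # 16 bits per weight
}


def weights_gb(params_billion, quant="q4"):
    """Memory for the model weights alone, in GB."""
    return params_billion * GB_PER_BILLION_PARAMS[quant]


def kv_cache_gb(n_layers, n_kv_heads, head_dim, context_tokens, bytes_per_value=2):
    """Textbook KV-cache size: one K and one V vector per layer per token.

    The architecture numbers must come from the model's config; the page
    may use a different, simpler approximation.
    """
    values = 2 * n_layers * n_kv_heads * head_dim * context_tokens
    return values * bytes_per_value / 1e9


def total_gb(params_billion, kv_gb, quant="q4",
             overhead_fraction=0.10, min_overhead_gb=0.4):
    """Weights + KV cache + a hypothetical runtime overhead allowance."""
    w = weights_gb(params_billion, quant)
    overhead = max(min_overhead_gb, overhead_fraction * w)
    return w + kv_gb + overhead


print(round(weights_gb(8), 1))   # 4.6
print(round(weights_gb(70), 1))  # 40.6
```

For an MoE model, pass the total parameter count (for example 30 for a 30B-A3B model), since all experts must be resident in memory.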

Computer type

GPU VRAM (GB)

System / unified memory (GB)

Quantization

Target context (K tokens; 8 means about 8,000 tokens)

Use case

Allow CPU/RAM offload when GPU memory is not enough

Catalog models

common open-weight variants

Runnable

Best current family

Qwen3

Quantization

4-bit / Q4

Recommended shortlist

Qwen3 30B-A3B MoE

19.3 GB

GPU fit

Qwen3 4B

3.0 GB

GPU fit

Gemma 3 4B

3.0 GB

GPU fit

Phi-4-mini 3.8B

2.9 GB

GPU fit

Llama 3.2 3B Instruct

2.4 GB

GPU fit

Model and note	Params	Context	Estimated memory (W = weights, KV = KV cache)	Status	Best for	Source / license
Qwen3 30B-A3B MoE: MoE model; memory to load follows total parameters, while speed is closer to the active-parameter count.	30B / A3B	256K	19.3 GB (W 17.4 / KV 0.3)	GPU fit	Chat, Reasoning, Chinese	Qwen3 GitHub; Apache 2.0 / model card terms
Qwen3 4B: Entry-level option for Chinese and reasoning on low VRAM.	4B	32K	3.0 GB (W 2.3 / KV 0.3)	GPU fit	Chat, Reasoning, Chinese	Qwen3 GitHub; Apache 2.0 / model card terms
Gemma 3 4B: Gemma 3 models of 4B and above support multimodal input; text-only use needs fewer resources.	4B	128K	3.0 GB (W 2.3 / KV 0.3)	GPU fit	Chat, Vision, Edge	Google Gemma 3 model card; Gemma Terms of Use
Phi-4-mini 3.8B: Small reasoning model, suited to low-latency and learning scenarios.	3.8B	128K	2.9 GB (W 2.2 / KV 0.3)	GPU fit	Reasoning, Edge	Microsoft Phi-4 models; MIT
Llama 3.2 3B Instruct: Low-barrier local chat model, suited to lightweight assistants.	3B	128K	2.4 GB (W 1.7 / KV 0.3)	GPU fit	Chat, Edge	Meta Llama 3.2; Llama 3.2 Community License
Qwen2.5 3B Instruct: More stable than the 1B tier on low VRAM.	3B	32K	2.4 GB (W 1.7 / KV 0.3)	GPU fit	Chat, Chinese	Qwen2.5 LLM; Apache 2.0
Qwen2.5-Coder 3B: 3B code model; check the license before use.	3B	32K	2.4 GB (W 1.7 / KV 0.3)	GPU fit	Coding	Qwen2.5-Coder; Qwen-Research
StarCoder2 3B: Small code generation model.	3B	16K	2.4 GB (W 1.7 / KV 0.3)	GPU fit	Coding, Edge	StarCoder2 paper; OpenRAIL-M
InternLM2.5 1.8B Chat: Lightweight open Chinese model.	1.8B	32K	1.7 GB (W 1.0 / KV 0.3)	GPU fit	Chat, Chinese, Edge	InternLM GitHub; Apache 2.0 / model card terms
Qwen3 1.7B: Suited to lightweight Chinese Q&A and simple tasks.	1.7B	32K	1.7 GB (W 1.0 / KV 0.3)	GPU fit	Chat, Chinese, Edge	Qwen3 GitHub; Apache 2.0 / model card terms
Qwen2.5 1.5B Instruct: Suited to Chinese assistants and simple summarization on low-spec hardware.	1.5B	32K	1.6 GB (W 0.9 / KV 0.3)	GPU fit	Chat, Chinese, Edge	Qwen2.5 LLM; Apache 2.0
Qwen2.5-Coder 1.5B: Lightweight code generation and explanation.	1.5B	32K	1.6 GB (W 0.9 / KV 0.3)	GPU fit	Coding, Edge	Qwen2.5-Coder; Apache 2.0 / Qwen-Research for 3B
DeepSeek-R1-Distill-Qwen 1.5B: Small distilled reasoning model; a way to try chain-of-thought style output on low-spec hardware.	1.5B	128K	1.6 GB (W 0.9 / KV 0.3)	GPU fit	Reasoning, Chinese, Edge	DeepSeek-R1 GitHub; MIT / base model terms
Llama 3.2 1B Instruct: Lightweight text model, suited to low-memory devices and quick Q&A.	1B	128K	1.3 GB (W 0.6 / KV 0.3)	GPU fit	Chat, Edge	Meta Llama 3.2; Llama 3.2 Community License
Gemma 3 1B: Small open-weight model from Google, suited to lightweight tasks.	1B	32K	1.3 GB (W 0.6 / KV 0.3)	GPU fit	Chat, Edge	Google Gemma 3 model card; Gemma Terms of Use
Qwen3 0.6B: Very small Chinese-friendly model, worth trying on low-spec devices.	0.6B	32K	1.0 GB (W 0.3 / KV 0.3)	GPU fit	Chat, Chinese, Edge	Qwen3 GitHub; Apache 2.0 / model card terms
Qwen2.5 0.5B Instruct: Lightweight Chinese model with a very low hardware barrier.	0.5B	32K	1.0 GB (W 0.3 / KV 0.3)	GPU fit	Chat, Chinese, Edge	Qwen2.5 LLM; Apache 2.0
Qwen2.5-Coder 0.5B: Very small code model, suited to low-cost completion experiments.	0.5B	32K	1.0 GB (W 0.3 / KV 0.3)	GPU fit	Coding, Edge	Qwen2.5-Coder; Apache 2.0 / Qwen-Research for 3B
Llama 3.1 8B Instruct: Classic general-purpose 8B model with a rich ecosystem and many quantized builds.	8B	128K	5.3 GB (W 4.6 / KV 0.3)	GPU fit	Chat, RAG	Meta Llama 3.1; Llama 3.1 Community License
Qwen3 8B: Common choice for a local Chinese model on a single GPU.	8B	32K	5.3 GB (W 4.6 / KV 0.3)	GPU fit	Chat, Reasoning, Chinese	Qwen3 GitHub; Apache 2.0 / model card terms
DeepSeek-R1-Distill-Llama 8B: Llama-based 8B distilled reasoning model.	8B	128K	5.3 GB (W 4.6 / KV 0.3)	GPU fit	Reasoning	DeepSeek-R1 GitHub; MIT / base model terms
Qwen2.5 7B Instruct: Commonly used 7B tier for local Chinese deployments.	7B	128K	4.8 GB (W 4.1 / KV 0.3)	GPU fit	Chat, Chinese, RAG	Qwen2.5 LLM; Apache 2.0
Qwen2.5-Coder 7B: Common local code model; practical with a low hardware barrier.	7B	128K	4.8 GB (W 4.1 / KV 0.3)	GPU fit	Coding, Chinese	Qwen2.5-Coder; Apache 2.0
DeepSeek-R1-Distill-Qwen 7B: Common entry-level model for local reasoning.	7B	128K	4.8 GB (W 4.1 / KV 0.3)	GPU fit	Reasoning, Chinese	DeepSeek-R1 GitHub; MIT / base model terms
Mistral 7B Instruct: Classic open 7B model with a mature ecosystem.	7B	32K	4.8 GB (W 4.1 / KV 0.3)	GPU fit	Chat, Coding	Mistral 7B docs; Apache 2.0
Code Llama 7B Instruct: Classic code model, suited to compatibility with older toolchains.	7B	16K	4.8 GB (W 4.1 / KV 0.3)	GPU fit	Coding	Code Llama paper; Llama 2 Community License
StarCoder2 7B: Common choice for code completion and generation.	7B	16K	4.8 GB (W 4.1 / KV 0.3)	GPU fit	Coding	StarCoder2 paper; OpenRAIL-M
InternLM2.5 7B Chat: Common Chinese model in the 7B tier.	7B	32K	4.8 GB (W 4.1 / KV 0.3)	GPU fit	Chat, Chinese	InternLM GitHub; Apache 2.0 / model card terms
Yi-1.5 6B Chat: Small Yi-1.5 model, common among Chinese-language users.	6B	32K	4.2 GB (W 3.5 / KV 0.3)	GPU fit	Chat, Chinese	Yi-1.5 GitHub; Yi License / model card terms
Qwen3 14B: Mid-tier model with a good balance of quality and local cost.	14B	32K	9.2 GB (W 8.1 / KV 0.4)	GPU fit	Chat, Reasoning, Chinese	Qwen3 GitHub; Apache 2.0 / model card terms
Qwen2.5 14B Instruct: Balances Chinese quality and cost; a first pick to try with 16 GB of VRAM or more.	14B	128K	9.2 GB (W 8.1 / KV 0.4)	GPU fit	Chat, Chinese, RAG	Qwen2.5 LLM; Apache 2.0
Qwen2.5-Coder 14B: Mid-tier choice for code generation, explanation, and bug fixing.	14B	128K	9.2 GB (W 8.1 / KV 0.4)	GPU fit	Coding, Chinese	Qwen2.5-Coder; Apache 2.0
DeepSeek-R1-Distill-Qwen 14B: More consistent reasoning quality than the smaller models; worth trying with 16 GB of VRAM or more.	14B	128K	9.2 GB (W 8.1 / KV 0.4)	GPU fit	Reasoning, Chinese	DeepSeek-R1 GitHub; MIT / base model terms
Phi-4 14B: A common math and reasoning pick from the small-model family, at 14B.	14B	16K	9.2 GB (W 8.1 / KV 0.4)	GPU fit	Reasoning, Coding	Microsoft Phi-4 model card; MIT
Code Llama 13B Instruct: 13B code model with many quantized builds available.	13B	16K	8.6 GB (W 7.5 / KV 0.3)	GPU fit	Coding	Code Llama paper; Llama 2 Community License
Gemma 3 12B: Mid-tier Gemma 3, worth trying for mixed vision and text tasks.	12B	128K	7.9 GB (W 7.0 / KV 0.3)	GPU fit	Chat, Vision	Google Gemma 3 model card; Gemma Terms of Use
Mistral NeMo 12B: 12B multilingual open model, friendly to long context.	12B	128K	7.9 GB (W 7.0 / KV 0.3)	GPU fit	Chat, RAG	Mistral NeMo; Apache 2.0
GLM-4 9B Chat: Common 9B model in the Chinese ecosystem, with long-context variants.	9B	128K	6.0 GB (W 5.2 / KV 0.3)	GPU fit	Chat, Chinese, RAG	GLM Transformers docs; GLM license / model card terms
Yi-1.5 9B Chat: General-purpose 9B model for Chinese and English.	9B	32K	6.0 GB (W 5.2 / KV 0.3)	GPU fit	Chat, Chinese	Yi-1.5 GitHub; Yi License / model card terms
Qwen3 32B: Common high-quality choice for a single machine or workstation; VRAM requirements rise noticeably.	32B	32K	21.1 GB (W 18.6 / KV 0.8)	GPU fit	Chat, Reasoning, Chinese, Coding	Qwen3 GitHub; Apache 2.0 / model card terms
Qwen2.5 32B Instruct: Strong general capability in the 32B tier, suited to workstations.	32B	128K	21.1 GB (W 18.6 / KV 0.8)	GPU fit	Chat, Chinese, Coding, RAG	Qwen2.5 LLM; Apache 2.0
Qwen2.5-Coder 32B: Common high-quality local code model; needs substantial VRAM.	32B	128K	21.1 GB (W 18.6 / KV 0.8)	GPU fit	Coding, Chinese	Qwen2.5-Coder; Apache 2.0
DeepSeek-R1-Distill-Qwen 32B: Common high-quality tier for local reasoning; needs workstation-class VRAM.	32B	128K	21.1 GB (W 18.6 / KV 0.8)	GPU fit	Reasoning, Chinese, Coding	DeepSeek-R1 GitHub; MIT / base model terms
Gemma 3 27B: High-quality Gemma 3 tier; whether it fits in 24 GB of VRAM depends on quantization and context.	27B	128K	17.8 GB (W 15.7 / KV 0.7)	GPU fit	Chat, Vision, Reasoning	Google Gemma 3 model card; Gemma Terms of Use
Mistral Small 3.1 24B: 24B open model, worth trying on a mid-to-high-end single machine.	24B	128K	15.8 GB (W 13.9 / KV 0.6)	GPU fit	Chat, Vision, RAG	Mistral Small 3.1; Apache 2.0
Devstral Small 24B: 24B model aimed at codebase exploration and software engineering agents.	24B	128K	15.8 GB (W 13.9 / KV 0.6)	GPU fit	Coding	Devstral Small docs; Apache 2.0
InternLM2.5 20B Chat: 20B Chinese model, suited to mid-to-high-end local machines.	20B	32K	13.2 GB (W 11.6 / KV 0.5)	GPU fit	Chat, Chinese	InternLM GitHub; Apache 2.0 / model card terms
StarCoder2 15B: Largest publicly released StarCoder2 size, suited to code tasks.	15B	16K	9.9 GB (W 8.7 / KV 0.4)	GPU fit	Coding	StarCoder2 paper; OpenRAIL-M
Mixtral 8x7B: MoE model; loading is estimated from total parameters, while inference speed depends on active parameters.	47B / A13B	32K	30.2 GB (W 27.3 / KV 0.3)	Offload needed	Chat, Coding	Mixtral 8x7B docs; Apache 2.0
Qwen2.5 72B Instruct: High-quality large model; a typical single GPU usually needs aggressive quantization or offload.	72B	128K	47.5 GB (W 41.8 / KV 1.8)	Offload needed	Chat, Chinese, RAG	Qwen2.5 LLM; Qwen license / model card terms
Llama 3.1 70B Instruct: High-quality general-purpose model; usually needs large VRAM, multiple GPUs, or CPU offload.	70B	128K	46.2 GB (W 40.6 / KV 1.8)	Offload needed	Chat, RAG	Meta Llama 3.1; Llama 3.1 Community License
Llama 3.3 70B Instruct: General-purpose 70B model, suited to high-quality chat and reasoning.	70B	128K	46.2 GB (W 40.6 / KV 1.8)	Offload needed	Chat, Reasoning	Meta Llama 3.3 model card; Llama 3.3 Community License
DeepSeek-R1-Distill-Llama 70B: Large distilled reasoning model; usually needs large VRAM or multiple GPUs.	70B	128K	46.2 GB (W 40.6 / KV 1.8)	Offload needed	Reasoning	DeepSeek-R1 GitHub; MIT / base model terms
Code Llama 70B Instruct: 70B code model; not recommended for typical personal computers.	70B	16K	46.2 GB (W 40.6 / KV 1.8)	Offload needed	Coding	Code Llama paper; Llama 2 Community License
Code Llama 34B Instruct: Larger code model, suited to high-VRAM setups.	34B	16K	22.4 GB (W 19.7 / KV 0.9)	Offload needed	Coding	Code Llama paper; Llama 2 Community License
Yi-1.5 34B Chat: Large Yi-1.5 model; needs relatively high VRAM.	34B	32K	22.4 GB (W 19.7 / KV 0.9)	Offload needed	Chat, Chinese	Yi-1.5 GitHub; Yi License / model card terms
DeepSeek-R1 671B MoE: The full R1 must load all 671B parameters; usually unsuitable for personal computers.	671B / A37B	128K	427.0 GB (W 389.2 / KV 0.9)	Not recommended	Reasoning, Chinese, Coding	DeepSeek-R1 GitHub; MIT / model card terms
Llama 3.1 405B Instruct: Flagship open-weight model; not suitable for loading locally on typical personal computers.	405B	128K	267.3 GB (W 234.9 / KV 10.1)	Not recommended	Chat, Reasoning	Meta Llama 3.1; Llama 3.1 Community License
Qwen3 235B-A22B MoE: Flagship MoE; usually in server or multi-GPU workstation territory.	235B / A22B	256K	149.8 GB (W 136.3 / KV 0.6)	Not recommended	Chat, Reasoning, Chinese	Qwen3 GitHub; Apache 2.0 / model card terms
Mixtral 8x22B: Large MoE; usually needs server-class RAM and VRAM.	141B / A39B	64K	90.5 GB (W 81.8 / KV 1.0)	Not recommended	Chat, Coding	Mistral Mixtral 8x22B; Apache 2.0

How to read this result

MoE models must load all of their parameters into memory, even though only a subset is active for each token. A longer context enlarges the KV cache. CPU offload can let a model load that would not otherwise fit on the GPU, but generation may be much slower.
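
The three status labels in the table can be read as a simple threshold check. The sketch below shows one way such a rule might look. It is an assumption for illustration, not the checker's actual logic: the `fit_status` name, the 90% headroom factor, and the example machine with 24 GB of VRAM and 64 GB of system RAM are all hypothetical.

```python
def fit_status(total_gb, vram_gb, ram_gb, allow_offload=True, headroom=0.9):
    """Classify an estimated memory footprint against available memory.

    headroom reserves part of each pool for the OS, display, and runtime;
    0.9 is an assumed value, not one taken from the page.
    """
    gpu_budget = vram_gb * headroom
    if total_gb <= gpu_budget:
        return "GPU fit"
    # With offload, layers that do not fit on the GPU spill to system RAM.
    if allow_offload and total_gb <= gpu_budget + ram_gb * headroom:
        return "Offload needed"
    return "Not recommended"


# Hypothetical machine: 24 GB VRAM, 64 GB system RAM.
for name, total in [("Qwen3 32B", 21.1), ("Yi-1.5 34B Chat", 22.4), ("Mixtral 8x22B", 90.5)]:
    print(name, fit_status(total, vram_gb=24, ram_gb=64))
```

For MoE models, `total_gb` should be the estimate based on total parameters, since every expert has to be resident even if only a few are used per token.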

Calculator FAQ

Use calculator results as buyer research, not a final quote

How should I use this calculator before choosing an AI tool?

Use it to create a first estimate, then compare actual vendor pricing, model benchmarks, privacy requirements, integration effort, and workflow tests before committing budget.

Is the calculator result an exact quote?

No. It is a planning estimate. Production cost and fit can change with prompts, context length, retries, batch jobs, traffic, data quality, and provider pricing changes.

What should I read after using the Local open LLM compatibility checker?

Open AI Software Buyer Guides, AI Model Benchmarks, OpenAI vs Anthropic API, RAG Chunk Size Guide, or LLM Gateway Comparison depending on the decision you need to make.

When should a team re-run this calculator?

Re-run it after model changes, pricing changes, prompt changes, traffic growth, data-volume changes, new security requirements, or a shift from prototype to production use.