RAG chunk size guide: pick chunk size, overlap, and top-k without guessing

A practical guide to choosing RAG chunk size, overlap, retrieval top-k, and evaluation loops for technical docs, policies, support articles, PDFs, and knowledge bases.

Updated 2026-06-11 · 9 min read · Intermediate

Best for

RAG builders tuning knowledge-base retrieval
Teams indexing PDFs, docs, policies, and support content
Readers comparing LangChain, LlamaIndex, and custom pipelines
Local LLM users who need better retrieval with smaller context windows

Not for

Readers looking for one perfect chunk size that works for every corpus
Teams that want a replacement for retrieval evaluation and human review
Buyers selecting a vector database vendor

Comparison

Choose by workflow, not brand

Option	Best for	Strengths	Tradeoffs	Use when
Small chunks	Precise facts, FAQ pages, API references, glossary entries, and short support answers	Improves precision and makes retrieved evidence easier to inspect.	Can lose surrounding context and require a higher top-k.	Questions usually target narrow facts or short procedures.
Medium chunks	Most technical docs, blog posts, tutorials, and product knowledge bases	Balances precision, context, and embedding cost.	Still needs structure-aware splitting to avoid breaking tables or code blocks.	You need a durable default before corpus-specific tuning.
Large chunks	Narrative documents, contracts, long policies, and content where context spans paragraphs	Preserves more surrounding context per retrieved item.	Can reduce precision and fill the context window quickly.	Answers require broader context, caveats, or multi-paragraph interpretation.
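
The tradeoff between chunk size and top-k in the table above comes down to a token budget. The following is a minimal sketch of that arithmetic. The context window, the reserved token count, and the three chunk sizes are hypothetical numbers chosen for illustration; the guide does not prescribe them.

```python
def max_top_k(context_window: int, reserved_tokens: int, chunk_tokens: int) -> int:
    """Return how many chunks of a given size fit in the remaining context budget."""
    if chunk_tokens <= 0:
        raise ValueError("chunk_tokens must be positive")
    available = context_window - reserved_tokens
    return max(available // chunk_tokens, 0)


# Hypothetical setup: an 8,192-token window with 2,048 tokens reserved
# for the system prompt, the question, and the generated answer.
for label, size in [("small", 256), ("medium", 512), ("large", 1024)]:
    print(label, max_top_k(8192, 2048, size))
```

With these assumed numbers, doubling the chunk size halves the number of chunks that fit, which is why large chunks leave less room for diverse sources.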

Start from document structure

Good chunking respects headings, paragraphs, code blocks, tables, and semantic sections. A fixed-character split is easy to implement, but it often cuts through the exact evidence an answer needs.

Split by headings first, then paragraphs or sentences inside long sections.
Keep tables, code blocks, and numbered procedures intact when possible.
Store metadata such as title, heading path, URL, date, and document type.
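
The three rules above can be sketched as a small splitter. This is an illustration under stated assumptions, not a library API: it assumes Markdown-style `#` headings and fenced code blocks, measures size in characters rather than tokens, and uses hypothetical names (`Chunk`, `split_by_headings`, `max_chars`). Tables and numbered procedures are not handled, and a real pipeline would also attach title, URL, date, and document type to each chunk.

```python
import re
from dataclasses import dataclass, field

FENCE = "`" * 3  # Markdown code-fence marker


@dataclass
class Chunk:
    text: str
    heading_path: list[str] = field(default_factory=list)  # stored as metadata


def split_by_headings(markdown: str, max_chars: int = 1200) -> list[Chunk]:
    chunks: list[Chunk] = []
    path: list[str] = []    # current heading path, e.g. ["Guide", "Setup"]
    blocks: list[str] = []  # paragraphs and code blocks under the current heading
    block: list[str] = []   # lines of the block being built
    in_fence = False

    def end_block() -> None:
        if block:
            blocks.append("\n".join(block))
            block.clear()

    def end_section() -> None:
        # Pack whole blocks into chunks; a block is never cut in the middle.
        end_block()
        current = ""
        for b in blocks:
            if current and len(current) + len(b) + 2 > max_chars:
                chunks.append(Chunk(current, list(path)))
                current = b
            else:
                current = f"{current}\n\n{b}" if current else b
        if current:
            chunks.append(Chunk(current, list(path)))
        blocks.clear()

    for line in markdown.splitlines():
        if line.startswith(FENCE):
            in_fence = not in_fence
            block.append(line)
            continue
        heading = None if in_fence else re.match(r"^(#{1,6})\s+(.+)$", line)
        if heading:
            end_section()
            level = len(heading.group(1))
            path[:] = path[: level - 1] + [heading.group(2).strip()]
        elif not line.strip() and not in_fence:
            end_block()  # a blank line outside a fence ends the paragraph
        else:
            block.append(line)
    end_section()
    return chunks


doc = "\n".join([
    "# Guide", "", "Intro text.", "",
    "## Setup", "", "Step one.", "",
    FENCE, "code", "", "more code", FENCE, "",
    "## Usage", "", "Run it.",
])
for chunk in split_by_headings(doc):
    print(" > ".join(chunk.heading_path), "|", len(chunk.text), "chars")
```

Because packing works on whole blocks, a code block longer than `max_chars` becomes one oversized chunk instead of being split, which matches the "keep intact when possible" rule at the cost of uneven chunk sizes.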

Tune chunk size with real questions

A chunking strategy is only good if the retriever finds the right evidence. Build a small set of real user questions, expected evidence snippets, and unacceptable answers. Then change one variable at a time.

Measure whether the correct source appears in top-k results.
Inspect whether retrieved chunks contain enough context to answer safely.
Add a reranker when the correct source appears in the top-k results but the chunks passed to the model are still noisy.
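
The first measurement above, whether the correct source appears in the top-k results, can be sketched as a hit-rate function. This is a minimal example under assumptions: the evaluation set is a list of (question, expected snippet) pairs, a hit is a case-insensitive substring match, and `toy_retrieve` with its three-line corpus is a hypothetical stand-in for a real retriever.

```python
from typing import Callable, Sequence


def hit_rate_at_k(
    eval_set: Sequence[tuple[str, str]],
    retrieve: Callable[[str, int], Sequence[str]],
    k: int,
) -> float:
    """Fraction of questions whose expected snippet appears in the top-k chunks."""
    if not eval_set:
        raise ValueError("eval_set is empty")
    hits = 0
    for question, expected_snippet in eval_set:
        top_chunks = retrieve(question, k)
        if any(expected_snippet.lower() in chunk.lower() for chunk in top_chunks):
            hits += 1
    return hits / len(eval_set)


# Hypothetical corpus and retriever, used only to make the sketch runnable.
CORPUS = [
    "Refunds are issued within 14 days of a return request.",
    "API keys rotate every 90 days and can be revoked manually.",
    "Support is available on weekdays from 9:00 to 17:00.",
]


def toy_retrieve(question: str, k: int) -> list[str]:
    """Rank chunks by the number of lowercase words shared with the question."""
    words = set(question.lower().split())
    ranked = sorted(
        CORPUS,
        key=lambda chunk: len(words & set(chunk.lower().split())),
        reverse=True,
    )
    return ranked[:k]


EVAL_SET = [
    ("How long do refunds take?", "within 14 days"),
    ("When do API keys rotate?", "every 90 days"),
]
print(hit_rate_at_k(EVAL_SET, toy_retrieve, k=1))
```

To change one variable at a time, rerun the same evaluation set after each change to chunk size, overlap, or k, and compare the resulting hit rates.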

Overlap is not a magic fix

Overlap helps when meaning crosses boundaries, but too much overlap increases index size, duplicates evidence, and can crowd out diverse sources. Use it deliberately.

Use modest overlap for narrative text and procedures.
Use less overlap for short FAQ or reference entries.
If you need high overlap only because chunks are too small to hold a complete thought, test medium chunks instead.
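
The index-size cost of overlap is easy to quantify. The following sketch uses a plain sliding window over a token list and reports how many tokens are stored per source token. The function names and the sizes in the usage loop are hypothetical, and a structure-aware splitter would produce different counts.

```python
def window_chunks(tokens: list[str], size: int, overlap: int) -> list[list[str]]:
    """Fixed-size windows where consecutive chunks share `overlap` tokens."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reaches the end of the document
    return chunks


def index_growth(tokens: list[str], size: int, overlap: int) -> float:
    """Stored tokens divided by source tokens; 1.0 means no duplication."""
    if not tokens:
        raise ValueError("tokens is empty")
    stored = sum(len(chunk) for chunk in window_chunks(tokens, size, overlap))
    return stored / len(tokens)


# Hypothetical document of 1,000 tokens split into 200-token chunks.
tokens = [f"t{i}" for i in range(1000)]
for overlap in (0, 20, 100):
    n_chunks = len(window_chunks(tokens, 200, overlap))
    print(overlap, n_chunks, round(index_growth(tokens, 200, overlap), 2))
```

In this toy setup a 10% overlap adds roughly 10% to the stored tokens, while a 50% overlap nearly doubles them, which is the duplication the section warns about.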

Decision Rules

A practical checklist

For technical docs, start with medium chunks and split by headings before falling back to token count.

For support FAQs, use smaller chunks and higher top-k.

For legal or policy documents, test larger chunks plus careful citations.

If answers hallucinate, inspect retrieved evidence before changing the model.
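
The checklist can be written down as a lookup table plus a triage step, which makes the starting point explicit in code review. This is only a sketch: the dictionary keys, field names, and the "default" top-k entries are assumptions made for illustration, and the last branch of `next_step` is an inference from the checklist, not a rule the guide states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StartingPoint:
    chunk_size: str  # "small", "medium", or "large", as in the comparison table
    top_k: str       # relative to your current default (assumed where not stated)
    also: str        # extra step named in the checklist


# Keys are hypothetical labels; values restate the checklist.
STARTING_POINTS = {
    "technical_docs": StartingPoint("medium", "default", "split by headings before token count"),
    "support_faq": StartingPoint("small", "higher", "none"),
    "legal_or_policy": StartingPoint("large", "default", "add careful citations"),
}


def next_step(evidence_in_top_k: bool, answer_correct: bool) -> str:
    """Triage a failing answer: look at retrieval before touching the model."""
    if answer_correct:
        return "no change needed"
    if not evidence_in_top_k:
        return "tune chunking, metadata, top-k, or reranking"
    # Inferred branch: the evidence was retrieved, so retrieval is not the gap.
    return "evidence was retrieved: review the prompt or answer model"


print(STARTING_POINTS["support_faq"])
print(next_step(evidence_in_top_k=False, answer_correct=False))
```

These are starting points to evaluate against real questions, not final settings.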

FAQ

Common questions

What is the best chunk size for RAG?

There is no universal best. Start with structure-aware medium chunks, then evaluate retrieval with real questions and adjust based on precision, context, and answer quality.

How much overlap should RAG chunks use?

Use enough overlap to preserve meaning across boundaries, but avoid excessive duplication. Many systems start with modest overlap and tune after inspecting retrieval results.

Should I change chunk size or the model first?

Inspect retrieval evidence first. If the right evidence is missing, tune chunking, metadata, top-k, or reranking before changing the answer model.

Source Links

Primary references used for this guide

LangChain text splitters: LangChain guidance on splitting large documents into retrievable chunks.

LlamaIndex chunk sizes: LlamaIndex notes on chunk size, overlap, and retrieval tuning.

RAG chunk calculator: The interactive zglg.work calculator for chunk size and overlap.

Build your own evaluation note

The strongest decision is always local to your workflow. Save the vendor links, define a representative task, record the exact prompt or command, and compare the final evidence instead of the marketing claim.
