Start from document structure
Good chunking respects headings, paragraphs, code blocks, tables, and semantic sections. A fixed-size character split is easy to implement, but it often cuts through the exact passage an answer depends on, such as a sentence, a table row, or a step in a procedure.
- Split by headings first, then paragraphs or sentences inside long sections.
- Keep tables, code blocks, and numbered procedures intact when possible.
- Store metadata with each chunk, such as title, heading path, URL, date, and document type.
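
The guidelines above can be sketched in a few dozen lines of Python. This is a minimal illustration, not a reference implementation: it assumes the input is Markdown-like text with `#` headings, backtick-fenced code blocks, and blank lines between paragraphs. The names `Chunk`, `split_blocks`, `chunk_markdown`, and `max_chars` are hypothetical and chosen for this example only.

```python
import re
from dataclasses import dataclass, field

HEADING_RE = re.compile(r"^(#{1,6})\s+(.*)$")  # assumed: ATX-style "#" headings
FENCE = "`" * 3  # assumed: backtick code fences


@dataclass
class Chunk:
    text: str
    heading_path: list[str]  # e.g. ["Guide", "Install"]
    metadata: dict = field(default_factory=dict)  # title, URL, date, type, ...


def split_blocks(lines):
    """Group lines into blank-line-separated blocks, keeping fenced code whole."""
    blocks, current, in_fence = [], [], False
    for line in lines:
        if line.strip().startswith(FENCE):
            in_fence = not in_fence
        if not line.strip() and not in_fence:
            # A blank line outside a fence closes the current block.
            if current:
                blocks.append("\n".join(current))
                current = []
        else:
            current.append(line)
    if current:
        blocks.append("\n".join(current))
    return blocks


def chunk_markdown(doc, max_chars=800, **metadata):
    """Split by headings first, then pack whole blocks into chunks."""
    chunks, path, section, in_fence = [], [], [], False

    def flush():
        # Pack the section's blocks greedily up to max_chars. A single block
        # larger than max_chars is emitted on its own rather than cut.
        buf = ""
        for block in split_blocks(section):
            if buf and len(buf) + len(block) + 2 > max_chars:
                chunks.append(Chunk(buf, list(path), dict(metadata)))
                buf = ""
            buf = f"{buf}\n\n{block}" if buf else block
        if buf:
            chunks.append(Chunk(buf, list(path), dict(metadata)))
        section.clear()

    for line in doc.splitlines():
        if line.strip().startswith(FENCE):
            in_fence = not in_fence
        # Lines inside a code fence are never treated as headings.
        match = None if in_fence else HEADING_RE.match(line)
        if match:
            flush()  # close the previous section under its own heading path
            level = len(match.group(1))
            del path[level - 1:]  # drop headings at the same or deeper level
            path.append(match.group(2).strip())
        else:
            section.append(line)
    flush()
    return chunks
```

A call such as `chunk_markdown(text, max_chars=800, title="Guide", url="https://example.com/guide")` (placeholder values) returns chunks that each carry their heading path and a copy of the document-level metadata. Because blocks are separated only at blank lines, a table or numbered procedure written on contiguous lines stays in one piece. Splitting an oversized paragraph into sentences is left out of this sketch.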