Guozhen AI: Global AI field notes and model intelligence

RAG retrieval

Hybrid search for RAG: combine BM25, embeddings, and reranking

A production guide to hybrid search for RAG: when to combine keyword BM25 and vector embeddings, how to fuse rankings, when to add rerankers, and how to evaluate retrieval.

Updated 2026-06-11 | 8 min read | Intermediate

Best for

  • RAG teams with poor retrieval on exact terms, IDs, and short queries
  • Developers choosing between BM25, dense embeddings, sparse vectors, and rerankers
  • Enterprise search teams modernizing keyword search with semantic retrieval
  • Product teams trying to improve answer quality without changing models

Not for

  • Fixing bad source documents or missing metadata
  • Assuming more retrieval methods always improve answer quality
  • Skipping query logs and evaluation when tuning weights

Comparison

Choose by workflow, not brand

Pure vector search
  • Best for: semantic similarity, paraphrases, natural-language questions, and fuzzy meaning matches.
  • Strengths: finds conceptually similar content even when the wording differs.
  • Tradeoffs: can miss exact identifiers, rare names, product codes, and keyword-heavy queries.
  • Use when: most questions are semantic and documents are well chunked.

BM25 or keyword search
  • Best for: exact terms, IDs, product names, error codes, titles, and compliance language.
  • Strengths: high precision for exact lexical matches and rare terms.
  • Tradeoffs: misses paraphrases and questions worded differently from the document.
  • Use when: users often search with exact identifiers or short keyword queries.

Hybrid search plus reranking
  • Best for: production RAG where recall and precision both matter.
  • Strengths: combines lexical precision, semantic recall, and reranker ordering.
  • Tradeoffs: adds latency, tuning work, infrastructure complexity, and evaluation burden.
  • Use when: bad retrieval is a visible product quality problem.

Hybrid search fixes common RAG misses

Pure vector search can be impressive but brittle. It may understand broad meaning while missing the exact SKU, regulation number, function name, or log code the user typed.

  • Use lexical search for exact identifiers and rare terms.
  • Use dense embeddings for meaning and paraphrases.
  • Use metadata filters before ranking when permissions or scope are known.
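The following is a minimal sketch of Okapi BM25 scoring in plain Python. It makes the lexical side concrete and assumes a small in-memory corpus. The tokenizer, the `BM25` class, and its `allowed` filter parameter are illustrative assumptions and do not belong to any search engine's API. k1=1.2 and b=0.75 are common defaults. The sketch shows why an exact identifier such as `ERR-4012` is found reliably: a rare term gets a high IDF weight. It also applies the scope filter before ranking.

```python
import math
import re
from collections import Counter
from typing import Optional

# Simple tokenizer (an assumption): lowercase alphanumeric runs, keeping hyphens and
# underscores inside a token so identifiers like "ERR-4012" stay intact.
TOKEN_RE = re.compile(r"[a-z0-9][a-z0-9_\-]*")


def tokenize(text: str) -> list[str]:
    return TOKEN_RE.findall(text.lower())


class BM25:
    """Okapi BM25 over a small in-memory corpus (illustrative, not a production index)."""

    def __init__(self, docs: dict[str, str], k1: float = 1.2, b: float = 0.75):
        self.k1, self.b = k1, b
        self.tfs = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
        self.lengths = {doc_id: sum(tf.values()) for doc_id, tf in self.tfs.items()}
        self.avgdl = (sum(self.lengths.values()) / max(len(docs), 1)) or 1.0
        n = len(docs)
        df = Counter(term for tf in self.tfs.values() for term in tf)
        # Lucene-style IDF: rare terms (IDs, codes) get large weights and it never goes negative.
        self.idf = {t: math.log(1 + (n - d + 0.5) / (d + 0.5)) for t, d in df.items()}

    def score(self, query: str, doc_id: str) -> float:
        tf = self.tfs[doc_id]
        # Length normalization: longer-than-average chunks are penalized when b > 0.
        norm = self.k1 * (1 - self.b + self.b * self.lengths[doc_id] / self.avgdl)
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f:
                total += self.idf[term] * f * (self.k1 + 1) / (f + norm)
        return total

    def search(self, query: str, k: int = 10, allowed: Optional[set[str]] = None):
        # `allowed` is a hypothetical stand-in for a metadata/permission filter,
        # applied before ranking rather than after.
        candidates = [d for d in self.tfs if allowed is None or d in allowed]
        scored = [(d, self.score(query, d)) for d in candidates]
        scored = [(d, s) for d, s in scored if s > 0]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


docs = {
    "a": "Error ERR-4012 occurs when the payment token expires.",
    "b": "Payment failures usually come from expired credentials.",
    "c": "Shipping policy for international orders.",
}
index = BM25(docs)
print(index.search("ERR-4012"))  # only chunk "a" contains the identifier
print(index.search("ERR-4012", allowed={"b", "c"}))  # filtered out before ranking: []
```

In production you would normally use the BM25 implementation built into your search engine rather than hand-rolling one. The point here is the scoring behavior, not the indexing.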

Fusion strategy matters

Hybrid search needs a way to combine the result lists, because BM25 scores and vector similarity scores sit on different scales. Common strategies include score normalization followed by weighted blending, reciprocal rank fusion (RRF), and vendor-specific ranking pipelines.

  • Tune weights on query logs, not intuition.
  • Separate exact-match queries from broad semantic questions where possible.
  • Watch for duplicated chunks and near-duplicates after merging.
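Each of the fusion strategies above fits in a small function. This is a minimal illustration. It assumes each retriever returns either an ordered list of chunk IDs (for RRF) or a dict of raw scores (for weighted blending). `reciprocal_rank_fusion`, `weighted_blend`, and `dedupe` are hypothetical helper names. k=60 is a commonly used RRF constant. The Jaccard threshold for near-duplicates is an arbitrary starting point that you should tune.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """RRF: every list adds 1 / (k + rank) for each chunk it returns; only ranks matter."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, breaking ties by ID so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale one retriever's raw scores to [0, 1] so they can be blended."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 1.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}


def weighted_blend(lexical: dict[str, float], dense: dict[str, float], alpha: float = 0.5) -> list[str]:
    """alpha weights the dense side; a chunk missing from one retriever gets 0 from it."""
    lex = min_max(lexical) if lexical else {}
    den = min_max(dense) if dense else {}
    fused = {d: alpha * den.get(d, 0.0) + (1 - alpha) * lex.get(d, 0.0) for d in set(lex) | set(den)}
    return sorted(fused, key=lambda d: (-fused[d], d))


def dedupe(doc_ids: list[str], texts: dict[str, str], threshold: float = 0.9) -> list[str]:
    """Drop near-duplicate chunks using Jaccard overlap of lowercase word sets (a cheap heuristic)."""
    kept, kept_words = [], []
    for doc_id in doc_ids:
        words = set(texts[doc_id].lower().split())
        if all(len(words & seen) / max(len(words | seen), 1) < threshold for seen in kept_words):
            kept.append(doc_id)
            kept_words.append(words)
    return kept


print(reciprocal_rank_fusion([["a", "c", "b"], ["b", "a", "d"]]))  # ['a', 'b', 'c', 'd']
```

RRF needs only ranks, so it sidesteps the scale mismatch entirely. Weighted blending keeps score magnitudes but needs normalization first, and min-max normalization is sensitive to outliers in each result list.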

Rerank only where it pays

Rerankers can improve precision, but they add cost and latency. Use them on high-value workflows, larger candidate sets, or cases where the final answer depends heavily on source ordering.

  • Track top-k recall before and after hybrid retrieval.
  • Measure answer quality, not only retrieval score.
  • Cache or batch reranking when query volume justifies it.
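The following is a minimal sketch of the measurement and caching ideas above. It assumes you have labeled relevant chunk IDs for each query. `score_pair` is a hypothetical placeholder for whatever reranker you call, such as a cross-encoder or a hosted rerank endpoint. The word-overlap scorer in the usage lines is a toy and not a real reranker. Caching with `functools.lru_cache` helps only when identical (query, chunk) pairs repeat.

```python
from functools import lru_cache
from typing import Callable


def recall_at_k(ranking: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the labeled relevant chunks that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranking[:k]) & relevant) / len(relevant)


def make_reranker(score_pair: Callable[[str, str], float], cache_size: int = 10_000):
    """Wrap a (query, chunk_text) -> relevance scorer with an LRU cache.

    score_pair is a hypothetical placeholder for a cross-encoder or hosted rerank call.
    """
    cached_score = lru_cache(maxsize=cache_size)(score_pair)

    def rerank(query: str, candidates: list[str], texts: dict[str, str], top_n: int = 20) -> list[str]:
        # Rescore only the first top_n candidates to bound cost and latency; the tail keeps
        # its first-stage order. sorted() is stable, so tied scores keep that order too.
        head, tail = candidates[:top_n], candidates[top_n:]
        head = sorted(head, key=lambda d: cached_score(query, texts[d]), reverse=True)
        return head + tail

    return rerank


def overlap_score(query: str, text: str) -> float:
    # Toy stand-in for a real reranker: counts shared lowercase words.
    return float(len(set(query.lower().split()) & set(text.lower().split())))


texts = {
    "a": "shipping times for europe",
    "b": "refund policy for damaged items",
    "c": "how to request a refund",
}
first_stage = ["a", "c", "b"]
relevant = {"b"}
rerank = make_reranker(overlap_score)
reranked = rerank("refund damaged items", first_stage, texts, top_n=3)
print(recall_at_k(first_stage, relevant, k=1), recall_at_k(reranked, relevant, k=1))  # 0.0 1.0
```

Recall@k alone does not tell you whether answers improved. Pair it with an answer-level evaluation, as the bullets above recommend.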

Decision Rules

A practical checklist

01

Use pure vector search for semantic Q&A with low exact-match pressure.

02

Add BM25 or sparse retrieval when users search names, codes, IDs, and titles.

03

Add reranking when retrieved candidates are relevant but poorly ordered.

04

Tune hybrid weights with evals built from real query logs.
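One way to tune weights with evals is a small grid search over a fusion weight, scored with mean reciprocal rank (MRR) on labeled queries from your logs. This sketch assumes a weighted variant of RRF, chosen here for illustration, in which each retriever's contribution is multiplied by a weight. The function names, the grid, and the toy eval rows are hypothetical. The same loop works for any fusion parameter.

```python
from collections import defaultdict
from statistics import mean


def weighted_rrf(lexical: list[str], dense: list[str], w_lex: float, k: int = 60) -> list[str]:
    """Weighted RRF (illustrative variant): lexical gets w_lex, dense gets 1 - w_lex."""
    scores: dict[str, float] = defaultdict(float)
    for weight, ranking in ((w_lex, lexical), (1.0 - w_lex, dense)):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))


def reciprocal_rank(ranking: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant chunk, or 0 if none was retrieved."""
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def tune_lexical_weight(eval_set, grid=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """eval_set rows: (lexical_ranking, dense_ranking, relevant_ids), built from real query logs."""
    best_w, best_mrr = None, -1.0
    for w in grid:
        mrr = mean(reciprocal_rank(weighted_rrf(lex, den, w), rel) for lex, den, rel in eval_set)
        if mrr > best_mrr:  # ties keep the earlier (less lexical-heavy) weight
            best_w, best_mrr = w, mrr
    return best_w, best_mrr


# Toy eval rows: both are exact-identifier lookups where BM25 ranks the answer first.
eval_set = [
    (["sku-1", "x", "y"], ["x", "y", "sku-1"], {"sku-1"}),
    (["err-9", "p", "q"], ["p", "q", "err-9"], {"err-9"}),
]
print(tune_lexical_weight(eval_set))  # (0.75, 1.0)
```

Every toy query here is an identifier lookup, so the search picks a lexical-heavy weight. Semantic queries would pull the weight the other way. A single weight tuned on mixed traffic is therefore a compromise between query types.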


FAQ

Common questions

Is hybrid search better than vector search?

It is often better when exact terms matter. Pure vector search can miss IDs, names, codes, and short keyword queries that lexical search handles well.

Do I need a reranker after hybrid search?

Not always. Add reranking when first-stage retrieval finds relevant candidates but ordering still causes bad answers or missed citations.

How do I tune hybrid search weights?

Use real query logs and labeled retrieval tests. Tune separately for exact-match, semantic, and mixed queries rather than using one guess for all traffic.
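Tuning separately by query type requires a way to bucket incoming queries. A simple starting point is a rule-based router like the following sketch. The regular expression, the three-token cutoff, and the per-bucket weights are all hypothetical placeholders. Replace them with values tuned on your own labeled logs. Once you have enough labeled traffic, a learned classifier is an alternative.

```python
import re

# Hypothetical heuristic for identifier-like tokens: letter+digit codes (SKU-123, ERR-4012),
# dotted version numbers (2.3.1), and snake_case names (parse_config).
ID_PATTERN = re.compile(r"\b(?:[A-Za-z]+[-_]?\d+[\w-]*|\d+(?:\.\d+)+|[a-z]+_[a-z_]+)\b")

# Placeholder lexical weights per bucket; replace with values tuned on each bucket's eval set.
LEXICAL_WEIGHT = {"exact": 0.8, "mixed": 0.5, "semantic": 0.2}


def classify_query(query: str) -> str:
    """Bucket a query as 'exact', 'mixed', or 'semantic' using simple rules."""
    # Assumption: a quoted phrase signals exact-match intent, like an identifier does.
    looks_exact = bool(ID_PATTERN.search(query)) or '"' in query
    if looks_exact and len(query.split()) <= 3:
        return "exact"
    if looks_exact:
        return "mixed"
    return "semantic"


def lexical_weight_for(query: str) -> float:
    return LEXICAL_WEIGHT[classify_query(query)]


for q in ["ERR-4012", "why does ERR-4012 happen after checkout", "how do I reset my password"]:
    print(q, "->", classify_query(q), lexical_weight_for(q))
```

Log the assigned bucket with each query so that per-bucket eval sets and misrouted queries are easy to inspect.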


Build your own evaluation note

The strongest decision is always local to your workflow. Save the vendor links, define a representative task, record the exact prompt or command, and compare the resulting evidence rather than marketing claims.
