Hybrid Retrieval

What hybrid retrieval is

Hybrid retrieval fans out a single query across three signals in parallel and fuses their ranks:

Signal	Backend	Best for
Lexical	SQLite FTS5 + BM25	Exact API names, flags, config keys, error strings, code identifiers
Dense	Qdrant + FastEmbed `bge-base-en-v1.5`	Semantic neighbours ("login" matches "sign in")
Sparse	Qdrant + SPLADE	Splits the difference: term-aware but expansion-friendly

The retrieval dispatcher runs all three in a thread pool, fuses ranks with Reciprocal Rank Fusion, and returns the top-K sections.
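The fan-out step can be pictured with a small sketch. This is not docmancer's dispatcher code: the three `search_*` functions and their return values are hypothetical stand-ins for the real backends, and only the thread-pool pattern is the point.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the three backends. Each returns section IDs,
# best match first. Real backends would query SQLite FTS5 or Qdrant.
def search_lexical(query):
    return ["auth/oauth", "auth/api-keys", "cli/login"]

def search_dense(query):
    return ["auth/sso", "cli/login", "auth/oauth"]

def search_sparse(query):
    return ["cli/login", "auth/oauth", "auth/sso"]

SIGNALS = {
    "lexical": search_lexical,
    "dense": search_dense,
    "sparse": search_sparse,
}

def fan_out(query):
    """Run every signal concurrently and collect one ranked list per signal."""
    with ThreadPoolExecutor(max_workers=len(SIGNALS)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in SIGNALS.items()}
        # result() blocks until that signal has finished.
        return {name: future.result() for name, future in futures.items()}

rankings = fan_out("How do I authenticate?")
```

Because the backends are mostly I/O bound, threads are enough to overlap their latency; the per-signal lists are then ready to be fused by rank.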

Running a hybrid query

docmancer query "How do I authenticate?" --mode hybrid

--mode defaults to lexical when the config has no vector store, and switches to hybrid automatically when your YAML includes a non-empty vector_store: block.
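As a rough illustration of that default, assuming the YAML has already been parsed into a plain dict (the `default_mode` helper and the `provider` key are hypothetical, not part of docmancer):

```python
def default_mode(config: dict) -> str:
    """Pick the default --mode from a parsed config (illustrative only)."""
    # A missing, null, or empty vector_store block counts as "no vector store".
    return "hybrid" if config.get("vector_store") else "lexical"

bare = default_mode({})
empty_block = default_mode({"vector_store": {}})
with_vectors = default_mode({"vector_store": {"provider": "qdrant"}})  # hypothetical key
```

An explicit --mode on the command line would still take precedence over whatever this default resolves to.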

You can also force a single signal:

docmancer query "OAuth scopes" --mode dense
docmancer query "OAuth scopes" --mode sparse
docmancer query "OAuth scopes" --mode lexical

Reciprocal Rank Fusion

RRF combines per-signal rankings using only rank positions, so the signals' raw scores never need to be comparable. For each result r, the fused score sums one term per signal s in which r appears:

RRF(r) = sum( 1 / (k + rank_s(r)) )  for each signal s where r appears

k defaults to 60 (see retrieval.fusion.rrf_k). The default fusion method is plain rrf; switch to weighted_rrf in YAML to bias toward one signal.
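A minimal sketch of the formula, assuming 1-based ranks. The optional `weights` argument shows one common way to weight RRF (multiply each term by a per-signal weight); whether docmancer's weighted_rrf is defined exactly this way is an assumption.

```python
from collections import defaultdict

def rrf(rankings, k=60, weights=None):
    """Fuse ranked lists: score(r) = sum over signals of w_s / (k + rank_s(r))."""
    weights = weights or {}
    scores = defaultdict(float)
    for signal, ranked in rankings.items():
        w = weights.get(signal, 1.0)  # plain RRF when no weight is given
        for rank, item in enumerate(ranked, start=1):
            scores[item] += w / (k + rank)
    # Highest score first; ties are broken by item name for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

rankings = {
    "lexical": ["A", "B", "C"],
    "dense": ["C", "A", "D"],
    "sparse": ["A", "C", "B"],
}

fused = rrf(rankings)                         # order: A, C, B, D
biased = rrf(rankings, weights={"dense": 5})  # dense's top hit C moves to first
```

With k = 60 the gap between rank 1 and rank 2 is small (1/61 versus 1/62), so a result that appears in several signals usually beats one that tops a single signal.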

Explain mode

docmancer query "auth setup" --mode hybrid --explain

Each result is annotated with the per-source rank that contributed:

[lexical#1, dense#3, sparse#2] Authentication > OAuth 2.0 > Scopes

Useful when:

A result you expected ranks lower than something else: --explain shows which signal placed each hit.
You're tuning a router rule and want to confirm the dispatcher (not the lexical fast-path) is being used.
One backend is misbehaving (e.g. Qdrant unavailable) and dense / sparse columns silently return empty.
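When checking many results for a silently empty signal, the annotations can be parsed mechanically. The sketch below assumes every explained line has exactly the `[signal#rank, ...] title` shape of the sample above; `parse_explain_line` is a hypothetical helper, not a docmancer API.

```python
import re

# Assumed line shape: "[lexical#1, dense#3] Some > Section > Title"
LINE_RE = re.compile(r"^\[(?P<ranks>[^\]]*)\]\s*(?P<title>.+)$")
EXPECTED_SIGNALS = {"lexical", "dense", "sparse"}

def parse_explain_line(line):
    """Return ({signal: rank}, title) for one annotated result line."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not an explain line: {line!r}")
    ranks = {}
    for part in match.group("ranks").split(","):
        part = part.strip()
        if part:
            signal, _, rank = part.partition("#")
            ranks[signal] = int(rank)
    return ranks, match.group("title")

ranks, title = parse_explain_line(
    "[lexical#1, dense#3, sparse#2] Authentication > OAuth 2.0 > Scopes"
)
# Signals that contributed nothing to this hit.
missing = EXPECTED_SIGNALS - ranks.keys()
```

If `missing` contains dense or sparse for every result of a hybrid query, that points at the vector backend rather than at the ranking.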

When to skip vectors

Pure lexical queries bypass the dispatcher and read SQLite directly:

docmancer query "exact error message" --mode lexical

For ingests, skip the vector pass entirely:

docmancer ingest ./docs --no-vectors
DOCMANCER_AUTO_VECTORS=0 docmancer ingest ./docs

You still get a fully functional FTS5 index. Re-run ingest later without --no-vectors to backfill vectors.

Hierarchical retrieval

For large corpora, hybrid retrieval pairs with a two-stage pass:

Stage 1. Wide-net retrieval (candidate_pool = 200 by default) across all signals. Scores are aggregated per document_title_hash, and the top-N documents (documents_limit = 5) survive.
Stage 2. Re-retrieve sections filtered to those documents, keeping up to sections_per_document (10) sections per surviving document, then fuse and return.
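The two stages can be sketched over a toy in-memory corpus. Several details here are assumptions made for illustration: the `search` callable and its signature are hypothetical, stage-1 document scores are taken as the sum of fused section scores, and the per-document cap is applied after fusion.

```python
from collections import defaultdict

def rrf_scores(rankings, k=60):
    """Plain RRF over {signal: [section, ...]} with 1-based ranks."""
    scores = defaultdict(float)
    for ranked in rankings.values():
        for rank, section in enumerate(ranked, start=1):
            scores[section] += 1.0 / (k + rank)
    return scores

def hierarchical(search, query, doc_of, candidate_pool=200,
                 documents_limit=5, sections_per_document=10):
    # Stage 1: wide net, then aggregate fused section scores per document.
    wide = search(query, limit=candidate_pool, doc_filter=None)
    doc_scores = defaultdict(float)
    for section, score in rrf_scores(wide).items():
        doc_scores[doc_of[section]] += score
    top_docs = sorted(doc_scores, key=lambda d: (-doc_scores[d], d))[:documents_limit]

    # Stage 2: re-retrieve inside the surviving documents and fuse again.
    narrow = search(query, limit=candidate_pool, doc_filter=set(top_docs))
    fused = rrf_scores(narrow)
    kept, per_doc = [], defaultdict(int)
    for section in sorted(fused, key=lambda s: (-fused[s], s)):
        doc = doc_of[section]
        if per_doc[doc] < sections_per_document:
            kept.append(section)
            per_doc[doc] += 1
    return kept

# Toy corpus: section -> document, plus fixed per-signal rankings.
doc_of = {"a1": "A", "a2": "A", "b1": "B", "c1": "C"}
toy_rankings = {
    "lexical": ["a1", "b1", "c1", "a2"],
    "dense": ["a1", "a2", "c1", "b1"],
}

def toy_search(query, limit, doc_filter):
    """Hypothetical backend: filter the fixed rankings, then truncate."""
    return {
        signal: [s for s in ranked
                 if doc_filter is None or doc_of[s] in doc_filter][:limit]
        for signal, ranked in toy_rankings.items()
    }

result = hierarchical(toy_search, "auth", doc_of, documents_limit=1)
capped = hierarchical(toy_search, "auth", doc_of, documents_limit=1,
                      sections_per_document=1)
```

Document A wins stage 1 because two of its sections rank well, so stage 2 returns only sections from A; the cap then limits how many of them come back.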

Hierarchical retrieval is enabled automatically, per index, once the corpus has at least retrieval.hierarchical.auto_min_documents (default 10) distinct documents. Force it on with retrieval.hierarchical.enabled: true, or off with auto: false.

Router rules

Query routers narrow the dispatcher call before fusion. The first rule whose regex matches the query merges its declared filters (e.g. docset_root, sdk, international_classes) into the call. See Router recipes for concrete patterns.

Routers only fire under dispatcher modes (dense, sparse, hybrid). Pure lexical queries bypass the dispatcher and ignore routers.
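A minimal sketch of first-match routing. The rules, patterns, and filter values below are invented for illustration, and letting a rule's filters override caller-supplied ones on key collisions is an assumption about merge order.

```python
import re

# Hypothetical rules: (pattern, filters). Order matters; first match wins.
ROUTER_RULES = [
    (re.compile(r"\btrademark\b", re.IGNORECASE),
     {"docset_root": "legal", "international_classes": [9, 42]}),
    (re.compile(r"\bpython sdk\b", re.IGNORECASE),
     {"sdk": "python"}),
]

DISPATCHER_MODES = {"dense", "sparse", "hybrid"}

def route(query, mode, filters=None):
    """Return the filters to pass to the dispatcher for this query."""
    merged = dict(filters or {})
    if mode not in DISPATCHER_MODES:
        return merged  # lexical fast-path: routers are ignored
    for pattern, rule_filters in ROUTER_RULES:
        if pattern.search(query):
            merged.update(rule_filters)
            break  # only the first matching rule applies
    return merged

both_match = route("python sdk trademark search", "hybrid")
sdk_only = route("python sdk auth", "dense")
lexical = route("python sdk auth", "lexical")
```

The `both_match` query matches both patterns, but only the first rule's filters are merged, which is why rule order deserves care when patterns can overlap.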

Performance notes

Dense and sparse signals share the same FastEmbed model load, so a hybrid query is not 3x the cost of a lexical one.
A content-hash-keyed embeddings cache under ~/.docmancer/embeddings-cache/ skips re-embedding unchanged chunks on re-ingest.
Bulk upsert into Qdrant uses gRPC for throughput.
Re-running ingest without --recreate reuses cached embeddings for any section whose content hash matches.
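The idea behind a content-hash-keyed cache can be sketched as follows. The hash algorithm (SHA-256), the one-JSON-file-per-hash layout, and the `EmbeddingCache` class are assumptions for illustration and do not describe docmancer's actual on-disk format; the example writes to a temporary directory rather than ~/.docmancer/embeddings-cache/.

```python
import hashlib
import json
import tempfile
from pathlib import Path

class EmbeddingCache:
    """Store one vector per distinct chunk text, keyed by its content hash."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, text):
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        return self.root / f"{digest}.json"

    def get_or_embed(self, text, embed):
        path = self._path(text)
        if path.exists():
            # Unchanged content: reuse the stored vector, skip the model.
            return json.loads(path.read_text())
        vector = embed(text)
        path.write_text(json.dumps(vector))
        return vector

calls = []

def fake_embed(text):
    """Stand-in for a real embedding model; records each invocation."""
    calls.append(text)
    return [float(len(text)), 0.0]

cache = EmbeddingCache(tempfile.mkdtemp())
first = cache.get_or_embed("OAuth scopes", fake_embed)
second = cache.get_or_embed("OAuth scopes", fake_embed)  # served from disk
```

Keying on the content hash rather than on a file path means a renamed or moved section still hits the cache, while any edit to its text produces a new key and a fresh embedding.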