Hybrid Retrieval

How lexical, dense, and sparse signals are fused with Reciprocal Rank Fusion.

What hybrid retrieval is

Hybrid retrieval fans out a single query across three signals in parallel and fuses their ranks:

Signal  | Backend                             | Best for
Lexical | SQLite FTS5 + BM25                  | Exact API names, flags, config keys, error strings, code identifiers
Dense   | Qdrant + FastEmbed bge-base-en-v1.5 | Semantic neighbours ("login" matches "sign in")
Sparse  | Qdrant + SPLADE                     | Splits the difference: term-aware but expansion-friendly

The retrieval dispatcher runs all three in a thread pool, fuses ranks with Reciprocal Rank Fusion, and returns the top-K sections.
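As a rough illustration of the fan-out step, the sketch below runs three backends concurrently in a thread pool and collects one ranked list per signal. The search_* functions, their return values, and fan_out are hypothetical stand-ins, not docmancer's internal API.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the three backends.
# Each returns section IDs ordered best-first.
def search_lexical(query):
    return ["auth-oauth", "auth-tokens", "cli-login"]

def search_dense(query):
    return ["cli-login", "auth-oauth", "sso-setup"]

def search_sparse(query):
    return ["auth-oauth", "sso-setup", "auth-tokens"]

BACKENDS = {
    "lexical": search_lexical,
    "dense": search_dense,
    "sparse": search_sparse,
}

def fan_out(query, backends=BACKENDS):
    """Run every backend concurrently and return {signal: ranked_ids}."""
    with ThreadPoolExecutor(max_workers=len(backends)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in backends.items()}
        # result() blocks until that backend has finished.
        return {name: future.result() for name, future in futures.items()}

rankings = fan_out("How do I authenticate?")
```

The per-signal lists in rankings are what the fusion step would consume.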

Running a hybrid query

docmancer query "How do I authenticate?" --mode hybrid

--mode defaults to lexical for bare configs and auto-flips to hybrid when your YAML includes a non-empty vector_store: block.
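The default-mode rule can be sketched as a small function. This is only an illustration of the behavior described above: resolve_mode and the provider key inside vector_store are hypothetical names, and the real config loader may differ.

```python
def resolve_mode(config, cli_mode=None):
    """Pick the query mode: an explicit --mode wins, otherwise infer it."""
    if cli_mode:
        return cli_mode
    # A missing or empty vector_store block counts as a bare config.
    return "hybrid" if config.get("vector_store") else "lexical"

bare = {}
empty_block = {"vector_store": {}}
with_vectors = {"vector_store": {"provider": "qdrant"}}  # hypothetical key

modes = [resolve_mode(bare), resolve_mode(empty_block), resolve_mode(with_vectors)]
```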

You can also force a single signal:

docmancer query "OAuth scopes" --mode dense
docmancer query "OAuth scopes" --mode sparse
docmancer query "OAuth scopes" --mode lexical

Reciprocal Rank Fusion

RRF combines per-signal rankings without needing comparable scores. The formula per result r across signals s:

RRF(r) = sum( 1 / (k + rank_s(r)) )  for each signal s where r appears

k defaults to 60 (see retrieval.fusion.rrf_k). The default fusion method is plain rrf; switch to weighted_rrf in YAML to bias toward one signal.
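A minimal sketch of the formula, assuming ranks are 1-based (the formula above does not say). The optional weights argument shows one common way to weight RRF, multiplying each term by a per-signal weight; whether weighted_rrf works exactly this way is an assumption.

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60, weights=None):
    """Fuse {signal: [ids best-first]} into [(id, score)] sorted best-first."""
    weights = weights or {}
    scores = defaultdict(float)
    for signal, ranked_ids in rankings.items():
        weight = weights.get(signal, 1.0)  # plain RRF when no weights are given
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += weight / (k + rank)
    # Sort by score descending; break ties by id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

rankings = {
    "lexical": ["a", "b", "c"],
    "dense": ["c", "a"],
    "sparse": ["a", "c"],
}
fused = rrf_fuse(rankings)
order = [doc_id for doc_id, _ in fused]
```

Here a appears at ranks 1, 2, and 1, so it scores 1/61 + 1/62 + 1/61 and lands ahead of c (1/63 + 1/61 + 1/62), while b appears in only one signal and comes last. Only ranks enter the sum, which is why the signals' raw scores never need to be comparable.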

Explain mode

docmancer query "auth setup" --mode hybrid --explain

Each result is annotated with the per-source rank that contributed:

[lexical#1, dense#3, sparse#2] Authentication > OAuth 2.0 > Scopes
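To make the annotation format concrete, the sketch below rebuilds such a label from per-signal rankings. The function name, section IDs, and data are hypothetical, and it assumes a signal is simply omitted when the result does not appear in it.

```python
def explain_label(section_id, rankings, title):
    """Build '[signal#rank, ...] title' from {signal: [ids best-first]}."""
    parts = [
        f"{signal}#{ids.index(section_id) + 1}"  # 1-based rank within that signal
        for signal, ids in rankings.items()
        if section_id in ids
    ]
    return f"[{', '.join(parts)}] {title}"

rankings = {
    "lexical": ["scopes", "tokens", "login"],
    "dense": ["tokens", "login", "scopes"],
    "sparse": ["tokens", "scopes"],
}
label = explain_label("scopes", rankings, "Authentication > OAuth 2.0 > Scopes")
```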

Useful when:

  • A result you expected ranks lower than something else: --explain shows which signal placed each hit.
  • You're tuning a router rule and want to confirm the dispatcher (not the lexical fast-path) is being used.
  • One backend is misbehaving (e.g. Qdrant unavailable) and dense / sparse columns silently return empty.

When to skip vectors

Pure lexical queries bypass the dispatcher and read SQLite directly:

docmancer query "exact error message" --mode lexical

For ingests, skip the vector pass entirely:

docmancer ingest ./docs --no-vectors
DOCMANCER_AUTO_VECTORS=0 docmancer ingest ./docs

You still get a fully functional FTS5 index. Re-run ingest later without --no-vectors to backfill vectors.

Hierarchical retrieval

For large corpora, hybrid retrieval pairs with a two-stage pass:

  1. Stage 1: wide-net retrieval (candidate_pool = 200 by default) across all signals. Scores are aggregated per document_title_hash, and the top-N documents (documents_limit = 5) survive.
  2. Stage 2: re-retrieve sections filtered to those documents, keeping up to sections_per_document (10) per surviving document, then fuse and return.

Hierarchical retrieval is auto-enabled per index once the corpus has at least retrieval.hierarchical.auto_min_documents (default 10) distinct documents. Force it on with retrieval.hierarchical.enabled: true, or turn it off with auto: false.
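The two stages can be sketched roughly as follows. This is a simplification under stated assumptions: per-document scores are aggregated by summing (the aggregation function is not specified above), and stage 2 filters the existing candidate pool instead of issuing a second retrieval. The function and tuple layout are hypothetical.

```python
from collections import defaultdict

def hierarchical_select(candidates, documents_limit=5, sections_per_document=10):
    """candidates: [(doc_hash, section_id, fused_score)] from the wide-net pool."""
    # Stage 1: aggregate section scores per document (sum is an assumption).
    doc_scores = defaultdict(float)
    for doc_hash, _, score in candidates:
        doc_scores[doc_hash] += score
    top_docs = sorted(doc_scores, key=lambda d: (-doc_scores[d], d))[:documents_limit]

    # Stage 2 (simplified): keep the best sections from surviving documents only.
    surviving = set(top_docs)
    taken = defaultdict(int)
    sections = []
    for doc_hash, section_id, score in sorted(candidates, key=lambda c: -c[2]):
        if doc_hash in surviving and taken[doc_hash] < sections_per_document:
            taken[doc_hash] += 1
            sections.append(section_id)
    return top_docs, sections

pool = [("A", "a1", 0.5), ("A", "a2", 0.4), ("B", "b1", 0.6), ("C", "c1", 0.1)]
docs, sections = hierarchical_select(pool, documents_limit=2, sections_per_document=1)
```

With these toy values, document C is dropped in stage 1 and a2 is dropped by the per-document cap in stage 2.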

Router rules

Query routers narrow the dispatcher call before fusion. The first regex match merges declared filters (e.g. docset_root, sdk, international_classes) into the call. See Router recipes for concrete patterns.

Routers only fire under dispatcher modes (dense, sparse, hybrid). Pure lexical queries bypass the dispatcher and ignore routers.
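A minimal sketch of first-match routing, assuming rules are (regex, filters) pairs evaluated in order. The rule patterns, filter values, and the route function are hypothetical; only the filter names docset_root, sdk, and international_classes come from the text above.

```python
import re

# Hypothetical router rules: (pattern, filters to merge into the dispatcher call).
ROUTER_RULES = [
    (re.compile(r"\btrademark\b", re.IGNORECASE),
     {"docset_root": "trademarks", "international_classes": [9]}),
    (re.compile(r"\bpython sdk\b", re.IGNORECASE),
     {"sdk": "python"}),
]

def route(query, mode, base_filters=None, rules=ROUTER_RULES):
    """Return the filters for a query; only the first matching rule applies."""
    filters = dict(base_filters or {})
    if mode == "lexical":
        return filters  # pure lexical queries bypass the dispatcher and routers
    for pattern, declared in rules:
        if pattern.search(query):
            filters.update(declared)
            break  # first match wins
    return filters

hybrid_filters = route("Python SDK auth", "hybrid")
lexical_filters = route("Python SDK auth", "lexical")
```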

Performance notes

  • Dense and sparse signals share the same FastEmbed model load, so a hybrid query is not 3x the cost of a lexical one.
  • A content-hash-keyed embeddings cache under ~/.docmancer/embeddings-cache/ skips re-embedding unchanged chunks on re-ingest.
  • Bulk upsert into Qdrant uses gRPC for throughput.
  • Re-running ingest without --recreate reuses cached embeddings for any section whose content hash matches.
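The content-hash caching idea can be illustrated with a small in-memory sketch. It assumes SHA-256 over the chunk text as the key and uses a dict where the real cache sits on disk under ~/.docmancer/embeddings-cache/; the class and the fake embedder are hypothetical.

```python
import hashlib

class EmbeddingCache:
    """Cache embeddings by content hash so unchanged chunks are not re-embedded."""

    def __init__(self, embed_fn):
        self._embed_fn = embed_fn
        self._store = {}  # stands in for the on-disk cache directory
        self.misses = 0

    def get(self, text):
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._store:
            self.misses += 1  # only new or changed content reaches the model
            self._store[key] = self._embed_fn(text)
        return self._store[key]

def fake_embed(text):
    # Placeholder for a real embedding model call.
    return [float(len(text))]

cache = EmbeddingCache(fake_embed)
first = cache.get("Install the CLI")
second = cache.get("Install the CLI")  # same content hash, served from cache
changed = cache.get("Install the CLI v2")  # edited chunk, embedded again
```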