
Adding Documentation

Fetch and index docs from URLs, GitHub repos, and local files.


Two ingest commands

| Command | Use it for |
| --- | --- |
| docmancer add <url> | URLs and GitHub repos |
| docmancer ingest <path> | Local directories or files |

Both commands write to the same hybrid index: SQLite FTS5 for lexical search, plus Qdrant for dense and sparse vectors.

Add from a URL

docmancer add https://docs.example.com

Docmancer auto-detects the docs platform and chooses the best fetching strategy:

| Platform | Detection | Strategy |
| --- | --- | --- |
| GitBook | llms-full.txt endpoint | Full-text download |
| Mintlify | llms.txt or sitemap.xml | Sitemap crawl |
| GitHub | Repository URL | README + docs/ directory extraction |
| Generic website | sitemap.xml or nav crawl | Page-by-page fetch |
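To make the detection order concrete, here is a hypothetical Python sketch of the table above. It is not docmancer's detector: `detect_provider` and `available_paths` are illustrative names, and `available_paths` stands in for whatever paths a site actually serves (for example, found by probing). It simplifies one case: a bare sitemap.xml can belong to either a Mintlify or a generic site, so a real detector would need extra signals to tell them apart.

```python
from urllib.parse import urlparse


def detect_provider(url: str, available_paths: set[str]) -> tuple[str, str]:
    """Return a (provider, strategy) pair for a docs URL.

    Hypothetical sketch of the platform table; provider names follow the
    --provider values (github, gitbook, mintlify, web).
    """
    host = urlparse(url).netloc.lower()
    # GitHub is recognized from the URL itself, no probing needed.
    if host in ("github.com", "www.github.com"):
        return "github", "README + docs/ directory extraction"
    # A full-text dump is the cheapest source, so check it first.
    if "/llms-full.txt" in available_paths:
        return "gitbook", "full-text download"
    if "/llms.txt" in available_paths:
        return "mintlify", "sitemap crawl"
    # Simplification: a bare sitemap.xml (or nothing at all) falls back to
    # the generic page-by-page fetcher.
    return "web", "page-by-page fetch"


if __name__ == "__main__":
    print(detect_provider("https://docs.example.com", {"/llms.txt", "/sitemap.xml"}))
```

Checking for the most complete source first means the sketch only falls back to crawling when no single-file export is available.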

Force a specific provider:

docmancer add https://docs.example.com --provider mintlify

Add from a GitHub repo

docmancer add https://github.com/owner/repo

Docmancer extracts the README and the contents of any docs/ directory.
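One way to picture the README-plus-docs/ rule is a path filter over a repository's file list. The helper below is a hypothetical illustration, not docmancer's GitHub extractor. It assumes you already have the repo's paths (for example, from a local clone), and the real extractor may apply different rules for file types, nested READMEs, or branches.

```python
from pathlib import PurePosixPath


def select_repo_docs(paths: list[str]) -> list[str]:
    """Keep the root README plus everything under a top-level docs/ directory.

    Hypothetical helper for illustration only.
    """
    selected = []
    for path in paths:
        parts = PurePosixPath(path).parts
        # Only a README at the repository root counts (README.md, README.rst, ...).
        is_root_readme = len(parts) == 1 and parts[0].lower().startswith("readme")
        # Any file nested under docs/, at any depth.
        in_docs_dir = len(parts) > 1 and parts[0] == "docs"
        if is_root_readme or in_docs_dir:
            selected.append(path)
    return selected


if __name__ == "__main__":
    print(select_repo_docs(["README.md", "src/main.py", "docs/intro.md"]))
```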

Ingest local files

docmancer ingest ./my-internal-docs

Supported file formats:

| Format | Extra needed |
| --- | --- |
| Markdown (.md, .mdx) | none |
| Plain text (.txt) | none |
| PDF (.pdf) | docmancer[local] |
| DOCX (.docx) | docmancer[local] |
| RTF (.rtf) | docmancer[local] |
| HTML (.html, .htm) | docmancer[local] |

Install the parsers when you need them:

pip install 'docmancer[local]'
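If a folder mixes formats, a quick pre-flight check can tell you whether you need the extra before you run docmancer ingest. This hypothetical helper encodes only the format table above. It is not part of docmancer, and docmancer itself may treat unsupported files differently.

```python
from pathlib import Path

# Extensions from the supported-formats table; "local" marks formats that
# need the docmancer[local] extra, None means no extra is required.
FORMAT_EXTRAS = {
    ".md": None, ".mdx": None, ".txt": None,
    ".pdf": "local", ".docx": "local", ".rtf": "local",
    ".html": "local", ".htm": "local",
}


def required_extras(paths):
    """Return (extras_needed, unsupported_files) for an iterable of paths.

    Hypothetical helper; unsupported extensions are reported instead of
    being silently skipped.
    """
    extras, unsupported = set(), []
    for p in paths:
        ext = Path(p).suffix.lower()  # case-insensitive: .PDF counts as .pdf
        if ext not in FORMAT_EXTRAS:
            unsupported.append(p)
        elif FORMAT_EXTRAS[ext]:
            extras.add(FORMAT_EXTRAS[ext])
    return extras, unsupported
```

For a real folder, pass it `p for p in Path("./my-internal-docs").rglob("*") if p.is_file()`, and install docmancer[local] if "local" comes back in the first result.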

Options

docmancer add flags:

| Flag | Default | Description |
| --- | --- | --- |
| --provider | auto | Provider to use: auto, gitbook, mintlify, web, github, crawl4ai |
| --strategy | auto | Force a specific discovery strategy |
| --max-pages | 500 | Maximum number of pages to fetch |
| --browser | off | Use Playwright for JS-heavy sites (needs docmancer[browser]) |
| --fetch-workers | auto | Number of concurrent page fetch workers |
| --recreate | off | Drop and rebuild the index for this source |
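Together, --max-pages and --fetch-workers bound how many pages a crawl fetches and how many requests run at once. The following is a hypothetical Python sketch of that combination using the standard library's ThreadPoolExecutor. The function name, the `fetch` callable, and the default of 8 workers are assumptions (docmancer's own default is "auto"), and its real fetcher may order or cap pages differently.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_pages(urls, fetch, max_pages=500, workers=8):
    """Fetch at most `max_pages` distinct URLs using `workers` threads.

    Hypothetical sketch; `fetch` is any callable mapping a URL to content.
    """
    # Deduplicate while preserving discovery order, then apply the page cap.
    unique = list(dict.fromkeys(urls))[:max_pages]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order even though fetches run concurrently.
        return dict(zip(unique, pool.map(fetch, unique)))
```

Applying the cap before scheduling work keeps a large sitemap from queueing thousands of requests that would be thrown away.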

docmancer ingest flags:

| Flag | Default | Description |
| --- | --- | --- |
| --no-vectors | off | Skip Qdrant; index lexical-only (FTS5) |
| --recreate | off | Drop and rebuild the index for this source |
| --workers | auto | Concurrent parsing workers |
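With --no-vectors, only the SQLite FTS5 side of the index is built. The following minimal sketch uses Python's built-in sqlite3 module to show what a lexical-only index does. The table and column names are hypothetical, not docmancer's actual schema. The sketch also assumes your Python's SQLite was compiled with FTS5, which most builds are.

```python
import sqlite3


def build_lexical_index(sections):
    """Build an in-memory FTS5 index over (source, body) pairs.

    Hypothetical schema for illustration; docmancer's real tables may differ.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE sections USING fts5(source, body)")
    conn.executemany("INSERT INTO sections (source, body) VALUES (?, ?)", sections)
    return conn


def search(conn, query, limit=5):
    # bm25() gives better matches lower (more negative) scores,
    # so ascending order puts the best hits first.
    rows = conn.execute(
        "SELECT source FROM sections WHERE sections MATCH ? "
        "ORDER BY bm25(sections) LIMIT ?",
        (query, limit),
    )
    return [source for (source,) in rows]


if __name__ == "__main__":
    conn = build_lexical_index([
        ("install.md", "Install the parsers with pip install docmancer[local]"),
        ("update.md", "Re-fetch and re-index when upstream docs change"),
    ])
    print(search(conn, "parsers"))  # ['install.md']
```

Keyword matching like this needs no embedding model, which is why a lexical-only index is the lighter option; the trade-off is that it cannot match paraphrases the way dense vectors can.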

Update existing sources

Re-fetch and re-index when upstream docs change:

docmancer update
docmancer update https://docs.example.com

Updates reuse the content-hash-keyed embeddings cache, so unchanged sections skip re-embedding.
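A content-hash-keyed cache works because the key depends only on a section's text, so an unchanged section hashes to the same key on every update. The following minimal Python sketch assumes SHA-256 as the hash, a plain dict as the cache, and a batch `embed` callable. All three are illustrative assumptions, since this page does not describe docmancer's actual hash function or cache storage.

```python
import hashlib


def section_key(text: str) -> str:
    # Identical text always yields the same key, regardless of source or position.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def embed_with_cache(sections, cache, embed):
    """Return one embedding per section, calling `embed` only for cache misses.

    Hypothetical sketch: `cache` maps hash keys to vectors, and `embed`
    takes a list of texts and returns a list of vectors in the same order.
    """
    # Deduplicate (order-preserving) and keep only texts not yet cached.
    missing = [s for s in dict.fromkeys(sections) if section_key(s) not in cache]
    if missing:
        for text, vector in zip(missing, embed(missing)):
            cache[section_key(text)] = vector
    return [cache[section_key(s)] for s in sections]
```

On a second run over mostly unchanged docs, only new or edited sections reach `embed`, which is where the time savings on update come from.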