Search Features

Semantic search, filesystem exploration, and the commands that bridge them.

How Semantic Search Works

Pathfinder uses a three-stage pipeline to answer conceptual questions about your docs:

  1. Indexing — Your docs are chunked (split into segments based on headings, functions, or token counts) and each chunk is converted into a high-dimensional vector using OpenAI's embedding model.
  2. Storage — Vectors are stored in PostgreSQL using the pgvector extension, which provides efficient approximate nearest-neighbor search.
  3. Retrieval — When an agent searches, the query is embedded using the same model, then compared against stored vectors using cosine similarity. The most similar chunks are returned, ranked by score (0-1).
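The retrieval step can be illustrated with a small Python sketch. It uses made-up three-dimensional vectors instead of real OpenAI embeddings, and a brute-force scan instead of pgvector. It therefore shows only the scoring and ranking logic, not Pathfinder's implementation. The names `cosine_similarity` and `retrieve` are illustrative. In pgvector itself, the `<=>` operator returns cosine distance, which is 1 minus cosine similarity.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """dot(a, b) / (|a| * |b|); returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def retrieve(query_vec: list[float], chunks: dict[str, list[float]], limit: int = 3) -> list[tuple[str, float]]:
    """Score every stored chunk against the query and return the best `limit`.

    This is an exhaustive scan; a pgvector index would avoid comparing every row.
    """
    scored = [(chunk_id, cosine_similarity(query_vec, vec)) for chunk_id, vec in chunks.items()]
    scored.sort(key=lambda item: item[1], reverse=True)  # highest similarity first
    return scored[:limit]


# Toy 3-dimensional "embeddings"; the configured model produces 1536 dimensions.
chunks = {
    "/docs/guides/auth.mdx": [0.9, 0.1, 0.0],
    "/docs/guides/streaming.mdx": [0.0, 0.2, 0.9],
    "/docs/reference/auth-config.mdx": [0.8, 0.3, 0.1],
}
query = [1.0, 0.2, 0.0]  # pretend embedding of "how do I handle authentication?"

for path, score in retrieve(query, chunks):
    print(f"{path} ({score:.2f})")
```

With these toy vectors the two auth pages rank well ahead of the streaming page, because their vectors point in nearly the same direction as the query vector.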

This means agents can ask "how do I handle authentication?" and find the relevant docs even if they never contain the word "authentication" — because the meaning is captured in the vector representation.

qmd

When grep_strategy is set to vector or hybrid in your bash tool config, Pathfinder exposes a qmd command inside the bash sandbox. This is a semantic search command that agents can use alongside regular shell tools.

```bash
$ qmd "how do I handle errors"
/docs/guides/error-handling.mdx (0.94)
/docs/reference/hooks.mdx (0.87)
/docs/guides/streaming.mdx (0.81)
```

qmd returns file paths ranked by semantic similarity, with scores in parentheses. Higher scores mean stronger matches.
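If a harness or script captures qmd's standard output, the ranked list is easy to turn into structured data. The sketch below assumes one result per line in the `<path> (<score>)` form shown in the example. `parse_qmd_output` is a hypothetical helper, not part of Pathfinder.

```python
import re

# Assumption: each result line looks like "<path> (<score>)".
RESULT_LINE = re.compile(r"^(?P<path>.+) \((?P<score>\d+(?:\.\d+)?)\)$")


def parse_qmd_output(text: str) -> list[tuple[str, float]]:
    """Convert qmd output into (path, score) pairs, preserving qmd's ranking."""
    results = []
    for line in text.splitlines():
        match = RESULT_LINE.match(line.strip())
        if match:  # skip blank lines and anything that is not a result
            results.append((match.group("path"), float(match.group("score"))))
    return results


output = """/docs/guides/error-handling.mdx (0.94)
/docs/reference/hooks.mdx (0.87)
/docs/guides/streaming.mdx (0.81)"""

# Keep only strong matches.
for path, score in parse_qmd_output(output):
    if score >= 0.85:
        print(path)
```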

qmd vs grep

Use grep when you know the exact text you're looking for — function names, config keys, error messages. Use qmd when you need to find docs by concept or meaning.

```bash
# Find exact string matches
$ grep -rl "useAction" /docs
/docs/hooks/useAction.mdx
/docs/guides/actions.mdx

# Find conceptually related docs
$ qmd "how to trigger server-side actions from the frontend"
/docs/hooks/useAction.mdx (0.92)
/docs/guides/actions.mdx (0.88)
/docs/guides/server-actions.mdx (0.79)
```

Regular grep passes through to real bash unchanged — it is never intercepted or replaced.

Related Files

The related command finds semantically similar files across all indexed sources. Give it a file path, and it returns the closest matches by content similarity.

```bash
$ related /docs/guides/auth.mdx
/docs/reference/auth-config.mdx (0.91)
/docs/guides/middleware.mdx (0.84)
/docs/guides/sessions.mdx (0.78)
/docs/reference/jwt.mdx (0.72)
```

This is useful for discovering related documentation that agents might not find through directory browsing alone — especially across different source repositories.
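One plausible way to compute related files is sketched below. Each file's chunk embeddings are averaged into a single vector, and every other file is ranked by cosine similarity to the given one. The averaging step is an assumption made for illustration, because this page does not say how Pathfinder aggregates chunks. The `related` function here is a stand-in, not the real command.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def mean_vector(vectors: list[list[float]]) -> list[float]:
    """Average a file's chunk embeddings into one file-level vector (assumed approach)."""
    dims = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]


def related(path: str, file_chunks: dict[str, list[list[float]]], limit: int = 4) -> list[tuple[str, float]]:
    """Rank every other file by similarity to `path`; the file itself is excluded."""
    file_vecs = {p: mean_vector(vs) for p, vs in file_chunks.items()}
    target = file_vecs[path]
    scored = [
        (other, cosine_similarity(target, vec))
        for other, vec in file_vecs.items()
        if other != path
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:limit]


# Toy chunk vectors grouped by file.
file_chunks = {
    "/docs/guides/auth.mdx": [[0.9, 0.1, 0.0], [0.7, 0.3, 0.0]],
    "/docs/reference/auth-config.mdx": [[0.8, 0.2, 0.1]],
    "/docs/guides/streaming.mdx": [[0.0, 0.1, 0.9]],
}
for other, score in related("/docs/guides/auth.mdx", file_chunks):
    print(f"{other} ({score:.2f})")
```

Excluding the query file matters, because a file is always maximally similar to itself and would otherwise top every result list.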

Grep-Miss Suggestions

When an agent runs grep and gets no results, Pathfinder automatically appends a hint suggesting qmd as an alternative. This nudges agents toward semantic search when exact-match grep fails, without forcing a workflow change.

```bash
$ grep -rl "authentication flow" /docs
# (no results)
# Hint: No matches found. Try `qmd "authentication flow"` for semantic search.
```

The hint only appears when grep returns zero results and a search tool is configured for the same source. It does not appear when grep finds matches.
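The hint rule can be expressed as a small wrapper. `grep_with_hint` and its `has_search_tool` flag are hypothetical. The flag stands in for "a search tool is configured for the same source", and the hint text is copied from the example.

```python
def grep_with_hint(pattern: str, grep_output: str, has_search_tool: bool) -> str:
    """Return grep's output, appending a qmd hint only when grep found nothing.

    Hypothetical sketch: how Pathfinder detects "a search tool is configured
    for the same source" is reduced here to a boolean flag.
    """
    if grep_output.strip() or not has_search_tool:
        # Matches were found, or no search tool exists: pass output through unchanged.
        return grep_output
    # Zero results and a search tool is available: suggest semantic search.
    # Note: the pattern is inserted naively, without escaping embedded quotes.
    return grep_output + f'# Hint: No matches found. Try `qmd "{pattern}"` for semantic search.\n'


print(grep_with_hint("authentication flow", "", has_search_tool=True), end="")
```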

Search Tool Parameters

When agents call the search tool directly (not via qmd), these parameters are available:

| Parameter | Type | Default | Description |
|---|---|---|---|
| query | string | - | The search query (required) |
| limit | number | Config default (default_limit) | Max results to return (capped at max_limit) |
| min_score | number | Config default (min_score) | Minimum cosine similarity (0-1). Results below this threshold are filtered out. |
| version | string | - | Filter results to a specific version tag (matches the version field on sources) |

Note: The output format (docs, code, or raw) is configured at the tool level via result_format in your pathfinder.yaml, not as a per-query parameter.
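The following sketch shows how the three filtering parameters might interact, assuming hits arrive already sorted by score. The class and function names are hypothetical. Only the parameter names and the config keys `default_limit`, `max_limit`, and `min_score` come from this page. This page does not say whether Pathfinder applies the version filter before or after ranking, so the ordering here is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SearchToolConfig:  # hypothetical mirror of the tool's config keys
    default_limit: int = 5
    max_limit: int = 20
    min_score: float = 0.3


@dataclass
class Hit:
    path: str
    score: float
    version: Optional[str] = None  # the version of the source this hit came from


def apply_search_params(hits, config, limit=None, min_score=None, version=None):
    """Filter score-sorted hits by min_score and version, then cap the count."""
    # A per-query limit falls back to default_limit and can never exceed max_limit.
    effective_limit = min(limit if limit is not None else config.default_limit, config.max_limit)
    threshold = min_score if min_score is not None else config.min_score
    kept = [
        h for h in hits
        if h.score >= threshold and (version is None or h.version == version)
    ]
    return kept[:effective_limit]


config = SearchToolConfig()
hits = [Hit("/docs/a.mdx", 0.91, "v2"), Hit("/docs/b.mdx", 0.55, "v1"), Hit("/docs/c.mdx", 0.12, "v2")]
print([h.path for h in apply_search_params(hits, config, limit=50)])  # ['/docs/a.mdx', '/docs/b.mdx']
```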

Configuring Search

Search requires three pieces of configuration in your pathfinder.yaml: a source, a search tool, and embedding settings. See the full config reference for all options.

```yaml
sources:
  - name: docs
    type: markdown
    repo: https://github.com/org/repo.git
    path: docs/
    file_patterns: ["**/*.mdx", "**/*.md"]
    chunk:
      target_tokens: 600
      overlap_tokens: 50

tools:
  - name: search-docs
    type: search
    description: Search the documentation
    source: docs
    default_limit: 5
    max_limit: 20
    result_format: docs
    min_score: 0.3  # Filter out low-quality matches

embedding:
  provider: openai
  model: text-embedding-3-small
  dimensions: 1536

indexing:
  auto_reindex: true
  reindex_hour_utc: 3
  stale_threshold_hours: 24
```
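The chunk settings target_tokens and overlap_tokens describe token-window splitting with overlap. The sketch below illustrates that idea as a plain sliding window. It splits on whitespace instead of using a real tokenizer, and it ignores the heading- and function-aware splitting mentioned earlier. It therefore approximates Pathfinder's chunker rather than reproducing it. `chunk_tokens` is a hypothetical name.

```python
def chunk_tokens(tokens: list, target_tokens: int = 600, overlap_tokens: int = 50) -> list[list]:
    """Split tokens into windows of `target_tokens`, each sharing
    `overlap_tokens` tokens with the previous window."""
    if overlap_tokens >= target_tokens:
        raise ValueError("overlap_tokens must be smaller than target_tokens")
    step = target_tokens - overlap_tokens  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + target_tokens])
        if start + target_tokens >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


# Whitespace-style "tokens" stand in for real model tokens here.
words = [f"w{i}" for i in range(1400)]
chunks = chunk_tokens(words)
print([len(c) for c in chunks])  # [600, 600, 300]
```

The overlap means a sentence that straddles a chunk boundary still appears intact in at least one chunk, at the cost of embedding a few tokens twice.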

The embedding and indexing blocks are required whenever any search tool is configured. The first server boot after adding search triggers an initial indexing pass. After that, the index is refreshed on the schedule set in the indexing block, or when a webhook triggers a reindex.
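This page does not spell out what reindex_hour_utc and stale_threshold_hours mean. The sketch below makes two assumptions: the scheduled reindex runs once a day at that UTC hour, and an index counts as stale once it is older than the threshold. Under those assumptions, the checks might look like this. Both functions are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def next_reindex(now: datetime, reindex_hour_utc: int = 3) -> datetime:
    """Next daily run at reindex_hour_utc:00 UTC, strictly after `now`.

    Expects a timezone-aware datetime (assumed semantics, not Pathfinder's code).
    """
    now = now.astimezone(timezone.utc)
    candidate = now.replace(hour=reindex_hour_utc, minute=0, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)  # today's slot has passed; use tomorrow's
    return candidate


def is_stale(last_indexed: datetime, now: datetime, stale_threshold_hours: int = 24) -> bool:
    """True if the last indexing pass is older than the threshold (assumed semantics)."""
    return now - last_indexed > timedelta(hours=stale_threshold_hours)


now = datetime(2024, 5, 1, 10, 30, tzinfo=timezone.utc)
print(next_reindex(now))  # 2024-05-02 03:00:00+00:00
print(is_stale(datetime(2024, 4, 30, 9, 0, tzinfo=timezone.utc), now))  # True (25.5 hours old)
```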