Search Features
Semantic search, filesystem exploration, and the commands that bridge them.
How Semantic Search Works
Pathfinder uses a three-stage pipeline to answer conceptual questions about your docs:
- Indexing — Your docs are chunked (split into segments based on headings, functions, or token counts) and each chunk is converted into a high-dimensional vector using OpenAI's embedding model.
- Storage — Vectors are stored in PostgreSQL using the pgvector extension, which provides efficient approximate nearest-neighbor search.
- Retrieval — When an agent searches, the query is embedded using the same model, then compared against stored vectors using cosine similarity. The most similar chunks are returned, ranked by score (0-1).
This means agents can ask "how do I handle authentication?" and find the relevant docs even if they never contain the word "authentication" — because the meaning is captured in the vector representation.
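The retrieval stage can be sketched in a few lines. This is a minimal illustration, not Pathfinder's implementation: the three-dimensional vectors, the chunk names, and the `rank_chunks` helper are all made up for the example, and a real deployment would get its vectors from the embedding model and let pgvector do the comparison.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as no match
    return dot / (norm_a * norm_b)

def rank_chunks(query_vec, chunks, top_k=3):
    """Return (score, chunk_name) pairs, most similar first."""
    scored = [(cosine_similarity(query_vec, vec), name)
              for name, vec in chunks.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]

# Toy stand-ins for stored chunk embeddings (hypothetical names and values).
TOY_CHUNKS = {
    "docs/login.md#tokens": [0.9, 0.1, 0.0],
    "docs/deploy.md#docker": [0.0, 0.2, 0.9],
    "docs/sessions.md#cookies": [0.7, 0.6, 0.1],
}
# Toy stand-in for the embedded query.
TOY_QUERY = [1.0, 0.2, 0.0]

for score, name in rank_chunks(TOY_QUERY, TOY_CHUNKS):
    print(f"{score:.3f}  {name}")
```

The query never has to share a word with a chunk. Ranking depends only on how closely the two vectors point in the same direction.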
qmd
When grep_strategy is set to vector or hybrid in your bash tool config, Pathfinder exposes a qmd command inside the bash sandbox. This is a semantic search command that agents can use alongside regular shell tools.
qmd returns file paths ranked by semantic similarity, with scores in parentheses. Higher scores mean stronger matches.
qmd vs grep
Use grep when you know the exact text you're looking for — function names, config keys, error messages. Use qmd when you need to find docs by concept or meaning.
Regular grep passes through to real bash unchanged — it is never intercepted or replaced.
Related Files
The related command finds semantically similar files across all indexed sources. Give it a file path, and it returns the closest matches by content similarity.
This is useful for discovering related documentation that agents might not find through directory browsing alone — especially across different source repositories.
Grep-Miss Suggestions
When an agent runs grep and gets no results, Pathfinder automatically appends a hint suggesting qmd as an alternative. This nudges agents toward semantic search when exact-match grep fails, without forcing a workflow change.
The hint appears only when both conditions hold: grep returns zero results, and a search tool is configured for the same source. When grep finds matches, no hint is added.
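The rule above amounts to a two-condition check. The sketch below is illustrative only: the function name, its parameters, and the hint wording are assumptions made for the example, not Pathfinder's actual code or message text.

```python
# Hypothetical hint text; the real wording is not specified in this document.
HINT = "No matches found. Try qmd for a semantic search."

def with_grep_miss_hint(grep_output: str, match_count: int,
                        search_tool_configured: bool) -> str:
    """Append a hint only on a grep miss for a source that has a search tool."""
    if match_count == 0 and search_tool_configured:
        # Keep whatever grep printed and add the hint on its own line.
        separator = "\n" if grep_output and not grep_output.endswith("\n") else ""
        return f"{grep_output}{separator}{HINT}\n"
    # Matches found, or no search tool for this source: output is untouched.
    return grep_output

print(with_grep_miss_hint("", 0, True), end="")
print(with_grep_miss_hint("README.md:12:auth token\n", 1, True), end="")
```

Because the output is returned unchanged in every other case, agents that rely on plain grep behavior see no difference unless a search yields nothing.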
Search Tool Parameters
When agents call the search tool directly (not via qmd), these parameters are available:
| Parameter | Type | Default | Description |
|---|---|---|---|
| query | string | - | The search query (required) |
| limit | number | Config default | Max results to return (capped at max_limit) |
| min_score | number | Config default | Minimum cosine similarity (0-1). Results below this threshold are filtered out. |
| version | string | - | Filter results to a specific version tag (matches the version field on sources) |
Note: The output format (docs, code, or raw) is configured at the tool level via result_format in your pathfinder.yaml, not as a per-query parameter.
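To show how these parameters interact, here is a rough sketch of the filtering they describe. The `SearchDefaults` class, its field values, the result dictionaries, and `apply_search_params` are hypothetical names invented for this example; only `limit`, `min_score`, `version`, and `max_limit` come from the table above.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SearchDefaults:
    """Hypothetical stand-in for the tool-level config defaults."""
    default_limit: int = 5
    max_limit: int = 20
    default_min_score: float = 0.0

def apply_search_params(results, defaults: SearchDefaults,
                        limit: Optional[int] = None,
                        min_score: Optional[float] = None,
                        version: Optional[str] = None):
    """Filter and truncate scored results according to the query parameters."""
    # Fall back to config defaults, then cap the limit at max_limit.
    requested = limit if limit is not None else defaults.default_limit
    effective_limit = min(requested, defaults.max_limit)
    threshold = min_score if min_score is not None else defaults.default_min_score

    kept = [
        r for r in results
        if r["score"] >= threshold
        and (version is None or r.get("version") == version)
    ]
    kept.sort(key=lambda r: r["score"], reverse=True)
    return kept[:effective_limit]

# Made-up results for illustration.
SAMPLE_RESULTS = [
    {"path": "auth.md", "score": 0.82, "version": "v2"},
    {"path": "setup.md", "score": 0.41, "version": "v2"},
    {"path": "legacy-auth.md", "score": 0.78, "version": "v1"},
]

for r in apply_search_params(SAMPLE_RESULTS, SearchDefaults(),
                             min_score=0.5, version="v2"):
    print(r["path"], r["score"])
```

In this sketch a caller can ask for more results than `max_limit` allows, but the request is reduced to the cap rather than rejected.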
Configuring Search
Search requires three pieces of configuration in your pathfinder.yaml: a source, a search tool, and embedding settings. See the full config reference for all options.
The embedding and indexing blocks are required whenever any search tool is configured. The first server boot after search is added triggers an initial indexing pass. Later updates run on the schedule defined in the indexing block or when triggered by webhooks.