diff --git a/docs/dive-deep/asynchronous-indexing-workflow.md b/docs/dive-deep/asynchronous-indexing-workflow.md
index bc320754..febd8aaa 100644
--- a/docs/dive-deep/asynchronous-indexing-workflow.md
+++ b/docs/dive-deep/asynchronous-indexing-workflow.md
@@ -35,6 +35,39 @@ The flow diagram above shows the complete indexing workflow, illustrating how th
 - **`indexfailed`** - ❌ Error occurred, can retry
 - **`not_found`** - ❌ Not indexed yet
 
+## How Progress Is Calculated
+
+`get_indexing_status` reports a **coarse, phase-based percentage**, not an exact fraction of files completed.
+
+- **0%** - Preparing the target collection and validating indexing prerequisites
+- **~5%** - Scanning the codebase and building the file list
+- **10% → 100%** - Processing files, chunking code, generating embeddings, and writing batches to the vector database
+- **100%** - Indexing finished successfully
+
+This means it is normal for indexing to jump to around `10%` quickly on a large codebase. It reflects a transition from setup phases into file processing, not that exactly one tenth of all files are already indexed.
+
+Progress is also persisted periodically to the local MCP snapshot file at `~/.context/mcp-codebase-snapshot.json`, so very fast phases may appear as jumps rather than smooth increments.
+
+## When File and Chunk Counts Appear
+
+`get_indexing_status` shows file and chunk totals after a run has completed and the final statistics have been written to the local snapshot.
+
+During active indexing, the MCP server tracks progress percentage, but it does **not** stream live file/chunk totals through `get_indexing_status`.
+
+If you see an indexed entry with `0 files, 0 chunks`, that usually means the local snapshot metadata is stale or was created by an older / incomplete bookkeeping path. It is not a live count fetched from the vector database at status-check time.
+
+To refresh those stored totals, clear and re-index the **same absolute path**.
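As a rough illustration of how the status rules above fit together, the following Python sketch classifies a single status entry. The `SnapshotEntry` fields and the `interpret` helper are hypothetical names chosen for this example; the actual schema of `~/.context/mcp-codebase-snapshot.json` is not shown in this document and may differ.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical view of one snapshot entry. Field names are assumptions
# made for illustration, not the real snapshot schema.
@dataclass
class SnapshotEntry:
    status: str                        # "indexed", "indexing", or "indexfailed"
    percentage: Optional[float] = None  # coarse, phase-based progress
    files: Optional[int] = None         # stored only after a completed run
    chunks: Optional[int] = None        # stored only after a completed run


def interpret(entry: Optional[SnapshotEntry]) -> str:
    """Return a short hint describing how to read a status entry."""
    if entry is None:
        return "not_found: index this absolute path first"
    if entry.status == "indexing":
        # Only the coarse percentage is meaningful while a run is active.
        return "in progress: totals appear after the run completes"
    if entry.status == "indexfailed":
        return "failed: retry indexing"
    if entry.status == "indexed":
        if not entry.files and not entry.chunks:
            # Zero or missing totals on a completed entry suggest stale metadata.
            return "stale totals: clear and re-index the same absolute path"
        return f"complete: {entry.files} files, {entry.chunks} chunks"
    return "unknown status"
```

With this sketch, a completed entry that carries zero files and zero chunks maps to the "stale totals" hint, while an entry that is still indexing never reports totals at all, which mirrors the behavior described above.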
+
+## How Codebases Are Identified
+
+Claude Context tracks codebases by their resolved **absolute path**.
+
+- The MCP tools resolve relative paths to absolute paths before indexing, searching, clearing, or checking status.
+- Collection identity is derived from the normalized absolute path.
+- If you index the same repository through different absolute paths (for example, a symlink, a different clone, or a mounted path), Claude Context treats them as separate codebases.
+
+For the most predictable behavior, always use the same absolute path for `index_codebase`, `search_code`, `clear_index`, and `get_indexing_status`.
+
 ## Key Benefits
 
diff --git a/docs/troubleshooting/faq.md b/docs/troubleshooting/faq.md
index e9cadf2a..adccb9f1 100644
--- a/docs/troubleshooting/faq.md
+++ b/docs/troubleshooting/faq.md
@@ -43,8 +43,38 @@ You can seamlessly use queries like `index this codebase` or `search the main fu
 - **Background Code Synchronization**: Continuously monitors for changes and automatically re-indexes modified parts
 - **Context-Aware Operations**: All indexing and search operations are scoped to the current project context
 
+**Important path detail:** Claude Context keys each indexed codebase by its absolute path. If you index the same repository through different paths (for example, a symlinked path, a second clone, or a mounted path), those are treated as separate indexed codebases.
+
 This makes it effortless to work across multiple projects while maintaining isolated, up-to-date indexes for each codebase.
 
+## Q: Why does `get_indexing_status` jump quickly to 10% or feel coarse?
+
+**A:** The percentage is a **phase-based progress indicator**, not a live fraction of indexed files.
+
+In practice, Claude Context moves through broad stages:
+
+- collection preparation
+- file scanning
+- file processing, chunking, embedding, and insertion
+
+The status output can therefore jump quickly to around `10%` once setup is complete, even for very large repositories. That is expected behavior.
+
+For the full background workflow, see [Asynchronous Indexing Workflow](../dive-deep/asynchronous-indexing-workflow.md).
+
+## Q: Why does `get_indexing_status` show `0 files, 0 chunks` for a completed codebase?
+
+**A:** `get_indexing_status` reads the MCP snapshot metadata, not a live aggregate directly from the vector database.
+
+If a completed entry shows `0 files, 0 chunks`, the most common explanation is that the local snapshot metadata is stale or was created before final statistics were refreshed.
+
+What to do:
+
+1. Make sure you are checking the **same absolute path** that you originally indexed.
+2. If the entry still shows zero counts, run `clear_index` for that path.
+3. Re-run `index_codebase` for that exact absolute path.
+
+This refreshes the stored file/chunk totals used by `get_indexing_status`.
+
 ## Q: How does Claude Context compare to other coding tools like Serena, Context7, or DeepWiki?
 
 **A:** Claude Context is specifically focused on **codebase indexing and semantic search**. Here's how we compare:
diff --git a/packages/mcp/README.md b/packages/mcp/README.md
index 4a562af5..efea0fb6 100644
--- a/packages/mcp/README.md
+++ b/packages/mcp/README.md
@@ -668,6 +668,16 @@ Get the current indexing status of a codebase. Shows progress percentage for act
 
 - `path` (required): Absolute path to the codebase directory to check status for
 
+**What the status output means:**
+
+- Progress is **phase-based**, not a direct file-count ratio. The MCP server reports coarse milestones for collection preparation, file scanning, and file processing / embedding work.
+- Because indexing runs in the background and progress is persisted periodically, percentages can jump quickly on large repositories or appear unchanged for a while during long embedding batches.
+- File and chunk statistics are written when an indexing run finishes successfully. During active indexing, `get_indexing_status` intentionally reports progress rather than live file/chunk totals.
+- Codebases are keyed by their **absolute path**. Indexing `/repo`, a symlinked path to the same repo, and a second clone will create separate tracked entries.
+- If a completed entry shows `0 files, 0 chunks`, that usually means the local snapshot metadata is stale; the counts are read from the snapshot, not queried live from the vector database. Re-indexing, or clearing and re-indexing that exact absolute path, refreshes the stored stats.
+
+For a deeper explanation, see the [asynchronous indexing workflow guide](../../docs/dive-deep/asynchronous-indexing-workflow.md) and the [troubleshooting FAQ](../../docs/troubleshooting/faq.md).
+
 ## Contributing
 
 This package is part of the Claude Context monorepo. Please see: