Codebase Search Overview

Creor includes a built-in RAG (Retrieval-Augmented Generation) pipeline that indexes your codebase and provides the agent with relevant code context. This means the agent can find and reference code it has never seen in the current conversation.

How It Works

When you open a project in Creor, the RAG pipeline indexes your source files in the background. Each file is split into semantically meaningful chunks, converted into vector embeddings, and stored in a local vector database. When the agent needs to find code relevant to your request, it queries this index instead of reading every file.

Your codebase
  -> File chunking (semantic splitting + AST parsing)
    -> Embedding (Voyage AI or Nomic)
      -> Vector store (LanceDB)
        -> Query time: hybrid search (vector + keyword)
          -> Reranking (Jina or Voyage AI)
            -> Top results returned to the agent

The index is built and stored locally. Whole files are never uploaded for indexing: embeddings are generated through lightweight API calls that send only small text chunks, not entire files.
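To make the stages above concrete, here is a minimal sketch of chunking, embedding, and hybrid scoring. It is not Creor's implementation. Every name in it (`Chunk`, `chunk_file`, `embed`, `hybrid_search`, `vector_weight`) is hypothetical. Fixed-size line windows stand in for semantic/AST chunking, a bag-of-words count vector stands in for a real embedding model, a plain list stands in for the vector store, and reranking is omitted.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    path: str
    start_line: int
    end_line: int
    text: str


def tokenize(text: str) -> list[str]:
    # Lowercased word tokens; identifiers such as "rate_limiter" stay whole.
    return re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text.lower())


def chunk_file(path: str, source: str, max_lines: int = 20) -> list[Chunk]:
    # Assumption: fixed-size line windows instead of semantic/AST chunking.
    lines = source.splitlines()
    chunks = []
    for start in range(0, len(lines), max_lines):
        window = lines[start:start + max_lines]
        chunks.append(Chunk(path, start + 1, start + len(window), "\n".join(window)))
    return chunks


def embed(text: str) -> Counter:
    # Assumption: a toy bag-of-words vector, not a learned embedding.
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def keyword_score(query: str, text: str) -> float:
    # Fraction of distinct query tokens that appear in the chunk.
    query_tokens = set(tokenize(query))
    if not query_tokens:
        return 0.0
    return len(query_tokens & set(tokenize(text))) / len(query_tokens)


def hybrid_search(query: str, chunks: list[Chunk], top_k: int = 3,
                  vector_weight: float = 0.5) -> list[tuple[float, Chunk]]:
    # Weighted sum of vector similarity and keyword overlap, highest first.
    query_vec = embed(query)
    scored = [
        (vector_weight * cosine(query_vec, embed(c.text))
         + (1 - vector_weight) * keyword_score(query, c.text), c)
        for c in chunks
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

Each result carries a file path and line range, so a caller can check where a match came from. In a real pipeline the weighting between the two scores would likely depend on the query type rather than being a fixed constant.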

Query Classification

Not every query benefits from the same search strategy. Creor's query classifier analyzes each search request and routes it to the strategy best suited to that type of query.

| Query Type | Strategy | Example |
| --- | --- | --- |
| Conceptual | Vector-heavy with broad retrieval | "How does authentication work in this project?" |
| Identifier lookup | Grep-heavy with exact matching | "Find the UserService class" |
| Mixed | Balanced hybrid with reranking | "Where is the rate limiter configured and how does it work?" |
| File path | Direct file lookup, skip search | "Show me src/auth/middleware.ts" |

The classifier runs before the search and adds no perceptible latency. It examines the query structure, presence of identifiers (camelCase, PascalCase, snake_case), and natural language indicators to make its routing decision.
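A rule-based classifier along these lines could look like the sketch below. It is an illustration only: the regular expressions, the keyword lists, and the names `QueryType` and `classify` are assumptions, and Creor's actual classifier may use different signals. It covers the four query types listed above.

```python
import re
from enum import Enum


class QueryType(Enum):
    CONCEPTUAL = "conceptual"
    IDENTIFIER = "identifier lookup"
    MIXED = "mixed"
    FILE_PATH = "file path"


# Path-like token: at least one "/" plus a file extension.
FILE_PATH = re.compile(r"\b[\w.-]+(?:/[\w.-]+)+\.\w+\b")
# camelCase, PascalCase (two or more capitalized parts), or snake_case.
IDENTIFIER = re.compile(
    r"\b(?:[a-z]+(?:[A-Z][a-z0-9]*)+"
    r"|(?:[A-Z][a-z0-9]+){2,}"
    r"|[a-z0-9]+(?:_[a-z0-9]+)+)\b"
)
# Assumed natural-language indicators; the word lists are illustrative.
LOOKUP_WORDS = re.compile(r"\b(find|where|locate|show)\b", re.IGNORECASE)
CONCEPT_WORDS = re.compile(r"\b(how|why|what|explain|describe)\b", re.IGNORECASE)


def classify(query: str) -> QueryType:
    # An explicit file path short-circuits everything else.
    if FILE_PATH.search(query):
        return QueryType.FILE_PATH
    wants_lookup = bool(IDENTIFIER.search(query) or LOOKUP_WORDS.search(query))
    wants_concept = bool(CONCEPT_WORDS.search(query))
    if wants_lookup and wants_concept:
        return QueryType.MIXED
    if wants_lookup:
        return QueryType.IDENTIFIER
    # Default: treat anything else as a conceptual question.
    return QueryType.CONCEPTUAL
```

Because the checks are a handful of regular-expression scans over a short string, a classifier of this kind costs very little compared with the search it routes.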

When Search Is Used

The agent does not search your codebase on every message. Search is triggered when the agent determines it needs additional context that is not already in the conversation.

  • You ask about code the agent has not read yet in this session.
  • The agent needs to find all usages of a function before refactoring it.
  • You ask a question about project architecture or how a feature is implemented.
  • The agent is planning a multi-file change and needs to understand dependencies.
  • You reference a concept ("the auth middleware") without specifying a file path.

Tip

You can explicitly trigger a codebase search by asking the agent to "search the codebase for..." or "find all files related to...". The agent will use the codesearch tool, which invokes the full RAG pipeline.

Search Quality

Several factors affect how well codebase search performs in your project.

| Factor | Impact | What You Can Do |
| --- | --- | --- |
| Project size | Larger projects benefit more from semantic search | Let indexing complete before relying on search-heavy queries |
| Code documentation | Well-commented code produces better embeddings | JSDoc, docstrings, and inline comments improve search recall |
| File types | Source code is indexed; binary files and media are skipped | Check .gitignore -- files ignored by git are also ignored by the indexer |
| Embedding model | Different models have different strengths | Voyage AI code-3 is optimized for code; Nomic is a solid general-purpose alternative |
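To check why a file might be missing from search results, it can help to reason about the file-type rule in code. The sketch below is a rough approximation, not the indexer's real logic. `should_index` and `SKIPPED_EXTENSIONS` are hypothetical names, the extension list is an assumed example, and the pattern matching covers only simple globs and directory names, whereas real .gitignore semantics (negation, anchoring, nested ignore files) are richer.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Assumed examples of binary/media extensions; not Creor's actual list.
SKIPPED_EXTENSIONS = {".png", ".jpg", ".gif", ".mp4", ".zip", ".exe", ".pdf"}


def should_index(path: str, ignore_patterns: list[str]) -> bool:
    p = PurePosixPath(path)
    # Binary and media files are skipped regardless of ignore rules.
    if p.suffix.lower() in SKIPPED_EXTENSIONS:
        return False
    for pattern in ignore_patterns:
        pattern = pattern.strip()
        if not pattern or pattern.startswith("#"):
            continue  # blank lines and comments
        if pattern.endswith("/"):
            # Directory pattern: skip anything under a directory of that name.
            if pattern.rstrip("/") in p.parts[:-1]:
                return False
        elif fnmatch(p.name, pattern) or fnmatch(path, pattern):
            return False
    return True
```

For example, with the patterns `node_modules/` and `*.log`, a source file such as `src/auth/middleware.ts` passes the check, while anything under `node_modules/` or ending in `.log` does not.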

Note

Search results include file paths and line numbers so you can verify context before the agent acts on it. If a search result looks wrong, you can correct the agent and it will refine its query.

Next Steps