Codebase Search Configuration

Customize the codebase search pipeline by choosing embedding providers, configuring the vector store, and tuning search behavior. All settings live in your project's creor.json file or the RAG plugin configuration.

Embedding Providers

Creor supports two embedding providers out of the box. The embedding model converts code chunks into vectors for semantic search. Choose based on your use case and API access.

Voyage AI

Voyage AI offers embedding models specifically optimized for code. The voyage-code-3 model is the default and recommended choice for most codebases.

Model	Dimensions	Max Tokens	Best For
voyage-code-3	1024	16000	Code-heavy repositories. Best code retrieval quality.
voyage-3-large	1024	32000	Mixed code and documentation. Larger context window.
voyage-3-lite	512	16000	Budget-conscious usage. Faster, lower cost per embedding.

{
  "plugins": {
    "devflow-rag": {
      "embedding": {
        "provider": "voyage",
        "model": "voyage-code-3",
        "apiKey": "$VOYAGE_API_KEY"
      }
    }
  }
}

Tip

Use an environment variable reference ($VOYAGE_API_KEY) instead of hardcoding your key in creor.json. Creor resolves environment variables at runtime.
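To make the idea of resolving a reference at runtime concrete, here is a minimal sketch of how a "$NAME" value could be swapped for the matching environment variable after the config is loaded. The function name resolve_env_refs and the rule that only whole-string "$NAME" values are treated as references are assumptions for illustration; Creor's actual resolution logic is not documented here and may differ.

```python
import os
import re

# Matches a value that is exactly one "$NAME" reference. This rule is an
# assumption for illustration; the real syntax may allow other forms.
_ENV_REF = re.compile(r"^\$([A-Za-z_][A-Za-z0-9_]*)$")


def resolve_env_refs(value, env=None):
    """Recursively replace "$NAME" strings with values taken from env."""
    env = os.environ if env is None else env
    if isinstance(value, dict):
        return {key: resolve_env_refs(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [resolve_env_refs(item, env) for item in value]
    if isinstance(value, str):
        match = _ENV_REF.match(value)
        if match:
            name = match.group(1)
            if name not in env:
                raise KeyError(f"environment variable {name} is not set")
            return env[name]
    return value


embedding = {
    "provider": "voyage",
    "model": "voyage-code-3",
    "apiKey": "$VOYAGE_API_KEY",
}
# A fake environment is passed in so the example runs without a real key.
resolved = resolve_env_refs(embedding, env={"VOYAGE_API_KEY": "example-key"})
print(resolved["apiKey"])  # example-key
```

Failing loudly on an unset variable, as this sketch does, tends to surface a missing key earlier than sending the literal string "$VOYAGE_API_KEY" to the provider would.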

Nomic

Nomic provides open-weight embedding models with a generous free tier. It is a good alternative if you do not have Voyage AI access.

Model	Dimensions	Max Tokens	Best For
nomic-embed-text-v1.5	768	8192	General-purpose text and code embedding.
nomic-embed-code-v1	768	8192	Code-specific embedding with improved identifier handling.

{
  "plugins": {
    "devflow-rag": {
      "embedding": {
        "provider": "nomic",
        "model": "nomic-embed-text-v1.5",
        "apiKey": "$NOMIC_API_KEY"
      }
    }
  }
}

Creor Gateway

If you are signed into Creor with an active subscription, embeddings are routed through the Creor Gateway by default. In that case you do not need to configure a separate embedding API key, because embedding access is included in your plan.

Vector Store

Creor uses LanceDB as its local vector store. LanceDB is an embedded vector database that runs in-process with no external dependencies: no Docker containers and no separate server processes.

Why LanceDB

Zero configuration: works out of the box with no setup.
Fast: optimized columnar storage with SIMD-accelerated similarity search.
Compact: stores vectors efficiently on disk. A 10K-file codebase typically uses 50-100 MB.
Portable: the entire index is a directory of files that can be copied or deleted.

Storage Settings

Setting	Default	Description
storagePath	.creor/rag/index	Directory for the vector store files.
tableName	code_chunks	Name of the LanceDB table. Change if running multiple index configs.
overwrite	false	If true, drops and recreates the table on each full index. Use for debugging.

{
  "plugins": {
    "devflow-rag": {
      "vectorStore": {
        "storagePath": ".creor/rag/index",
        "tableName": "code_chunks"
      }
    }
  }
}
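Because the index is an ordinary directory of files, its disk usage can be checked, and the index discarded, with standard file operations. The sketch below uses only the Python standard library. The helper names index_size_bytes and reset_index are hypothetical, and the assumption that a later full index rebuilds a deleted directory is not confirmed by this page.

```python
import shutil
import tempfile
from pathlib import Path


def index_size_bytes(index_dir):
    """Total size of all regular files under the index directory."""
    root = Path(index_dir)
    if not root.exists():
        return 0
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())


def reset_index(index_dir):
    """Delete the index directory (assumed to be rebuilt by a full index)."""
    shutil.rmtree(index_dir, ignore_errors=True)


# Demonstration on a throwaway directory rather than a real index.
with tempfile.TemporaryDirectory() as tmp:
    demo_index = Path(tmp) / ".creor" / "rag" / "index"
    demo_index.mkdir(parents=True)
    (demo_index / "data.bin").write_bytes(b"\0" * 2048)
    size_before = index_size_bytes(demo_index)
    reset_index(demo_index)
    size_after = index_size_bytes(demo_index)

print(size_before, size_after)  # 2048 0
```

Against a real project, the same helpers would be pointed at the configured storagePath (by default .creor/rag/index).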

Reranking

After the initial hybrid search retrieves candidate results, a reranker scores each result against the original query to improve ranking quality. Reranking is especially valuable when combining results from vector and keyword search.

Supported Rerankers

Provider	Model	Strength
Jina	jina-reranker-v2-base-multilingual	Fast, multilingual, good for mixed-language codebases.
Voyage AI	rerank-2	High accuracy for code, pairs well with Voyage embeddings.

{
  "plugins": {
    "devflow-rag": {
      "reranker": {
        "provider": "jina",
        "model": "jina-reranker-v2-base-multilingual",
        "apiKey": "$JINA_API_KEY",
        "topK": 10
      }
    }
  }
}

The topK parameter controls how many results the reranker returns to the agent. Higher values provide more context but consume more tokens in the agent's context window.
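As a rough illustration of the rerank-then-truncate step, the sketch below scores every candidate against the query and keeps the best topK. The scoring function here is a toy token-overlap measure standing in for a call to a reranker model; it is not what Jina or Voyage AI compute, and the function names are hypothetical.

```python
def token_overlap(query, text):
    """Toy relevance score: fraction of query tokens present in the text."""
    query_tokens = set(query.lower().split())
    text_tokens = set(text.lower().split())
    if not query_tokens:
        return 0.0
    return len(query_tokens & text_tokens) / len(query_tokens)


def rerank(query, candidates, score_fn, top_k=10):
    """Score every candidate against the query and keep the best top_k."""
    scored = [(score_fn(query, text), text) for text in candidates]
    # list.sort is stable, so equal scores keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_k]]


candidates = [
    "def open_connection(host, port): ...",
    "def parse config file and return settings",
    "class Cache: stores recent results",
]
top = rerank("parse config file", candidates, token_overlap, top_k=2)
print(top[0])  # def parse config file and return settings
```

With top_k=2 only two of the three candidates reach the caller, which mirrors the trade-off described above: fewer results cost fewer tokens but carry less context.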

Note

Reranking is optional. If no reranker is configured, Creor uses reciprocal rank fusion to merge results from the vector and keyword searches. This still produces good results for most codebases.
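Reciprocal rank fusion can be sketched in a few lines: each result earns 1 / (k + rank) from every list it appears in, and the sums decide the merged order. The constant k = 60 is a commonly used value and is an assumption here, as is the function name; this page does not state which constant or tie-breaking rule Creor uses.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists: each item scores sum(1 / (k + rank)) over lists."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    return sorted(scores, key=lambda item: scores[item], reverse=True)


vector_hits = ["auth.py", "session.py", "tokens.py"]
keyword_hits = ["tokens.py", "auth.py", "README.md"]
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)  # ['auth.py', 'tokens.py', 'session.py', 'README.md']
```

Results that appear in both lists (auth.py and tokens.py) rise above results found by only one search, without the raw scores of the two searches ever needing to be comparable.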

Search Tuning

Fine-tune how the search pipeline behaves with these additional settings.

Setting	Default	Description
vectorWeight	0.6	Weight for vector search results in hybrid fusion (0.0-1.0).
keywordWeight	0.4	Weight for keyword/grep search results in hybrid fusion (0.0-1.0).
maxResults	20	Maximum number of candidate results before reranking.
minScore	0.3	Minimum similarity score to include a result (0.0-1.0).
contextLines	3	Number of surrounding lines to include with each result for context.

{
  "plugins": {
    "devflow-rag": {
      "search": {
        "vectorWeight": 0.6,
        "keywordWeight": 0.4,
        "maxResults": 20,
        "minScore": 0.3,
        "contextLines": 3
      }
    }
  }
}

Tip

If your codebase uses highly specific identifiers (e.g., generated code with unique prefixes), increase keywordWeight to 0.5 or higher. If your code is well documented in natural language, increase vectorWeight instead.
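One plausible reading of how the search settings interact is sketched below: per-result scores from the two searches are combined as a weighted sum, results under minScore are dropped, and the list is capped at maxResults. This is an assumption for illustration only. The page does not say whether Creor combines scores or ranks, nor whether minScore applies before or after fusion, and the function name hybrid_fuse is hypothetical.

```python
def hybrid_fuse(vector_scores, keyword_scores, vector_weight=0.6,
                keyword_weight=0.4, min_score=0.3, max_results=20):
    """Weighted sum of per-result scores, then filter and truncate."""
    fused = {}
    for key in set(vector_scores) | set(keyword_scores):
        fused[key] = (vector_weight * vector_scores.get(key, 0.0)
                      + keyword_weight * keyword_scores.get(key, 0.0))
    kept = [(key, score) for key, score in fused.items() if score >= min_score]
    # Sort by score descending, then by key so ties are deterministic.
    kept.sort(key=lambda pair: (-pair[1], pair[0]))
    return kept[:max_results]


# Scores are assumed to be normalized to the 0.0-1.0 range.
vector_scores = {"auth.py": 0.9, "session.py": 0.6, "README.md": 0.2}
keyword_scores = {"auth.py": 0.5, "tokens.py": 1.0}
results = hybrid_fuse(vector_scores, keyword_scores)
print([name for name, _ in results])  # ['auth.py', 'tokens.py', 'session.py']
```

In this example README.md scores 0.12 and is filtered out by minScore, while tokens.py survives on keyword evidence alone. Raising keyword_weight would push such exact-identifier matches further up the list.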

Full Configuration Reference

Here is a complete creor.json with all RAG-related settings shown with their defaults.

{
  "plugins": {
    "devflow-rag": {
      "embedding": {
        "provider": "voyage",
        "model": "voyage-code-3",
        "apiKey": "$VOYAGE_API_KEY"
      },
      "vectorStore": {
        "storagePath": ".creor/rag/index",
        "tableName": "code_chunks"
      },
      "reranker": {
        "provider": "jina",
        "model": "jina-reranker-v2-base-multilingual",
        "apiKey": "$JINA_API_KEY",
        "topK": 10
      },
      "search": {
        "vectorWeight": 0.6,
        "keywordWeight": 0.4,
        "maxResults": 20,
        "minScore": 0.3,
        "contextLines": 3
      },
      "indexer": {
        "maxChunkSize": 1500,
        "minChunkSize": 50,
        "chunkOverlap": 100,
        "batchSize": 100
      },
      "exclude": [
        "node_modules/**",
        "vendor/**",
        "dist/**",
        "build/**",
        ".git/**"
      ]
    }
  }
}

Note

You only need to include settings you want to override. Omitted settings use the defaults shown above.
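The override behavior described in the note amounts to a recursive merge of your settings over the defaults. The sketch below shows one way such a merge could work. The function name merge_with_defaults is hypothetical, the DEFAULTS dictionary copies only a subset of the values listed on this page, and Creor's real merge rules (for example, how lists such as exclude are combined) are not documented here.

```python
import copy


def merge_with_defaults(defaults, overrides):
    """Return defaults with overrides applied; nested dicts merge key by key."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_with_defaults(merged[key], value)
        else:
            # Scalars and lists replace the default value outright.
            merged[key] = copy.deepcopy(value)
    return merged


DEFAULTS = {
    "search": {
        "vectorWeight": 0.6,
        "keywordWeight": 0.4,
        "maxResults": 20,
        "minScore": 0.3,
        "contextLines": 3,
    },
    "vectorStore": {
        "storagePath": ".creor/rag/index",
        "tableName": "code_chunks",
    },
}

# The user overrides only the two weights; everything else is inherited.
user_config = {"search": {"vectorWeight": 0.5, "keywordWeight": 0.5}}
effective = merge_with_defaults(DEFAULTS, user_config)
print(effective["search"]["keywordWeight"], effective["search"]["maxResults"])  # 0.5 20
```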

Next Steps

Search Overview

Indexing