Configuration

Customize the codebase search pipeline by choosing embedding providers, configuring the vector store, and tuning search behavior. All settings live in your project's creor.json file or the RAG plugin configuration.

Embedding Providers

Creor supports two embedding providers out of the box. The embedding model converts code chunks into vectors for semantic search. Choose based on your use case and API access.

Voyage AI

Voyage AI offers embedding models specifically optimized for code. The voyage-code-3 model is the default and recommended choice for most codebases.

| Model | Dimensions | Max Tokens | Best For |
| --- | --- | --- | --- |
| voyage-code-3 | 1024 | 16000 | Code-heavy repositories. Best code retrieval quality. |
| voyage-3-large | 1024 | 32000 | Mixed code and documentation. Larger context window. |
| voyage-3-lite | 512 | 16000 | Budget-conscious usage. Faster, lower cost per embedding. |
{
  "plugins": {
    "devflow-rag": {
      "embedding": {
        "provider": "voyage",
        "model": "voyage-code-3",
        "apiKey": "$VOYAGE_API_KEY"
      }
    }
  }
}

Tip

Use an environment variable reference ($VOYAGE_API_KEY) instead of hardcoding your key in creor.json. Creor resolves environment variables at runtime.
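How Creor resolves these references internally is not documented here. The following is a minimal sketch of the idea, assuming that any string value of the form `$NAME` stands for the environment variable `NAME`; the function name `resolve_env_refs` is hypothetical.

```python
import os


def resolve_env_refs(value, env=None):
    """Recursively replace strings like "$NAME" with the value of env["NAME"].

    Assumption: a reference is a whole string starting with "$".
    This is an illustration, not Creor's actual resolver.
    """
    env = os.environ if env is None else env
    if isinstance(value, dict):
        return {key: resolve_env_refs(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [resolve_env_refs(item, env) for item in value]
    if isinstance(value, str) and value.startswith("$") and len(value) > 1:
        name = value[1:]
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]
    return value
```

Failing loudly on a missing variable, rather than passing the literal `$VOYAGE_API_KEY` string to the provider, makes a misconfigured shell easier to diagnose.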

Nomic

Nomic provides open-weight embedding models with a generous free tier. It is a good alternative if you do not have Voyage AI access.

| Model | Dimensions | Max Tokens | Best For |
| --- | --- | --- | --- |
| nomic-embed-text-v1.5 | 768 | 8192 | General-purpose text and code embedding. |
| nomic-embed-code-v1 | 768 | 8192 | Code-specific embedding with improved identifier handling. |
{
  "plugins": {
    "devflow-rag": {
      "embedding": {
        "provider": "nomic",
        "model": "nomic-embed-text-v1.5",
        "apiKey": "$NOMIC_API_KEY"
      }
    }
  }
}
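To compare the models from both providers programmatically, the two tables can be captured as a lookup. This is a sketch only: the values are copied from the tables in this section, and the names `EMBEDDING_MODELS` and `lookup_model` are hypothetical, not part of Creor.

```python
# Values copied from the Voyage AI and Nomic tables in this section.
EMBEDDING_MODELS = {
    ("voyage", "voyage-code-3"): {"dimensions": 1024, "max_tokens": 16000},
    ("voyage", "voyage-3-large"): {"dimensions": 1024, "max_tokens": 32000},
    ("voyage", "voyage-3-lite"): {"dimensions": 512, "max_tokens": 16000},
    ("nomic", "nomic-embed-text-v1.5"): {"dimensions": 768, "max_tokens": 8192},
    ("nomic", "nomic-embed-code-v1"): {"dimensions": 768, "max_tokens": 8192},
}


def lookup_model(provider: str, model: str) -> dict:
    """Return the documented specs for a provider/model pair, or raise ValueError."""
    try:
        return EMBEDDING_MODELS[(provider, model)]
    except KeyError:
        raise ValueError(f"unknown embedding model {model!r} for provider {provider!r}") from None
```

A check like this catches a mismatched pair, such as a Voyage model name under the `nomic` provider, before any API call is made.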

Creor Gateway

If you are signed into Creor with an active subscription, embeddings are routed through the Creor Gateway by default. You do not need to configure a separate embedding API key; it is included in your plan.

Vector Store

Creor uses LanceDB as its local vector store. LanceDB is an embedded vector database that runs in-process with no external dependencies: no Docker containers and no separate server processes.

Why LanceDB

  • Zero configuration: works out of the box with no setup.
  • Fast: optimized columnar storage with SIMD-accelerated similarity search.
  • Compact: stores vectors efficiently on disk. A 10K-file codebase typically uses 50-100 MB.
  • Portable: the entire index is a directory of files that can be copied or deleted.
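As a rough sanity check on the size figure above, the raw vector payload can be estimated from the chunk count and the embedding dimensions. The sketch below assumes 4-byte float32 values and a hypothetical average of two chunks per file; neither is stated in this document. The real on-disk size also includes chunk text, metadata, and any index structures.

```python
def raw_vector_bytes(num_chunks: int, dimensions: int, bytes_per_value: int = 4) -> int:
    """Bytes needed for the vectors alone (assumes float32 by default)."""
    return num_chunks * dimensions * bytes_per_value


chunks = 20_000  # hypothetical: 10K files averaging two chunks each
size_mib = raw_vector_bytes(chunks, 1024) / (1024 * 1024)
print(f"{size_mib:.1f} MiB of raw vectors")
```

Under these assumptions the vectors alone come to roughly 78 MiB at 1024 dimensions, which falls inside the 50-100 MB range quoted above.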

Storage Settings

| Setting | Default | Description |
| --- | --- | --- |
| storagePath | .creor/rag/index | Directory for the vector store files. |
| tableName | code_chunks | Name of the LanceDB table. Change if running multiple index configs. |
| overwrite | false | If true, drops and recreates the table on each full index. Use for debugging. |
{
  "plugins": {
    "devflow-rag": {
      "vectorStore": {
        "storagePath": ".creor/rag/index",
        "tableName": "code_chunks"
      }
    }
  }
}

Reranking

After the initial hybrid search retrieves candidate results, a reranker scores each result against the original query to improve ranking quality. Reranking is especially valuable when combining results from vector and keyword search.

Supported Rerankers

| Provider | Model | Strength |
| --- | --- | --- |
| Jina | jina-reranker-v2-base-multilingual | Fast, multilingual, good for mixed-language codebases. |
| Voyage AI | rerank-2 | High accuracy for code, pairs well with Voyage embeddings. |
{
  "plugins": {
    "devflow-rag": {
      "reranker": {
        "provider": "jina",
        "model": "jina-reranker-v2-base-multilingual",
        "apiKey": "$JINA_API_KEY",
        "topK": 10
      }
    }
  }
}

The topK parameter controls how many results the reranker returns to the agent. Higher values provide more context but consume more tokens in the agent's context window.
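The effect of topK can be illustrated with a small sketch: score every candidate against the query, sort by score, and keep the first topK. The scoring function here is a toy word-overlap stand-in, not the Jina or Voyage reranker, and the names `rerank` and `overlap_score` are hypothetical.

```python
from typing import Callable


def rerank(query: str, candidates: list[str],
           score: Callable[[str, str], float], top_k: int = 10) -> list[str]:
    """Sort candidates by relevance to the query and keep the best top_k."""
    ranked = sorted(candidates, key=lambda text: score(query, text), reverse=True)
    return ranked[:top_k]


def overlap_score(query: str, text: str) -> float:
    """Toy relevance score: number of lowercase words shared with the query."""
    return float(len(set(query.lower().split()) & set(text.lower().split())))
```

With `top_k=2`, a query of "parse config file" over three candidates returns only the two with the most shared words, so the agent receives less text to read.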

Note

Reranking is optional. If no reranker is configured, Creor uses reciprocal rank fusion to merge results from the vector and keyword searches. This still produces good results for most codebases.
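For reference, reciprocal rank fusion scores each result by summing 1 / (k + rank) over every ranked list it appears in. The sketch below uses the conventional constant k = 60; the constant Creor uses is not stated here, and the function name is hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of result IDs into one list.

    Each result earns 1 / (k + rank) from every list that contains it,
    with rank starting at 1. k = 60 is a common default, assumed here.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```

Because only ranks are used, the vector and keyword searches do not need comparable score scales. A result that appears near the top of both lists outranks one that tops a single list.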

Search Tuning

Fine-tune how the search pipeline behaves with these additional settings.

| Setting | Default | Description |
| --- | --- | --- |
| vectorWeight | 0.6 | Weight for vector search results in hybrid fusion (0.0-1.0). |
| keywordWeight | 0.4 | Weight for keyword/grep search results in hybrid fusion (0.0-1.0). |
| maxResults | 20 | Maximum number of candidate results before reranking. |
| minScore | 0.3 | Minimum similarity score to include a result (0.0-1.0). |
| contextLines | 3 | Number of surrounding lines to include with each result for context. |
{
  "plugins": {
    "devflow-rag": {
      "search": {
        "vectorWeight": 0.6,
        "keywordWeight": 0.4,
        "maxResults": 20,
        "minScore": 0.3,
        "contextLines": 3
      }
    }
  }
}
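To make contextLines concrete, here is a minimal sketch of widening a matched line range by a fixed number of surrounding lines, clamped to the file boundaries. The function name and the 0-based inclusive indexing are assumptions for illustration.

```python
def with_context(lines: list[str], start: int, end: int, context_lines: int = 3) -> list[str]:
    """Return lines[start..end] plus up to context_lines on each side.

    start and end are 0-based, inclusive indices of the matched span.
    """
    first = max(0, start - context_lines)
    last = min(len(lines) - 1, end + context_lines)
    return lines[first:last + 1]
```

A match near the top or bottom of a file simply receives fewer context lines on that side.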

Tip

If your codebase uses highly specific identifiers (e.g., generated code with unique prefixes), increase keywordWeight to 0.5 or higher. If your code is well documented in natural language, raise vectorWeight instead.
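One way to picture how the weights interact is a weighted sum of per-result scores. The sketch below is an assumption, not Creor's documented fusion formula: it takes scores already normalized to 0.0-1.0 from each search, applies minScore to the fused value, and caps the list at maxResults. All names are hypothetical.

```python
def fuse_scores(vector_scores: dict[str, float], keyword_scores: dict[str, float],
                vector_weight: float = 0.6, keyword_weight: float = 0.4,
                min_score: float = 0.3, max_results: int = 20) -> list[tuple[str, float]]:
    """Combine two score maps into one ranked list of (id, fused_score)."""
    for name, value in (("vector_weight", vector_weight),
                        ("keyword_weight", keyword_weight),
                        ("min_score", min_score)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")
    fused = {}
    for doc_id in set(vector_scores) | set(keyword_scores):
        # A result missing from one search contributes 0.0 from that side.
        fused[doc_id] = (vector_weight * vector_scores.get(doc_id, 0.0)
                         + keyword_weight * keyword_scores.get(doc_id, 0.0))
    kept = [(doc_id, score) for doc_id, score in fused.items() if score >= min_score]
    kept.sort(key=lambda pair: (-pair[1], pair[0]))
    return kept[:max_results]
```

In this model, raising keyword_weight lets a result with a strong keyword match but no vector match clear the minScore threshold, which is the behavior the tip above aims for with identifier-heavy code.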

Full Configuration Reference

Here is a complete creor.json showing every RAG-related setting at its default value. The reranker block is optional and can be omitted.

{
  "plugins": {
    "devflow-rag": {
      "embedding": {
        "provider": "voyage",
        "model": "voyage-code-3",
        "apiKey": "$VOYAGE_API_KEY"
      },
      "vectorStore": {
        "storagePath": ".creor/rag/index",
        "tableName": "code_chunks",
        "overwrite": false
      },
      "reranker": {
        "provider": "jina",
        "model": "jina-reranker-v2-base-multilingual",
        "apiKey": "$JINA_API_KEY",
        "topK": 10
      },
      "search": {
        "vectorWeight": 0.6,
        "keywordWeight": 0.4,
        "maxResults": 20,
        "minScore": 0.3,
        "contextLines": 3
      },
      "indexer": {
        "maxChunkSize": 1500,
        "minChunkSize": 50,
        "chunkOverlap": 100,
        "batchSize": 100
      },
      "exclude": [
        "node_modules/**",
        "vendor/**",
        "dist/**",
        "build/**",
        ".git/**"
      ]
    }
  }
}

Note

You only need to include settings you want to override. Omitted settings use the defaults shown above.
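The override behavior described in this note amounts to a recursive merge of your settings onto the defaults. A minimal sketch follows, assuming nested objects are merged key by key while lists such as exclude are replaced wholesale; the actual merge rules are not specified here, and `merge_overrides` is a hypothetical name.

```python
import copy


def merge_overrides(defaults: dict, overrides: dict) -> dict:
    """Return a new dict with overrides applied on top of defaults.

    Nested dicts are merged recursively; any other value (including
    lists) replaces the default. Neither input is modified.
    """
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_overrides(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged
```

Under this model, a creor.json that sets only `search.keywordWeight` keeps the default `vectorWeight`, `maxResults`, and every other section unchanged.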

Next Steps