## Embedding Providers

The embedding model converts code chunks into vectors for semantic search. Creor supports two embedding providers out of the box; choose based on your use case and API access.
### Voyage AI
Voyage AI offers embedding models specifically optimized for code. The voyage-code-3 model is the default and recommended choice for most codebases.
| Model | Dimensions | Max Tokens | Best For |
|---|---|---|---|
| voyage-code-3 | 1024 | 16000 | Code-heavy repositories. Best code retrieval quality. |
| voyage-3-large | 1024 | 32000 | Mixed code and documentation. Larger context window. |
| voyage-3-lite | 512 | 16000 | Budget-conscious usage. Faster, lower cost per embedding. |
### Nomic
Nomic provides open-weight embedding models with a generous free tier, making it a good alternative if you do not have Voyage AI access.
| Model | Dimensions | Max Tokens | Best For |
|---|---|---|---|
| nomic-embed-text-v1.5 | 768 | 8192 | General-purpose text and code embedding. |
| nomic-embed-code-v1 | 768 | 8192 | Code-specific embedding with improved identifier handling. |
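The embedding provider is selected in `creor.json`. The fragment below is a sketch of what that might look like; the nesting and key names (`provider`, `model`, `apiKey`) and the environment-variable placeholder are illustrative assumptions, not confirmed configuration keys:

```json
{
  "rag": {
    "embedding": {
      "provider": "voyage",
      "model": "voyage-code-3",
      "apiKey": "${VOYAGE_API_KEY}"
    }
  }
}
```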
### Creor Gateway
If you are signed into Creor with an active subscription, embeddings are routed through the Creor Gateway by default. This means you do not need to configure a separate embedding API key -- it is included in your plan.
## Vector Store
Creor uses LanceDB as its local vector store. LanceDB is an embedded vector database that runs in-process with no external dependencies -- no Docker containers, no separate server processes.
### Why LanceDB
- Zero configuration: works out of the box with no setup.
- Fast: optimized columnar storage with SIMD-accelerated similarity search.
- Compact: stores vectors efficiently on disk. A 10K-file codebase typically uses 50-100 MB.
- Portable: the entire index is a directory of files that can be copied or deleted.
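Conceptually, an in-process vector store just scores stored embeddings against a query vector and returns the nearest chunks. The toy sketch below illustrates that idea in plain Python with made-up chunk IDs and three-dimensional vectors; it is not LanceDB's actual API, which handles storage and SIMD-accelerated search for you:

```python
import math

def cosine(a, b):
    # Cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy "index": chunk id -> embedding vector (real vectors have
# hundreds or thousands of dimensions, per the tables above).
index = {
    "auth.py:login": [0.9, 0.1, 0.0],
    "db.py:connect": [0.1, 0.8, 0.2],
    "ui.py:render":  [0.0, 0.2, 0.9],
}

def search(query_vec, k=2):
    # Score every stored vector and return the k closest chunk ids.
    scored = sorted(index.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [cid for cid, _ in scored[:k]]

top = search([1.0, 0.0, 0.0])  # most similar chunks first
```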
### Storage Settings
| Setting | Default | Description |
|---|---|---|
| storagePath | .creor/rag/index | Directory for the vector store files. |
| tableName | code_chunks | Name of the LanceDB table. Change if running multiple index configs. |
| overwrite | false | If true, drops and recreates the table on each full index. Use for debugging. |
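As a `creor.json` fragment, the defaults above might look like this; the `vectorStore` grouping is an assumption, while the setting names and values come from the table:

```json
{
  "rag": {
    "vectorStore": {
      "storagePath": ".creor/rag/index",
      "tableName": "code_chunks",
      "overwrite": false
    }
  }
}
```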
## Reranking
After the initial hybrid search retrieves candidate results, a reranker scores each result against the original query to improve ranking quality. Reranking is especially valuable when combining results from vector and keyword search.
### Supported Rerankers
| Provider | Model | Strength |
|---|---|---|
| Jina | jina-reranker-v2-base-multilingual | Fast, multilingual, good for mixed-language codebases. |
| Voyage AI | rerank-2 | High accuracy for code, pairs well with Voyage embeddings. |
The topK parameter controls how many results the reranker returns to the agent. Higher values provide more context but consume more tokens in the agent's context window.
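The rerank-then-truncate step can be sketched as follows. This is a conceptual illustration, not Creor's implementation: the `overlap_score` function is a hypothetical stand-in for a real reranker model's relevance score:

```python
def rerank(query, candidates, score_fn, top_k=5):
    # Score each candidate against the query, keep the best top_k.
    scored = [(score_fn(query, c), c) for c in candidates]
    scored.sort(key=lambda sc: sc[0], reverse=True)
    return [c for _, c in scored[:top_k]]

# Hypothetical scorer: fraction of query tokens found in the candidate.
# A real reranker (e.g. a cross-encoder) scores semantic relevance instead.
def overlap_score(query, text):
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / max(len(q), 1)

results = rerank(
    "parse json config",
    ["parse the config file", "render html page", "load json from disk"],
    overlap_score,
    top_k=2,
)
```

Only the `top_k` survivors are handed to the agent, which is why raising `topK` trades context-window tokens for recall.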
## Search Tuning
Fine-tune how the search pipeline behaves with these additional settings.
| Setting | Default | Description |
|---|---|---|
| vectorWeight | 0.6 | Weight for vector search results in hybrid fusion (0.0-1.0). |
| keywordWeight | 0.4 | Weight for keyword/grep search results in hybrid fusion (0.0-1.0). |
| maxResults | 20 | Maximum number of candidate results before reranking. |
| minScore | 0.3 | Minimum similarity score to include a result (0.0-1.0). |
| contextLines | 3 | Number of surrounding lines to include with each result for context. |
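The weighted-fusion behavior these settings describe can be sketched like this. It is an assumption about how the scores combine (a simple weighted sum per chunk, then a score floor and a result cap), shown with the defaults from the table:

```python
def fuse(vector_hits, keyword_hits, vector_weight=0.6, keyword_weight=0.4,
         min_score=0.3, max_results=20):
    # Combine per-source scores into one weighted score per chunk id.
    fused = {}
    for cid, score in vector_hits.items():
        fused[cid] = fused.get(cid, 0.0) + vector_weight * score
    for cid, score in keyword_hits.items():
        fused[cid] = fused.get(cid, 0.0) + keyword_weight * score
    # Drop weak matches, sort best-first, cap the candidate list.
    ranked = sorted(
        ((cid, s) for cid, s in fused.items() if s >= min_score),
        key=lambda cs: cs[1],
        reverse=True,
    )
    return ranked[:max_results]

# Chunk ids and scores below are made up for illustration.
hits = fuse(
    {"a.py:1": 0.9, "b.py:7": 0.5},   # vector search scores
    {"b.py:7": 0.8, "c.py:3": 0.4},   # keyword search scores
)
```

Note how `b.py:7`, found by both searches, outranks `a.py:1` even though its vector score is lower, and `c.py:3` falls below `minScore` and is dropped.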
## Full Configuration Reference
Here is a complete `creor.json` with all RAG-related settings shown with their defaults.
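The setting names and default values below come from the tables in this page; the overall nesting, the group keys (`embedding`, `vectorStore`, `reranker`, `search`), and the reranker values (this page does not state defaults for the reranker provider, model, or `topK`) are illustrative assumptions:

```json
{
  "rag": {
    "embedding": {
      "provider": "voyage",
      "model": "voyage-code-3"
    },
    "vectorStore": {
      "storagePath": ".creor/rag/index",
      "tableName": "code_chunks",
      "overwrite": false
    },
    "reranker": {
      "provider": "voyage",
      "model": "rerank-2",
      "topK": 5
    },
    "search": {
      "vectorWeight": 0.6,
      "keywordWeight": 0.4,
      "maxResults": 20,
      "minScore": 0.3,
      "contextLines": 3
    }
  }
}
```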