Rate Limits | Creor API

Creor enforces rate limits to ensure fair usage and platform stability. Limits vary by plan tier and are applied per API key.

Limits by Plan

Monthly quotas are counted per API key over a fixed calendar-month window: the counter resets on the first day of each calendar month (UTC). Each request counts as one unit regardless of the model used or tokens consumed; see What Counts as a Request for how failed requests are treated.

Plan	Monthly Requests	Requests / Minute	Concurrent Requests
Free	200	10	1
Starter	1,000	30	3
Pro	Unlimited	120	10
Enterprise	Unlimited	Custom	Custom

Note

The per-minute and concurrency limits apply even on the Pro and Enterprise plans. These protect the platform from sudden traffic spikes and ensure consistent latency for all users.
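
The concurrency caps in the plan table can also be enforced on the client, so that excess requests wait locally instead of being rejected. The following is a minimal sketch rather than part of the Creor API: PLAN_LIMITS simply copies the values from the table above, and ConcurrencyGate is a hypothetical helper name introduced for illustration.

```python
import threading
from contextlib import contextmanager

# Values copied from the plan table above. Enterprise is omitted because
# its per-minute and concurrency limits are custom.
PLAN_LIMITS = {
    "free": {"monthly": 200, "per_minute": 10, "concurrent": 1},
    "starter": {"monthly": 1000, "per_minute": 30, "concurrent": 3},
    "pro": {"monthly": None, "per_minute": 120, "concurrent": 10},  # None = unlimited
}


class ConcurrencyGate:
    """Hypothetical helper: caps in-flight requests at the plan's concurrency limit."""

    def __init__(self, plan: str):
        self.limit = PLAN_LIMITS[plan]["concurrent"]
        self._slots = threading.BoundedSemaphore(self.limit)

    @contextmanager
    def slot(self, timeout=None):
        # Block until a slot is free, or give up once the timeout elapses.
        if not self._slots.acquire(timeout=timeout):
            raise TimeoutError("no free request slot")
        try:
            yield
        finally:
            self._slots.release()
```

Each API call would then be wrapped in `with gate.slot():` so that no more than the plan's concurrent-request limit is in flight at once.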

What Counts as a Request

Each call to /v1/chat/completions counts as one request, whether streaming or non-streaming.
Polling endpoints (e.g., agent status) count as one request per poll.
Requests that fail with a 4xx client error still count toward your limit.
Requests that fail with a 5xx server error do not count toward your limit.
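
The counting rules above can be mirrored in a local usage estimate. This is a sketch under stated assumptions: the function names are hypothetical, and statuses the list does not mention (for example 3xx) are treated as counted. Because 429 is a 4xx status, the sketch counts it as well, which follows from the rule as written.

```python
def counts_toward_limit(status_code: int) -> bool:
    """Return True if a response with this status consumes one request unit.

    Per the rules above, 5xx server errors are not counted, while successful
    responses and 4xx client errors are. Treating every other status as
    counted is an assumption of this sketch.
    """
    return not (500 <= status_code <= 599)


def estimate_usage(status_codes) -> int:
    """Count how many of the observed responses consumed quota."""
    return sum(1 for status in status_codes if counts_toward_limit(status))
```

The authoritative figure is still the X-RateLimit-Remaining header; a local tally like this is only useful as a cross-check.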

Rate Limit Headers

Every API response includes headers that tell you your current rate limit status. Use these to monitor your consumption and implement client-side throttling.

Header	Type	Description
X-RateLimit-Limit	integer	Maximum number of requests allowed in the current period.
X-RateLimit-Remaining	integer	Number of requests remaining in the current period.
X-RateLimit-Reset	integer	Unix timestamp (seconds) when the current period resets.
X-RateLimit-Limit-Minute	integer	Maximum requests allowed per minute.
X-RateLimit-Remaining-Minute	integer	Requests remaining in the current minute window.
Retry-After	integer	Seconds to wait before retrying (only present on 429 responses).

Example Response Headers

HTTP/2 200
Content-Type: application/json
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 847
X-RateLimit-Reset: 1714521600
X-RateLimit-Limit-Minute: 30
X-RateLimit-Remaining-Minute: 28

The X-RateLimit-Limit and X-RateLimit-Remaining headers reflect your monthly quota. The minute-level headers reflect the short-term burst limit.
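
To show how these headers might be consumed, here is a minimal parsing sketch. RateLimitStatus and parse_rate_limit_headers are hypothetical names, and the code assumes all five headers are present with integer values; this page does not say what the monthly headers contain on plans with unlimited monthly requests, so that case is not handled.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Mapping


@dataclass
class RateLimitStatus:
    monthly_limit: int
    monthly_remaining: int
    resets_at: datetime  # timezone-aware, UTC
    minute_limit: int
    minute_remaining: int


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitStatus:
    """Convert the rate limit headers into typed values.

    Assumes every header is present; a missing one raises KeyError.
    """
    return RateLimitStatus(
        monthly_limit=int(headers["X-RateLimit-Limit"]),
        monthly_remaining=int(headers["X-RateLimit-Remaining"]),
        # X-RateLimit-Reset is a Unix timestamp in seconds.
        resets_at=datetime.fromtimestamp(
            int(headers["X-RateLimit-Reset"]), tz=timezone.utc
        ),
        minute_limit=int(headers["X-RateLimit-Limit-Minute"]),
        minute_remaining=int(headers["X-RateLimit-Remaining-Minute"]),
    )
```

With the example headers above, resets_at is 2024-05-01 00:00:00 UTC, consistent with the monthly reset. The headers object of a requests response is case-insensitive and can be passed in directly; a plain dict must use the exact header names shown.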

Handling 429 Responses

When you exceed either the monthly or per-minute limit, the API returns a 429 Too Many Requests response with a JSON error body and a Retry-After header.

429 Response Body

{
  "error": {
    "type": "rate_limit_exceeded",
    "message": "You have exceeded your per-minute request limit. Please retry after 12 seconds.",
    "retry_after": 12
  }
}
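
Since the wait time appears both in the Retry-After header and in the retry_after field of the body, a client can read either one. The sketch below prefers the header and falls back to the body, then to a default; the function name, the order of preference, and the 5-second default are assumptions of this example, not documented behavior.

```python
import json


def retry_delay_seconds(body_text, retry_after_header=None, default=5.0):
    """Pick a wait time for a 429 response: header first, then body, then default."""
    if retry_after_header is not None:
        try:
            return float(retry_after_header)
        except ValueError:
            pass  # Header was not a plain number of seconds; try the body.
    try:
        body = json.loads(body_text)
        return float(body["error"]["retry_after"])
    except (ValueError, KeyError, TypeError):
        # Body was not JSON, or did not have the expected shape.
        return default
```

For the body shown above and no header, this returns 12.0.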

Retry Strategy

Implement exponential backoff with jitter when you receive a 429. The Retry-After header gives you the minimum wait time, but adding jitter prevents thundering herd problems when multiple clients are rate-limited simultaneously.

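TypeScript Example

// Retries on 429 with exponential backoff and jitter, never waiting less than Retry-After.
async function fetchWithRetry(url: string, options: RequestInit, maxRetries = 3): Promise<Response> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const response = await fetch(url, options);
    if (response.status !== 429) {
      return response;
    }
    if (attempt === maxRetries) {
      break;
    }
    // Retry-After is the minimum wait; fall back to 5 seconds if it is missing or unparsable.
    const parsed = parseInt(response.headers.get("Retry-After") ?? "", 10);
    const retryAfterMs = (Number.isNaN(parsed) ? 5 : parsed) * 1000;
    // Exponential backoff (1s, 2s, 4s, ...), never shorter than Retry-After.
    const backoffMs = Math.max(retryAfterMs, 1000 * 2 ** attempt);
    const jitterMs = Math.random() * 1000;
    await new Promise((resolve) => setTimeout(resolve, backoffMs + jitterMs));
  }
  throw new Error("Rate limit exceeded after max retries");
}
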

Python Example

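import random
import time

import requests

def fetch_with_retry(url, headers, json_body, max_retries=3):
    """POST with retries on 429, backing off exponentially with jitter."""
    for attempt in range(max_retries + 1):
        response = requests.post(url, headers=headers, json=json_body)
        if response.status_code != 429:
            return response
        if attempt == max_retries:
            break
        # Retry-After is the minimum wait; fall back to 5 seconds if it is absent.
        retry_after = int(response.headers.get("Retry-After", 5))
        # Exponential backoff (1s, 2s, 4s, ...), never shorter than Retry-After.
        backoff = max(retry_after, 2 ** attempt)
        time.sleep(backoff + random.uniform(0, 1))
    raise RuntimeError("Rate limit exceeded after max retries")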

Warning

Do not retry immediately without respecting the Retry-After header. Aggressive retries can result in longer backoff periods or temporary key suspension.

Best Practices

Follow these practices to stay within your rate limits and build resilient integrations.

Monitor the X-RateLimit-Remaining header and slow down requests when it drops below 10% of your limit.
Use streaming responses for chat completions; a single streaming request is more efficient than polling for results.
Cache responses when appropriate. If multiple users ask the same question, serve the cached result instead of making duplicate API calls.
Avoid redundant calls. Call the models list endpoint once and cache the result rather than calling it before every completion request.
Distribute requests evenly across the minute window instead of sending bursts.
Set up alerts in your monitoring system when X-RateLimit-Remaining drops below a threshold.
On the Free plan, consider upgrading to Starter if you regularly hit the ceiling of 200 requests per month.
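
Two of the practices above, spacing requests evenly across the minute and watching for a low remaining quota, can be sketched in a few lines. RequestPacer and is_quota_low are hypothetical helpers, not part of any Creor SDK; the 10% threshold comes from the first practice in the list, and the injectable clock and sleep arguments exist only to make the class easy to test.

```python
import time


class RequestPacer:
    """Hypothetical helper: spaces calls evenly across each minute."""

    def __init__(self, per_minute, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / per_minute  # seconds between request slots
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = clock()

    def wait(self):
        """Sleep until the next slot is available and return the seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay:
            self._sleep(delay)
        # Reserve the following slot one interval after this one.
        self._next_allowed = max(now, self._next_allowed) + self.interval
        return delay


def is_quota_low(remaining, limit, threshold=0.10):
    """Return True when the remaining quota is below the given fraction of the limit."""
    return remaining < limit * threshold
```

Calling `pacer.wait()` before each request keeps a Starter key (30 requests per minute) at roughly one request every two seconds. This single-threaded sketch does not coordinate between threads or processes; a shared limiter would be needed for that.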

Tip

Enterprise customers can request custom rate limits tailored to their workload. Contact sales@creor.ai to discuss your requirements.