Rate Limits

Creor enforces rate limits to ensure fair usage and platform stability. Limits vary by plan tier and are applied per API key.

Limits by Plan

Monthly quotas are tracked per API key over a fixed calendar-month window, not a rolling one: the counter resets on the first day of each calendar month (UTC). Each counted request consumes one unit regardless of the model used or tokens consumed.

Plan        Monthly Requests   Requests / Minute   Concurrent Requests
Free        200                10                  1
Starter     1,000              30                  3
Pro         Unlimited          120                 10
Enterprise  Unlimited          Custom              Custom

Note

The per-minute and concurrency limits apply even on the Pro and Enterprise plans. These protect the platform from sudden traffic spikes and ensure consistent latency for all users.
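
One way to stay under the concurrency cap on the client side is to gate calls behind a semaphore sized to your plan's Concurrent Requests value. The sketch below is illustrative only: send_request is a hypothetical placeholder for your real API call (it is not part of any Creor SDK), and the value 3 is the Starter figure from the table above.

```python
import asyncio

MAX_CONCURRENT = 3  # Starter plan's concurrent-request limit (from the table)


async def send_request(i: int) -> int:
    """Hypothetical stand-in for a real API call."""
    await asyncio.sleep(0.01)
    return i


async def run_limited(n_requests: int, max_concurrent: int = MAX_CONCURRENT):
    """Run n_requests calls with at most max_concurrent in flight at once.

    Returns the results (in submission order) and the peak concurrency seen.
    """
    semaphore = asyncio.Semaphore(max_concurrent)
    in_flight = 0
    peak = 0

    async def worker(i: int) -> int:
        nonlocal in_flight, peak
        async with semaphore:  # waits while max_concurrent calls are in flight
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await send_request(i)
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(worker(i) for i in range(n_requests)))
    return results, peak
```

Calling asyncio.run(run_limited(10)) completes all ten calls while never holding more than three open at a time. This only limits concurrency; it does not pace requests per minute.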

What Counts as a Request

  • Each call to /v1/chat/completions counts as one request, whether streaming or non-streaming.
  • Polling endpoints (e.g., agent status) count as one request per poll.
  • Requests that fail with a 4xx client error still count toward your limit.
  • Requests that fail with a 5xx server error do not count toward your limit.
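
If you keep a local estimate of usage, the counting rules above can be mirrored in a few lines. This is a rough sketch, not Creor's actual accounting: UsageTracker is a hypothetical helper, and treating a 429 like any other 4xx response is an assumption drawn from a literal reading of the list. The rate limit headers remain the authoritative source.

```python
def counts_toward_limit(status_code: int) -> bool:
    """Return True if a response with this status consumes one request unit.

    Per the rules above, 5xx responses are not counted; successful and 4xx
    responses are. Counting 429 like other 4xx codes is an assumption.
    """
    return not (500 <= status_code <= 599)


class UsageTracker:
    """Hypothetical local counter that mirrors the documented counting rules."""

    def __init__(self, monthly_limit: int):
        self.monthly_limit = monthly_limit
        self.used = 0

    def record(self, status_code: int) -> None:
        # Only statuses that count toward the limit increment the counter
        if counts_toward_limit(status_code):
            self.used += 1

    @property
    def remaining(self) -> int:
        return max(self.monthly_limit - self.used, 0)
```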

Rate Limit Headers

Every API response includes headers that tell you your current rate limit status. Use these to monitor your consumption and implement client-side throttling.

Header                         Type     Description
X-RateLimit-Limit              integer  Maximum number of requests allowed in the current period.
X-RateLimit-Remaining          integer  Number of requests remaining in the current period.
X-RateLimit-Reset              integer  Unix timestamp (seconds) when the current period resets.
X-RateLimit-Limit-Minute       integer  Maximum requests allowed per minute.
X-RateLimit-Remaining-Minute   integer  Requests remaining in the current minute window.
Retry-After                    integer  Seconds to wait before retrying (only present on 429 responses).

Example Response Headers

HTTP/2 200
Content-Type: application/json
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 847
X-RateLimit-Reset: 1714521600
X-RateLimit-Limit-Minute: 30
X-RateLimit-Remaining-Minute: 28

The X-RateLimit-Limit and X-RateLimit-Remaining headers reflect your monthly quota. The minute-level headers reflect the short-term burst limit.
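
To make these headers easier to work with, you might parse them into a small structure. The sketch below assumes all five headers are present and hold integers, as in the example response; RateLimitStatus and parse_rate_limit_headers are illustrative names, and this page does not say what the monthly headers contain on plans with unlimited monthly requests.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Mapping


@dataclass
class RateLimitStatus:
    limit: int              # monthly quota
    remaining: int          # monthly requests left
    reset_at: datetime      # when the monthly period resets (UTC)
    limit_minute: int       # per-minute cap
    remaining_minute: int   # requests left in the current minute window


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitStatus:
    # HTTP header names are case-insensitive; normalize keys for plain dicts
    h = {k.lower(): v for k, v in headers.items()}
    return RateLimitStatus(
        limit=int(h["x-ratelimit-limit"]),
        remaining=int(h["x-ratelimit-remaining"]),
        reset_at=datetime.fromtimestamp(int(h["x-ratelimit-reset"]), tz=timezone.utc),
        limit_minute=int(h["x-ratelimit-limit-minute"]),
        remaining_minute=int(h["x-ratelimit-remaining-minute"]),
    )
```

For the example response above, reset_at works out to 2024-05-01 00:00:00 UTC, which matches the first-of-month reset.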

Handling 429 Responses

When you exceed either the monthly or per-minute limit, the API returns a 429 Too Many Requests response with a JSON error body and a Retry-After header.

429 Response Body

{
  "error": {
    "type": "rate_limit_exceeded",
    "message": "You have exceeded your per-minute request limit. Please retry after 12 seconds.",
    "retry_after": 12
  }
}
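
Because the wait time appears both in the Retry-After header and in the body's retry_after field, a client can read either one. The helper below is a hedged sketch: retry_wait_seconds is a hypothetical name, preferring the header over the body is a design choice rather than documented behavior, and the 5-second fallback is an assumed default.

```python
import json
from typing import Mapping

DEFAULT_WAIT_SECONDS = 5.0  # assumed fallback, not a documented value


def retry_wait_seconds(headers: Mapping[str, str], body_text: str) -> float:
    """Return how long to wait after a 429.

    Prefers the Retry-After header, then error.retry_after in the JSON body,
    then DEFAULT_WAIT_SECONDS if neither can be read.
    """
    for key, value in headers.items():
        if key.lower() == "retry-after":
            try:
                return float(value)
            except ValueError:
                break  # not a plain number; fall through to the body
    try:
        payload = json.loads(body_text)
        return float(payload["error"]["retry_after"])
    except (ValueError, KeyError, TypeError):
        return DEFAULT_WAIT_SECONDS
```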

Retry Strategy

Implement exponential backoff with jitter when you receive a 429. The Retry-After header gives you the minimum wait time, but adding jitter prevents thundering herd problems when multiple clients are rate-limited simultaneously.

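async function fetchWithRetry(
  url: string,
  options: RequestInit,
  maxRetries = 3
): Promise<Response> {
  const baseDelayMs = 1000; // starting point for exponential backoff

  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const response = await fetch(url, options);

    // Anything other than a 429 is returned to the caller as-is
    if (response.status !== 429) {
      return response;
    }

    if (attempt === maxRetries) {
      break;
    }

    // Retry-After is the minimum wait; fall back to 5 s if missing or unparsable
    const parsed = parseInt(response.headers.get("Retry-After") ?? "", 10);
    const retryAfterMs = (Number.isNaN(parsed) ? 5 : parsed) * 1000;

    // Exponential backoff (1 s, 2 s, 4 s, ...), never below Retry-After, plus up to 1 s of jitter
    const backoffMs = baseDelayMs * 2 ** attempt;
    const jitterMs = Math.random() * 1000;
    const delay = Math.max(retryAfterMs, backoffMs) + jitterMs;

    await new Promise((resolve) => setTimeout(resolve, delay));
  }

  throw new Error("Rate limit exceeded after max retries");
}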

Python Example

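import random
import time

import requests


def fetch_with_retry(url, headers, json_body, max_retries=3, base_delay=1.0):
    for attempt in range(max_retries + 1):
        response = requests.post(url, headers=headers, json=json_body, timeout=30)

        # Anything other than a 429 is returned to the caller as-is
        if response.status_code != 429:
            return response

        if attempt == max_retries:
            break

        # Retry-After is the minimum wait; fall back to 5 s if missing or unparsable
        try:
            retry_after = int(response.headers.get("Retry-After", 5))
        except ValueError:
            retry_after = 5

        # Exponential backoff (1 s, 2 s, 4 s, ...), never below Retry-After, plus up to 1 s of jitter
        backoff = base_delay * 2 ** attempt
        jitter = random.uniform(0, 1)
        time.sleep(max(retry_after, backoff) + jitter)

    raise RuntimeError("Rate limit exceeded after max retries")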

Warning

Do not retry immediately without respecting the Retry-After header. Aggressive retries can result in longer backoff periods or temporary key suspension.

Best Practices

Follow these practices to stay within your rate limits and build resilient integrations.

  • Monitor the X-RateLimit-Remaining header and slow down requests when it drops below 10% of your limit.
  • Use streaming responses for chat completions: a single streaming request counts once, whereas polling for results counts one request per poll.
  • Cache responses when appropriate. If multiple users ask the same question, serve the cached result instead of making duplicate API calls.
  • Avoid redundant calls. Call the models list endpoint once and cache the result rather than calling it before every completion request.
  • Distribute requests evenly across the minute window instead of sending bursts.
  • Set up alerts in your monitoring system when X-RateLimit-Remaining drops below a threshold.
  • On the Free plan, consider upgrading to Starter if you regularly hit the ceiling of 200 requests per month.
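
Two of the practices above, slowing down below 10% of the limit and spreading requests evenly across the minute window, can be sketched as follows. Both should_slow_down and RequestPacer are hypothetical helpers written for illustration. The pacer simply enforces a minimum gap of 60 / limit seconds between calls, which is one simple way to avoid bursts and not necessarily how the server measures its minute window.

```python
import time


def should_slow_down(remaining: int, limit: int, threshold: float = 0.10) -> bool:
    """True when remaining requests fall below the given fraction of the limit."""
    return limit > 0 and remaining < limit * threshold


class RequestPacer:
    """Spaces calls evenly: at most one every 60 / per_minute_limit seconds."""

    def __init__(self, per_minute_limit: int, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / per_minute_limit
        self._clock = clock      # injectable for testing
        self._sleep = sleep      # injectable for testing
        self._next_allowed = clock()

    def wait(self) -> float:
        """Block until the next slot is free; return the seconds slept."""
        now = self._clock()
        delay = max(self._next_allowed - now, 0.0)
        if delay > 0:
            self._sleep(delay)
        # Schedule the following slot one interval after this one
        self._next_allowed = max(now, self._next_allowed) + self.min_interval
        return delay
```

With the Starter limit of 30 requests per minute, calling pacer.wait() before each request keeps calls at least 2 seconds apart.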

Tip

Enterprise customers can request custom rate limits tailored to their workload. Contact sales@creor.ai to discuss your requirements.