Unkey’s rate limiting is designed for global, low-latency enforcement across distributed systems.

Architecture

When you call limiter.limit(identifier):
  1. Request hits the nearest Unkey location
  2. Counter is checked and updated
  3. Decision returned in ~30ms globally
See real-time performance metrics at ratelimit.unkey.com.

Sliding window algorithm

Unkey uses a sliding window algorithm that provides smooth rate limiting without the “burst at window start” problem of fixed windows.

Fixed window problem:
  • Limit: 100/minute
  • User sends 100 requests at 0:59
  • Window resets at 1:00
  • User sends 100 more at 1:01
  • Result: 200 requests in 2 seconds ❌
Sliding window solution:
  • Limit: 100/minute
  • Considers requests from the past 60 seconds at any point
  • No burst exploitation possible
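
To make the difference concrete, here is a minimal sketch of a sliding-window log, one common way to get the behavior described above. It is not Unkey's actual implementation, which this page does not describe; the class name `SlidingWindowLog` and its `check` method are hypothetical, and time is passed in explicitly so the scenario is easy to replay.

```typescript
// Illustrative only: a sliding-window log kept in memory for one process.
class SlidingWindowLog {
  private readonly limit: number;
  private readonly windowMs: number;
  // Timestamps (ms) of allowed requests, per identifier.
  private readonly hits = new Map<string, number[]>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request at time `now` is allowed.
  check(identifier: string, now: number): boolean {
    const cutoff = now - this.windowMs;
    // Keep only requests that fall inside the past `windowMs`.
    const recent = (this.hits.get(identifier) ?? []).filter((t) => t > cutoff);
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(now);
    this.hits.set(identifier, recent);
    return allowed;
  }
}

// Replay the fixed-window scenario: 100 requests at 0:59, 100 more at 1:01.
const log = new SlidingWindowLog(100, 60_000);
let allowedCount = 0;
for (let i = 0; i < 100; i++) if (log.check("user_1", 59_000)) allowedCount++;
for (let i = 0; i < 100; i++) if (log.check("user_1", 61_000)) allowedCount++;
console.log(allowedCount); // 100: the second burst is rejected
```

A log like this stores one timestamp per allowed request, so production systems often use an approximation with counters instead; the observable behavior at the window boundary is the point of the sketch.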

Global consistency

Rate limits are enforced consistently across all regions. A user can’t bypass limits by hitting different geographic endpoints.

Cross-region denial propagation

When an identifier crosses its limit in any region, every other region picks up the denial within a few seconds and starts rejecting the same identifier locally, even before that region sees any of the abusive traffic firsthand. The window is honored end to end: as the offending window decays, every region releases the identifier at the same time.

This means a single attacker hitting your API from multiple geographies can’t multiply their effective limit by the number of regions they hit. Once any region denies them, every region denies them. You don’t have to enable or configure anything; propagation runs automatically for every namespace.

Cross-region enforcement applies to limits with a window of at least 1 minute. Shorter windows (for example, per-second burst limits) are enforced per region only, because the propagation roundtrip takes longer than the window itself.
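
The window-length rule can be written down as a small check, which may help when reviewing a set of limits. This is a sketch of the rule as stated above, not an SDK function: `enforcementScope` and `MIN_CROSS_REGION_WINDOW_MS` are hypothetical names.

```typescript
type EnforcementScope = "cross-region" | "per-region";

// From the rule above: windows of at least 1 minute get cross-region enforcement.
const MIN_CROSS_REGION_WINDOW_MS = 60_000;

// Hypothetical helper: classify a limit by its window length in milliseconds.
function enforcementScope(windowMs: number): EnforcementScope {
  return windowMs >= MIN_CROSS_REGION_WINDOW_MS ? "cross-region" : "per-region";
}

console.log(enforcementScope(60_000)); // "cross-region"
console.log(enforcementScope(1_000)); // "per-region"
```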

Response fields

Every rate limit check returns:
| Field | Type | Description |
| --- | --- | --- |
| success | boolean | true if request is allowed |
| limit | number | The configured limit |
| remaining | number | Requests left in current window |
| reset | number | Unix timestamp (ms) when window resets |
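
As a reading aid, the fields can be expressed as a TypeScript type. The interface name `LimitResult` and the helper `retryAfterSeconds` are assumptions for illustration; the SDK's own exported types may be named or shaped differently.

```typescript
// Assumed shape, taken from the field list above.
interface LimitResult {
  success: boolean; // true if the request is allowed
  limit: number; // the configured limit
  remaining: number; // requests left in the current window
  reset: number; // Unix timestamp in milliseconds when the window resets
}

// Hypothetical helper: whole seconds until the window resets, never negative.
function retryAfterSeconds(result: LimitResult, now: number = Date.now()): number {
  return Math.max(0, Math.ceil((result.reset - now) / 1000));
}

const denied: LimitResult = { success: false, limit: 100, remaining: 0, reset: 10_500 };
console.log(retryAfterSeconds(denied, 10_000)); // 1
```

Clamping at zero avoids emitting a negative `Retry-After` value if the reset time has already passed by the time the response is built.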

Handling the response

const { success, remaining, reset } = await limiter.limit(identifier);

if (!success) {
  // Calculate retry time
  const retryAfter = Math.ceil((reset - Date.now()) / 1000);

  return new Response("Rate limit exceeded", {
    status: 429,
    headers: {
      "Retry-After": retryAfter.toString(),
      "X-RateLimit-Remaining": "0",
      "X-RateLimit-Reset": reset.toString(),
    },
  });
}

// Request allowed

Cost-based limiting

Not all requests are equal. Use cost to deduct more from the limit for expensive operations:
// Normal request: costs 1
await limiter.limit(userId);

// Expensive operation: costs 5
await limiter.limit(userId, { cost: 5 });
With a limit of 100/minute, a user can make:
  • 100 normal requests, OR
  • 20 expensive requests, OR
  • Any mix whose total cost stays within 100
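
The arithmetic behind these numbers can be checked with two small helpers. Both `maxCalls` and `totalCost` are hypothetical names used only for this worked example; they are not part of the SDK.

```typescript
// Hypothetical helper: how many calls of a given cost fit in one window.
function maxCalls(limit: number, cost: number): number {
  return Math.floor(limit / cost);
}

// Hypothetical helper: total budget consumed by a batch of calls.
function totalCost(calls: { cost: number; count: number }[]): number {
  return calls.reduce((sum, c) => sum + c.cost * c.count, 0);
}

const windowLimit = 100;
const normalOnly = maxCalls(windowLimit, 1); // 100
const expensiveOnly = maxCalls(windowLimit, 5); // 20
// A mix: 50 normal requests plus 10 expensive ones uses the full budget.
const mixedCost = totalCost([
  { cost: 1, count: 50 },
  { cost: 5, count: 10 },
]); // 100
console.log(normalOnly, expensiveOnly, mixedCost <= windowLimit);
```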

Timeout and fallback

Configure behavior when Unkey is unreachable:
import { Ratelimit } from "@unkey/ratelimit";

const limiter = new Ratelimit({
  rootKey: process.env.UNKEY_ROOT_KEY,
  namespace: "api",
  limit: 100,
  duration: "60s",
  timeout: {
    ms: 3000, // Wait max 3 seconds
    fallback: (identifier) => ({
      success: true, // Allow on timeout (or false to deny)
      limit: 0,
      remaining: 0,
      reset: Date.now(),
    }),
  },
  onError: (err, identifier) => {
    console.error(`Rate limit error for ${identifier}:`, err);
    return { success: true, limit: 0, remaining: 0, reset: Date.now() };
  },
});
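
To show what a timeout with a fallback means in practice, the sketch below races a pending check against a timer and returns the fallback result if the timer wins. It illustrates the general pattern under stated assumptions and is not the SDK's internal code: `withTimeout`, `failOpen`, `failClosed`, and the `LimitResult` type are hypothetical names.

```typescript
// Assumed result shape, mirroring the documented response fields.
type LimitResult = { success: boolean; limit: number; remaining: number; reset: number };

// Fail open: allow traffic when the limiter cannot answer in time.
const failOpen = (): LimitResult => ({ success: true, limit: 0, remaining: 0, reset: Date.now() });
// Fail closed: reject traffic when the limiter cannot answer in time.
const failClosed = (): LimitResult => ({ success: false, limit: 0, remaining: 0, reset: Date.now() });

// Resolve with `check` if it settles within `ms`, otherwise with `fallback()`.
// A rejection from `check` is not caught here and propagates to the caller.
async function withTimeout(
  check: Promise<LimitResult>,
  ms: number,
  fallback: () => LimitResult,
): Promise<LimitResult> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timedOut = new Promise<LimitResult>((resolve) => {
    timer = setTimeout(() => resolve(fallback()), ms);
  });
  try {
    return await Promise.race([check, timedOut]);
  } finally {
    // Clear the timer so it does not keep the process alive.
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

Whether to fail open or fail closed is a product decision: failing open keeps your API available during an outage at the cost of temporarily unenforced limits, while failing closed protects the backend but rejects legitimate traffic.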

Next steps

  • Custom overrides: Give specific users different limits
  • SDK reference: Full SDK documentation

Last modified on May 4, 2026