429 Too Many Requests

The client has sent too many requests in a given time window and must wait before retrying.

Quick Reference

Category: 4xx Client Error
RFC: RFC 6585, Section 4
Cacheable: No
Retryable: Yes, after waiting for the Retry-After period
Key header: Retry-After, indicating when to retry

What 429 Too Many Requests Means

HTTP 429 Too Many Requests (RFC 6585) tells the client it has exceeded a rate limit. The server is refusing to process the request not because it is malformed or unauthorized, but because the client has sent too many requests in too short a time. The implicit message is: slow down and try again later.

429 is the standardized replacement for a variety of proprietary approaches that APIs used before the code was defined in 2012 — Twitter’s “420 Enhance Your Calm,” various 503s with custom retry headers, and 403 responses with rate limit error codes in the body.

The response typically includes a Retry-After header specifying either a number of seconds to wait or an HTTP-date after which the client can retry. Rate limit informational headers (X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset) are not standardized but widely used.

Rate Limiting Strategies

Fixed Window

Allow N requests per time window (e.g., 100 requests per minute). The window resets at a fixed boundary: every minute at :00 seconds. Simple to implement but allows burst traffic at window boundaries: 100 requests at 0:59 and another 100 at 1:00 are both within their respective windows.
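
A minimal in-memory sketch of a fixed-window counter; the Map store, limit, and window size are illustrative rather than taken from any particular library:

// Fixed window: one counter per client, reset at each window boundary.
const WINDOW_MS = 60 * 1000;   // 1-minute window
const LIMIT = 100;             // 100 requests per window
const counters = new Map();    // clientId -> { windowStart, count }

function allowFixedWindow(clientId, now = Date.now()) {
  const windowStart = Math.floor(now / WINDOW_MS) * WINDOW_MS;
  const entry = counters.get(clientId);
  if (!entry || entry.windowStart !== windowStart) {
    // First request in a new window: start a fresh count.
    counters.set(clientId, { windowStart, count: 1 });
    return true;
  }
  if (entry.count < LIMIT) {
    entry.count += 1;
    return true;
  }
  return false; // over the limit: caller responds with 429
}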

Sliding Window

Track the timestamp of each request and count how many fell within the last N seconds. Prevents boundary bursting. More memory-intensive to implement but more accurate rate enforcement.
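
A sketch of a sliding-window log under the same illustrative assumptions; it stores one timestamp per request, which is where the extra memory cost comes from:

// Sliding window: remember each request's timestamp, count the recent ones.
const WINDOW_MS = 60 * 1000;
const LIMIT = 100;
const requestLog = new Map();  // clientId -> array of timestamps

function allowSlidingWindow(clientId, now = Date.now()) {
  // Keep only timestamps that still fall inside the window.
  const recent = (requestLog.get(clientId) || []).filter(t => now - t < WINDOW_MS);
  if (recent.length >= LIMIT) {
    requestLog.set(clientId, recent);
    return false; // caller responds with 429
  }
  recent.push(now);
  requestLog.set(clientId, recent);
  return true;
}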

Token Bucket

A “bucket” fills with tokens at a steady rate (e.g., 1 token per second, max 60). Each request consumes one token. When the bucket is empty, requests get 429. This allows controlled bursting: a client with 60 tokens saved can make 60 requests instantly, then is limited to 1 per second.
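
A token-bucket sketch using the rates from the example above (1 token per second, capacity 60); refilling lazily on each check avoids a background timer:

// Token bucket: refill steadily, spend one token per request.
const REFILL_PER_SEC = 1;   // tokens added per second
const CAPACITY = 60;        // maximum tokens a client can save up
const buckets = new Map();  // clientId -> { tokens, lastRefill }

function allowTokenBucket(clientId, now = Date.now()) {
  const b = buckets.get(clientId) || { tokens: CAPACITY, lastRefill: now };
  // Refill based on elapsed time, capped at capacity.
  b.tokens = Math.min(CAPACITY, b.tokens + ((now - b.lastRefill) / 1000) * REFILL_PER_SEC);
  b.lastRefill = now;
  const allowed = b.tokens >= 1;
  if (allowed) b.tokens -= 1;
  buckets.set(clientId, b);
  return allowed; // false means respond with 429
}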

Leaky Bucket

Requests queue up and are processed at a fixed rate. Excess requests are dropped (429) if the queue is full. This provides very smooth rate enforcement but adds latency for requests that queue.
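
A simplified leaky-bucket admission check; it only tracks queue depth and drain rate, and leaves the actual queuing delay of accepted requests to the caller:

// Leaky bucket: a bounded queue per client that drains at a fixed rate.
const DRAIN_PER_SEC = 10;   // requests processed per second
const QUEUE_LIMIT = 20;     // maximum queued requests per client
const queues = new Map();   // clientId -> { depth, lastDrain }

function allowLeakyBucket(clientId, now = Date.now()) {
  const q = queues.get(clientId) || { depth: 0, lastDrain: now };
  // Drain the queue according to the time that has passed.
  q.depth = Math.max(0, q.depth - ((now - q.lastDrain) / 1000) * DRAIN_PER_SEC);
  q.lastDrain = now;
  if (q.depth >= QUEUE_LIMIT) {
    queues.set(clientId, q);
    return false; // queue full: respond with 429
  }
  q.depth += 1;
  queues.set(clientId, q);
  return true; // accepted; actual processing happens at the drain rate
}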

Rate Limit Response Headers

RFC 6585 only standardizes the status code, not the rate limit headers. The de facto standard headers used by most APIs:

HTTP/1.1 429 Too Many Requests
Retry-After: 30
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1714000830
Content-Type: application/json

{
  "error": "rate_limit_exceeded",
  "message": "100 requests per minute limit exceeded",
  "retryAfter": 30
}

X-RateLimit-Limit: the total number of requests allowed in the window.
X-RateLimit-Remaining: how many requests are left in the current window.
X-RateLimit-Reset: Unix timestamp when the window resets (or seconds until reset; APIs differ).
Retry-After: the RFC-standard header; either a number of seconds (30) or an HTTP-date (Thu, 25 Apr 2024 12:00:00 GMT).
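
As an illustration, a client can read these headers to pause proactively instead of waiting for a 429. This sketch assumes the de facto X-RateLimit-* names and that X-RateLimit-Reset carries a Unix timestamp in seconds, which, as noted above, not every API does:

// Returns how many milliseconds to pause before the next request, or 0.
function pauseFromRateLimitHeaders(res) {
  const remaining = res.headers.get('X-RateLimit-Remaining');
  const reset = res.headers.get('X-RateLimit-Reset');
  if (remaining === null || reset === null) return 0;    // headers not present
  if (Number(remaining) > 0) return 0;                   // quota still available
  return Math.max(0, Number(reset) * 1000 - Date.now()); // wait until the window resets
}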

The IETF RateLimit header fields draft (draft-ietf-httpapi-ratelimit-headers) aims to standardize these as RateLimit-Limit, RateLimit-Remaining, and RateLimit-Reset. Not yet finalized at time of writing.

Implementing Rate Limiting

Express.js with express-rate-limit

const express = require('express');
const rateLimit = require('express-rate-limit');

const app = express();

const apiLimiter = rateLimit({
  windowMs: 60 * 1000,      // 1 minute
  max: 100,                  // 100 requests per window
  standardHeaders: true,     // sets RateLimit-* headers
  legacyHeaders: false,      // disables X-RateLimit-* headers
  handler: (req, res) => {
    // resetTime is a Date; report the seconds left until the window resets
    const retryAfter = Math.ceil((req.rateLimit.resetTime - Date.now()) / 1000);
    res.status(429).json({
      error: 'rate_limit_exceeded',
      message: 'Too many requests, please try again later',
      retryAfter
    });
  }
});

app.use('/api/', apiLimiter);

nginx

http {
    # 10 MB shared zone keyed by client IP, sustained rate of 10 requests/second
    limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;

    server {
        location /api/ {
            # allow bursts of up to 20 extra requests without queueing delay
            limit_req zone=api burst=20 nodelay;
            # nginx rejects with 503 by default; return 429 instead
            limit_req_status 429;
            proxy_pass http://backend;
        }
    }
}

Python with slowapi (FastAPI/Starlette)

from fastapi import FastAPI, Request
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address

limiter = Limiter(key_func=get_remote_address)

app = FastAPI()
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

@app.get("/api/data")
@limiter.limit("100/minute")
async def get_data(request: Request):
    return {"data": "..."}

Client-Side Retry Logic

A well-behaved API client handles 429 by reading the Retry-After header and waiting before retrying. Naive immediate retry loops make rate limiting worse by hammering the server with more requests during the cooldown period:

async function fetchWithRetry(url, options, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const res = await fetch(url, options);

    if (res.status !== 429) return res;

    // Retry-After is either a number of seconds or an HTTP-date
    const retryAfter = res.headers.get('Retry-After');
    const seconds = Number(retryAfter);
    const waitMs = retryAfter
      ? (Number.isNaN(seconds)
          ? Math.max(0, new Date(retryAfter) - Date.now())
          : seconds * 1000)
      : (2 ** attempt) * 1000; // exponential backoff if no header

    console.log(`Rate limited. Waiting ${waitMs}ms before retry ${attempt + 1}`);
    await new Promise(resolve => setTimeout(resolve, waitMs));
  }
  throw new Error('Max retries exceeded');
}

429 vs Related Status Codes

Code                            Meaning                                     Retryable?
429 Too Many Requests           Rate limit exceeded                         Yes, after Retry-After
503 Service Unavailable         Server temporarily overloaded or down       Yes, after Retry-After
403 Forbidden                   Not authorized; credentials won't help      No
509 Bandwidth Limit Exceeded    Monthly bandwidth cap hit (hosting)         No, wait for the next billing cycle

Frequently Asked Questions

Is Retry-After required on a 429 response?

RFC 6585 says the response MAY include a Retry-After header, so it is not mandatory. In practice, many APIs include it because without it, clients have no way to know when to retry and must fall back to guessed exponential backoff or simply fail. Always include Retry-After for a usable API.
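
For a manual 429 returned outside a rate-limiting library, setting the header is one line; overQuota below is a hypothetical placeholder for whatever quota check your application performs, and the route is illustrative:

// Manually returning 429 with Retry-After from an Express handler.
app.get('/api/report', (req, res) => {
  if (overQuota(req)) { // overQuota() is a hypothetical quota check
    res.set('Retry-After', '30'); // seconds; an HTTP-date is also valid
    return res.status(429).json({ error: 'rate_limit_exceeded', retryAfter: 30 });
  }
  res.json({ ok: true });
});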

What is the difference between 429 and 503?

429 is about client behavior: the specific client has sent too many requests. 503 is about server state: the server is temporarily unable to handle any requests due to overload or maintenance. A 429 is targeted at one client; a 503 is global. Rate limiting triggers 429; server overload or deployment triggers 503.

Should I rate limit by IP address or by user/API key?

Both, ideally. IP-based rate limiting catches anonymous abuse and DDoS traffic. API key or user-based rate limiting enforces per-account quotas. Relying only on IP limits is weak for scenarios where many users share an IP (corporate NAT, mobile carriers). Relying only on API keys does not protect unauthenticated endpoints.
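
With express-rate-limit, for example, a custom keyGenerator can prefer the API key and fall back to the client IP; the X-API-Key header name is an assumption about your API, not a standard:

const rateLimit = require('express-rate-limit');

// Key the counter on the API key when present, otherwise on the client IP.
const keyedLimiter = rateLimit({
  windowMs: 60 * 1000,
  max: 100,
  keyGenerator: (req) => req.get('X-API-Key') || req.ip
});

app.use('/api/', keyedLimiter);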

Does caching affect rate limiting?

Yes, positively. Responses with appropriate cache headers reduce the number of requests that reach rate-limited application logic. A CDN cache hit bypasses rate limiting entirely. This is generally desirable for GET requests: rate limiting is intended to protect expensive operations, and a cache hit is not expensive. Non-cacheable POST/PUT/DELETE requests still consume quota.
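
A sketch of the idea in Express: marking a GET response as cacheable for 60 seconds lets a CDN or shared cache absorb repeat requests before they ever reach the limiter (the route and payload are illustrative):

// A cacheable GET: a CDN or shared cache can serve repeats for 60 seconds,
// so those requests never reach the rate-limited application logic.
app.get('/api/catalog', (req, res) => {
  res.set('Cache-Control', 'public, max-age=60');
  res.json({ items: [] }); // illustrative payload
});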