# Rate Limiting

The API enforces rate limits to ensure fair usage across all partners. When a client exceeds its allotted request rate, the API returns **429 Too Many Requests** with a `Retry-After` header indicating how long to wait before retrying.

## Response Format

All 429 responses include the standard error envelope and a `Retry-After` header:

**Headers:**

| Header | Type | Description |
| --- | --- | --- |
| `Retry-After` | integer | Number of seconds to wait before retrying. |

**Body:**

```json
{
  "error": {
    "code": "RATE_LIMIT_ERROR",
    "message": "Rate limit exceeded.",
    "detail": "You have exceeded 100 requests per minute. Wait 60 seconds before retrying.",
    "request_id": "e5f6a7b8-c9d0-1234-efab-345678901234",
    "field": null
  }
}
```

The `Retry-After` value is an integer representing seconds. Always treat this header value as the minimum wait time -- do not hard-code a fixed retry delay.

See the [429 response spec](/assets/toomanyrequests.8e33874088e80858cea465db8a2b27472eae708ddb1f1a5271d1c3bad8bf77c2.4de535d0.yaml) for the full response schema.

## Backoff Strategy

When rate limited, use exponential backoff with jitter to avoid thundering herd problems:

1. **Wait for `Retry-After` seconds** from the response header.
2. **If still limited, double the wait interval** on each subsequent retry.
3. **Add random jitter** (0 to 1 second) to prevent synchronized retries from multiple clients.
4. **Cap the maximum wait at 5 minutes** (300 seconds) to avoid indefinite blocking.

```python
import time
import random
import requests

def request_with_backoff(method, url, headers, json=None, max_retries=5):
    """
    Make an API request with exponential backoff on 429 responses.

    Uses the Retry-After header as the base wait time, then doubles
    the interval on each subsequent retry with random jitter.
    """
    wait_time = 0
    for attempt in range(max_retries):
        response = requests.request(method, url, headers=headers, json=json)
        if response.status_code != 429:
            return response

        # Use Retry-After header as the base wait time
        retry_after = int(response.headers.get("Retry-After", 5))
        if attempt == 0:
            wait_time = retry_after
        else:
            wait_time = min(wait_time * 2, 300)  # Double, cap at 5 minutes

        # Add random jitter (0-1 second) to prevent thundering herd
        jitter = random.uniform(0, 1)
        time.sleep(wait_time + jitter)

    # All retries exhausted
    raise Exception(f"Rate limited after {max_retries} retries for {method} {url}")
```

## Limit Tiers

Rate limits are applied at multiple tiers to balance throughput with fair access. Exact limits are communicated during partner onboarding and may vary by partner tier and environment (sandbox vs. production).

| Tier | Applies To | Description |
| --- | --- | --- |
| **Per-API-Key** (global) | All endpoints | A global request budget shared across all endpoints. Prevents any single partner from monopolizing API capacity. |
| **Per-Location** | Location-scoped endpoints | Limits requests targeting a specific location. Prevents hot-spotting where one busy location degrades performance for others. |
| **Per-Endpoint** | Individual endpoints | Read endpoints (e.g., `GET /locations`, `GET .../inventory`) have higher limits. Write endpoints (e.g., `POST /carts`, `POST .../payments`) have lower limits. |

**Endpoint-specific guidance:**

- **Auth token endpoint** (`POST /auth/token`): Has a lower per-endpoint limit. Cache your tokens (24-hour TTL) and reuse them -- do not request a new token for every API call.
- **Inventory endpoint** (`GET /locations/{id}/inventory`): Has a higher per-endpoint limit to support bulk stock queries across many items.
- **Webhook events**: Webhooks are pushed to your server by Tote. They are not subject to your rate limits.

> **Note:** The example in the 429 response ("100 requests per minute") is illustrative. Your actual limits are communicated during partner onboarding and displayed in the Developer Portal.

## Best Practices

- **Cache tokens.** Tokens are valid for 24 hours. Requesting a new token per API call is the most common cause of rate limiting. See the [Authentication guide](/online-ordering/guides/02-authentication#token-caching-strategy); a minimal caching sketch appears after this list.
- **Use menu metadata polling.** Compare the `version_hash` from `GET .../menu/metadata` before fetching the full menu. This avoids unnecessary large responses. See the [Menu Sync guide](/online-ordering/guides/03-menu-sync); a polling sketch appears below.
- **Batch inventory checks.** The inventory endpoint supports higher pagination limits (up to 500 items per page) specifically for bulk queries. Use `limit=500` to minimize the number of requests (see the pagination sketch below).
- **Implement circuit breakers.** If you receive sustained 429 responses despite following the backoff strategy, stop making requests for a longer period (e.g., 10 minutes) and alert your operations team (a breaker sketch follows this list).
- **Stagger location polling.** When syncing data across many locations, spread requests over time rather than fetching all locations simultaneously (see the staggering sketch below).
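The following is a minimal token-caching sketch for the first bullet above. The `POST /auth/token` path and 24-hour TTL come from this guide; the request body and `access_token` response field are hypothetical placeholders -- see the [Authentication guide](/online-ordering/guides/02-authentication#token-caching-strategy) for the actual shapes.

```python
import time
import requests

_cached_token = None
_token_expires_at = 0.0

def get_token(base_url, client_id, client_secret):
    """Return a cached token, refreshing shortly before the 24-hour TTL."""
    global _cached_token, _token_expires_at
    if _cached_token and time.time() < _token_expires_at:
        return _cached_token
    response = requests.post(
        f"{base_url}/auth/token",
        json={"client_id": client_id, "client_secret": client_secret},  # assumed request body
    )
    response.raise_for_status()
    _cached_token = response.json()["access_token"]  # assumed response field
    # Refresh 5 minutes before the 24-hour TTL to avoid edge-of-expiry failures.
    _token_expires_at = time.time() + 24 * 3600 - 300
    return _cached_token
```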
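A minimal metadata-polling sketch for the second bullet. The `version_hash` field comes from this guide; `metadata_url` is a placeholder for the location's menu metadata endpoint (`GET .../menu/metadata`), which you should construct per the Menu Sync guide.

```python
import requests

_last_version_hash = None

def menu_changed(metadata_url, headers):
    """Compare version_hash before deciding to fetch the full menu."""
    global _last_version_hash
    response = requests.get(metadata_url, headers=headers)
    response.raise_for_status()
    current = response.json()["version_hash"]
    if current == _last_version_hash:
        return False   # Menu unchanged -- skip the large full-menu fetch
    _last_version_hash = current
    return True        # Menu changed -- fetch the full menu now
```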
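A minimal bulk-inventory sketch for the third bullet. The `GET /locations/{id}/inventory` path and `limit=500` parameter come from this guide; the `items` and `next_cursor` fields assume a hypothetical cursor-based pagination scheme -- substitute the actual pagination fields from the API reference.

```python
import requests

def fetch_all_inventory(base_url, location_id, headers):
    """Pull a location's full inventory using the maximum page size."""
    url = f"{base_url}/locations/{location_id}/inventory"
    params = {"limit": 500}  # Maximum page size per this guide
    items = []
    while True:
        response = requests.get(url, headers=headers, params=params)
        response.raise_for_status()
        payload = response.json()
        items.extend(payload.get("items", []))  # Assumed response field
        cursor = payload.get("next_cursor")     # Assumed response field
        if not cursor:
            return items
        params["cursor"] = cursor               # Assumed query parameter
```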
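A minimal client-side circuit breaker for the fourth bullet: after a run of consecutive 429s it stops sending requests for a cool-off period (10 minutes here, per the guidance above). The threshold and alerting hook are illustrative choices, not API requirements.

```python
import time

class CircuitBreaker:
    """Stop outbound requests after sustained rate limiting."""

    def __init__(self, threshold=5, cooloff_seconds=600):
        self.threshold = threshold              # Consecutive 429s before opening
        self.cooloff_seconds = cooloff_seconds  # 10 minutes, per the guidance above
        self.consecutive_429s = 0
        self.open_until = 0.0

    def allow_request(self):
        """Return False while the breaker is open (cooling off)."""
        return time.monotonic() >= self.open_until

    def record(self, status_code):
        """Feed each response status into the breaker."""
        if status_code == 429:
            self.consecutive_429s += 1
            if self.consecutive_429s >= self.threshold:
                self.open_until = time.monotonic() + self.cooloff_seconds
                self.consecutive_429s = 0
                # Alert your operations team here (log, page, etc.).
        else:
            self.consecutive_429s = 0
```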
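A minimal staggering sketch for the last bullet, reusing the `request_with_backoff` helper from the Backoff Strategy section. The `base_url`, the inventory sync payload handling, and the 5-minute window are placeholders.

```python
import time

def sync_all_locations(base_url, location_ids, headers, window_seconds=300):
    """Spread one sync per location evenly across the polling window
    instead of bursting every request at once."""
    spacing = window_seconds / max(len(location_ids), 1)
    for location_id in location_ids:
        url = f"{base_url}/locations/{location_id}/inventory"
        response = request_with_backoff("GET", url, headers=headers)
        # ... process the response ...
        time.sleep(spacing)  # Stagger the next location's request
```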
## See Also

- [Error Codes Reference](/online-ordering/reference/error-codes#429-too-many-requests) -- All 429 error scenarios.
- [Getting Started Guide](/online-ordering/guides/01-getting-started#error-handling) -- Error handling overview.
- [Authentication Guide](/online-ordering/guides/02-authentication#token-caching-strategy) -- Token caching to avoid rate limits.