Rate Limiting

The API enforces rate limits to ensure fair usage across all partners. When a client exceeds its allotted request rate, the API returns 429 Too Many Requests with a Retry-After header indicating how long to wait before retrying.

Response Format

All 429 responses include the standard error envelope and a Retry-After header:

Headers:

  • Retry-After (integer): Number of seconds to wait before retrying.

Body:

{
  "error": {
    "code": "RATE_LIMIT_ERROR",
    "message": "Rate limit exceeded.",
    "detail": "You have exceeded 100 requests per minute. Wait 60 seconds before retrying.",
    "request_id": "e5f6a7b8-c9d0-1234-efab-345678901234",
    "field": null
  }
}

The Retry-After value is an integer representing seconds. Always use this header value as the minimum wait time -- do not hard-code a fixed retry delay.
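
As a minimal sketch of reacting to a single 429, assuming the requests library and a placeholder base URL and token:

import requests

headers = {"Authorization": "Bearer <token>"}  # placeholder credentials
response = requests.get("https://api.example.com/locations", headers=headers)

if response.status_code == 429:
    # Treat Retry-After as the minimum wait; fall back only if it is missing
    retry_after = int(response.headers.get("Retry-After", 60))
    error = response.json()["error"]
    # Log the request_id so the rate-limited call can be traced with support
    print(f"Rate limited (request_id={error['request_id']}); wait {retry_after}s")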

See the 429 response spec for the full response schema.

Backoff Strategy

When rate limited, use exponential backoff with jitter to avoid thundering herd problems:

  1. Wait for Retry-After seconds from the response header.
  2. If still limited, double the wait interval on each subsequent retry.
  3. Add random jitter (0 to 1 second) to prevent synchronized retries from multiple clients.
  4. Cap the maximum wait at 5 minutes (300 seconds) to avoid indefinite blocking.
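
The helper below implements this strategy with the requests library:
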
import time
import random
import requests

def request_with_backoff(method, url, headers, json=None, max_retries=5):
    """
    Make an API request with exponential backoff on 429 responses.

    Uses the Retry-After header as the base wait time, then doubles
    the interval on each subsequent retry with random jitter.
    """
    wait_time = 0

    for attempt in range(max_retries):
        response = requests.request(method, url, headers=headers, json=json)

        if response.status_code != 429:
            return response

        # Use the Retry-After header as the base wait time
        retry_after = int(response.headers.get("Retry-After", 5))

        if attempt == 0:
            wait_time = min(retry_after, 300)
        else:
            # Double the interval, but never wait less than the server's
            # Retry-After, and cap the total at 5 minutes
            wait_time = min(max(wait_time * 2, retry_after), 300)

        # Add random jitter (0-1 second) to prevent thundering herd
        jitter = random.uniform(0, 1)
        time.sleep(wait_time + jitter)

    # All retries exhausted
    raise RuntimeError(f"Rate limited after {max_retries} retries for {method} {url}")
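
For example, with a placeholder base URL and a cached bearer token:

response = request_with_backoff(
    "GET",
    "https://api.example.com/locations",  # placeholder base URL
    headers={"Authorization": "Bearer <token>"},
)
response.raise_for_status()
locations = response.json()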

Limit Tiers

Rate limits are applied at multiple tiers to balance throughput with fair access. Exact limits are communicated during partner onboarding and may vary by partner tier and environment (sandbox vs. production).

  • Per-API-Key (global): Applies to all endpoints. A global request budget shared across all endpoints. Prevents any single partner from monopolizing API capacity.
  • Per-Location: Applies to location-scoped endpoints. Limits requests targeting a specific location. Prevents hot-spotting where one busy location degrades performance for others.
  • Per-Endpoint: Applies to individual endpoints. Read endpoints (e.g., GET /locations, GET .../inventory) have higher limits; write endpoints (e.g., POST /carts, POST .../payments) have lower limits.

Endpoint-specific guidance:

  • Auth token endpoint (POST /auth/token): Has a lower per-endpoint limit. Cache your tokens (24-hour TTL) and reuse them -- do not request a new token for every API call (a caching sketch follows this list).
  • Inventory endpoint (GET /locations/{id}/inventory): Has a higher per-endpoint limit to support bulk stock queries across many items.
  • Webhook events: Webhooks are pushed to your server by Tote. They are not subject to your rate limits.
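
A minimal sketch of token caching, assuming a JSON request body and an access_token field in the response (both shapes are illustrative; see the Authentication guide for the real ones). Only the POST /auth/token path and 24-hour TTL come from this guide:

import time
import requests

_token_cache = {"token": None, "expires_at": 0.0}

def get_token(client_id, client_secret):
    """Return a cached token, requesting a new one only when near expiry."""
    # Refresh 60 seconds early so a token never expires mid-request
    if _token_cache["token"] is None or time.time() > _token_cache["expires_at"] - 60:
        resp = requests.post(
            "https://api.example.com/auth/token",  # placeholder base URL
            json={"client_id": client_id, "client_secret": client_secret},  # body shape assumed
        )
        resp.raise_for_status()
        _token_cache["token"] = resp.json()["access_token"]  # field name assumed
        _token_cache["expires_at"] = time.time() + 24 * 60 * 60  # documented 24-hour TTL
    return _token_cache["token"]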

Note: The example in the 429 response ("100 requests per minute") is illustrative. Your actual limits are communicated during partner onboarding and displayed in the Developer Portal.

Best Practices

  • Cache tokens. Tokens are valid for 24 hours. Requesting a new token per API call is the most common cause of rate limiting. See the Authentication guide.
  • Use menu metadata polling. Compare the version_hash from GET .../menu/metadata before fetching the full menu. This avoids unnecessary large responses (first sketch below). See the Menu Sync guide.
  • Batch inventory checks. The inventory endpoint supports higher pagination limits (up to 500 items per page) specifically for bulk queries. Use limit=500 to minimize the number of requests (second sketch below).
  • Implement circuit breakers. If you receive sustained 429 responses despite following the backoff strategy, stop making requests for a longer period (e.g., 10 minutes) and alert your operations team (third sketch below).
  • Stagger location polling. When syncing data across many locations, spread requests over time rather than fetching all locations simultaneously.
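
A minimal sketch of version-hash polling, assuming the metadata response carries a version_hash field and the full menu lives at a sibling .../menu path (both assumptions; see the Menu Sync guide for the actual schema):

import requests

_last_hash = None

def sync_menu_if_changed(base_url, location_id, headers):
    """Fetch the full menu only when its version_hash has changed."""
    global _last_hash
    meta = requests.get(f"{base_url}/locations/{location_id}/menu/metadata",
                        headers=headers)
    meta.raise_for_status()
    version_hash = meta.json()["version_hash"]  # field name from the Menu Sync guide

    if version_hash == _last_hash:
        return None  # Menu unchanged; skip the large full-menu response

    menu = requests.get(f"{base_url}/locations/{location_id}/menu",  # path assumed
                        headers=headers)
    menu.raise_for_status()
    _last_hash = version_hash
    return menu.json()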
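
A paging sketch for bulk inventory reads, assuming offset-based pagination and an items array in the response (both assumptions; only the limit=500 maximum comes from this guide):

import requests

def fetch_all_inventory(base_url, location_id, headers):
    """Page through a location's inventory in the largest allowed chunks."""
    items, offset = [], 0
    while True:
        resp = requests.get(
            f"{base_url}/locations/{location_id}/inventory",
            headers=headers,
            params={"limit": 500, "offset": offset},  # pagination params assumed
        )
        resp.raise_for_status()
        page = resp.json()["items"]  # response field assumed
        items.extend(page)
        if len(page) < 500:  # short page means we reached the end
            return items
        offset += 500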
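
And a minimal circuit-breaker sketch; the 10-minute pause comes from the guidance above, while the threshold value is illustrative:

import time

class RateLimitCircuitBreaker:
    """Stop all requests after sustained 429s; resume after a cooldown."""

    def __init__(self, threshold=5, cooldown_seconds=600):
        self.threshold = threshold                 # consecutive 429s before opening
        self.cooldown_seconds = cooldown_seconds   # 10 minutes, per the guidance above
        self.consecutive_429s = 0
        self.open_until = 0.0

    def allow_request(self):
        """Return False while the breaker is open (cooling down)."""
        return time.time() >= self.open_until

    def record_response(self, status_code):
        if status_code == 429:
            self.consecutive_429s += 1
            if self.consecutive_429s >= self.threshold:
                self.open_until = time.time() + self.cooldown_seconds
                self.consecutive_429s = 0
                # Alert your operations team here (e.g., page or ticket)
        else:
            self.consecutive_429s = 0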

See Also