Smart Routing & Quotas
The SDK includes an in-memory QuotaMap and Circuit Breaker that track rate limits across every key in your vault in real time, so requests are steered away from throttled keys and availability stays high.
The Priority Ladder (Zero-Trust Fallbacks)
By default, the SDK automatically maps failing requests to equivalent models on other providers. For production resilience, you can instead define an explicit Priority Ladder: if the primary provider returns a 429 (rate limit) or a 5xx server error, the SDK immediately falls through your predefined chain of provider-model pairs, in the order you listed them.
const keyking = new KeyKing({
  routingRules: [
    { provider: "Groq", model: "llama-3.3-70b-versatile" },
    { provider: "Anthropic", model: "claude-3-5-sonnet-20241022" },
    { provider: "OpenAI", model: "gpt-4o" },
  ],
});

const response = await keyking.chat.completions.create({
  model: "any-model", // Ignored when routingRules are set
  messages: [{ role: "user", content: "Hello!" }],
});

// Verify which provider handled the request
console.log(response._keyking_provider);
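To make the cascade concrete, here is a minimal sketch of how a ladder of this kind could be walked. It is not the SDK's actual implementation: `Rung`, `Send`, `statusOf`, and `callWithLadder` are hypothetical names, and `send` stands in for whatever performs the real provider call. The sketch assumes a failed call throws an object carrying a numeric `status`.

```typescript
// Hypothetical types for illustration; not part of the SDK's public API.
type Rung = { provider: string; model: string };
type Send<T> = (rung: Rung) => Promise<T>;

// Extract an HTTP status from a thrown value, if it carries one (assumed shape).
function statusOf(err: unknown): number | undefined {
  if (typeof err === "object" && err !== null && "status" in err) {
    const s = (err as { status: unknown }).status;
    return typeof s === "number" ? s : undefined;
  }
  return undefined;
}

// Only a 429 rate limit or a 5xx server error moves us to the next rung.
function shouldFallBack(err: unknown): boolean {
  const status = statusOf(err);
  return status !== undefined && (status === 429 || (status >= 500 && status <= 599));
}

// Try each rung in order and report which provider finally answered.
async function callWithLadder<T>(
  ladder: Rung[],
  send: Send<T>,
): Promise<{ result: T; provider: string }> {
  let lastError: unknown = new Error("routing ladder is empty");
  for (const rung of ladder) {
    try {
      const result = await send(rung);
      return { result, provider: rung.provider };
    } catch (err) {
      // Anything else (for example a 400) is rethrown rather than masked.
      if (!shouldFallBack(err)) throw err;
      lastError = err;
    }
  }
  // Every rung failed with a retryable error: surface the last one.
  throw lastError;
}
```

In this sketch, non-retryable errors stop the cascade immediately, so a malformed request is not replayed against every provider in the chain.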
Circuit Breaker
The SDK tracks the rate-limit headers (such as x-ratelimit-remaining-requests) returned by providers. When a key hits a rate limit or returns a 401 Unauthorized error, an in-memory Circuit Breaker quarantines that key for 5 minutes. Among the keys that are not quarantined, the QuotaMap sorting algorithm always prefers those with the most remaining tokens or requests, to optimize latency.
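The quarantine and sorting behavior described above can be sketched as follows. This is an illustration under stated assumptions, not the SDK's internal code: `KeyState`, `recordResponse`, and `pickKey` are hypothetical names, header names are assumed to be already lowercased, and only the remaining-requests counter is modeled (a token counter would work the same way).

```typescript
// Hypothetical per-key record; field names are assumptions for this sketch.
type KeyState = {
  id: string;
  remainingRequests: number;
  quarantinedUntil: number; // epoch milliseconds; 0 means never quarantined
};

const QUARANTINE_MS = 5 * 60 * 1000; // the 5-minute window described above

// Update a key from one provider response: status code plus rate-limit header.
function recordResponse(
  key: KeyState,
  status: number,
  headers: Record<string, string>,
  now: number,
): void {
  const raw = headers["x-ratelimit-remaining-requests"] ?? "";
  const remaining = Number.parseInt(raw, 10);
  if (!Number.isNaN(remaining)) key.remainingRequests = remaining;
  // A 429 or a 401 trips the breaker for this key.
  if (status === 429 || status === 401) key.quarantinedUntil = now + QUARANTINE_MS;
}

// Return the non-quarantined key with the most remaining requests,
// or undefined when every key is currently quarantined.
function pickKey(keys: KeyState[], now: number): KeyState | undefined {
  return keys
    .filter((k) => k.quarantinedUntil <= now) // filter copies, so the input order is untouched
    .sort((a, b) => b.remainingRequests - a.remainingRequests)[0];
}
```

Passing `now` in explicitly, rather than calling `Date.now()` inside the functions, keeps the sketch deterministic and easy to test; a real implementation may well read the clock directly.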