Smart Routing & Quotas

The SDK keeps an in-memory QuotaMap and Circuit Breaker that track rate limits across every key in your vault in real time, so requests can be routed away from exhausted or failing keys.

The Priority Ladder (Zero-Trust Fallbacks)

By default, the SDK automatically maps failing requests to equivalent models on other providers. For production resilience, you can instead define an explicit Priority Ladder: if the primary provider returns a 429 Rate Limit or a 5xx Server Error, the SDK falls through your predefined chain of Provider-Model pairings in order.

TypeScript
const keyking = new KeyKing({
  routingRules: [
    { provider: "Groq",      model: "llama-3.3-70b-versatile" },
    { provider: "Anthropic", model: "claude-3-5-sonnet-20241022" },
    { provider: "OpenAI",    model: "gpt-4o" },
  ],
});

const response = await keyking.chat.completions.create({
  model: "any-model", // Ignored when routingRules are set
  messages: [{ role: "user", content: "Hello!" }],
});

// Verify which provider handled the request
console.log(response._keyking_provider);
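
The cascade described above can be pictured roughly as follows. This is a minimal sketch, not the SDK's actual implementation: `Rung`, `statusOf`, `shouldFallBack`, and `runLadder` are hypothetical names introduced here for illustration, and the sketch assumes that a failed provider call throws an error carrying a numeric `status` property.

```typescript
// Hypothetical shape of one Priority Ladder entry (mirrors routingRules).
type Rung = { provider: string; model: string };

// Extract an HTTP status from a thrown value, if it carries one (assumed error shape).
function statusOf(err: unknown): number | undefined {
  if (typeof err === "object" && err !== null && "status" in err) {
    const s = (err as { status: unknown }).status;
    return typeof s === "number" ? s : undefined;
  }
  return undefined;
}

// Only a 429 or a 5xx response moves the request to the next rung.
function shouldFallBack(status: number | undefined): boolean {
  return status === 429 || (status !== undefined && status >= 500 && status <= 599);
}

// Try each rung in order and return the first success with the rung that served it.
async function runLadder<T>(
  ladder: Rung[],
  call: (rung: Rung) => Promise<T>,
): Promise<{ result: T; servedBy: Rung }> {
  let lastError: unknown = new Error("routing ladder is empty");
  for (const rung of ladder) {
    try {
      return { result: await call(rung), servedBy: rung };
    } catch (err) {
      // Anything other than 429/5xx (for example a 400) is surfaced immediately.
      if (!shouldFallBack(statusOf(err))) throw err;
      lastError = err;
    }
  }
  // Every rung failed with a retryable error; rethrow the last one.
  throw lastError;
}
```

In this sketch, `servedBy` plays the role that `_keyking_provider` plays on the real response: it records which rung actually handled the request.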

Circuit Breaker

The SDK tracks the rate-limit headers that providers return (such as x-ratelimit-remaining-requests). An in-memory Circuit Breaker quarantines for 5 minutes any key that hits a rate limit or returns a 401 Unauthorized error. Among the keys that remain available, the QuotaMap sorting algorithm strictly prioritizes those with the highest remaining tokens/requests to optimize latency.
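
To make the quarantine and sorting behavior concrete, here is a minimal sketch under stated assumptions. The `KeyState` shape and the `recordFailure`, `recordHeaders`, and `pickKey` functions are hypothetical and are not part of the SDK's documented API; the sketch also tracks remaining requests only, whereas the real QuotaMap is described as considering tokens as well.

```typescript
// Hypothetical per-key bookkeeping; the SDK's real QuotaMap shape is not documented here.
type KeyState = {
  id: string;
  remainingRequests: number; // e.g. parsed from x-ratelimit-remaining-requests
  quarantinedUntil: number;  // epoch milliseconds; 0 means not quarantined
};

const QUARANTINE_MS = 5 * 60 * 1000; // the 5-minute quarantine window

// Quarantine a key after a 429 (rate limit) or 401 (unauthorized) response.
function recordFailure(key: KeyState, status: number, now: number): void {
  if (status === 429 || status === 401) {
    key.quarantinedUntil = now + QUARANTINE_MS;
  }
}

// Refresh the remaining-request count from a header value, ignoring missing or bad values.
function recordHeaders(key: KeyState, remainingHeader: string | null): void {
  const parsed = remainingHeader === null ? NaN : Number.parseInt(remainingHeader, 10);
  if (!Number.isNaN(parsed)) key.remainingRequests = parsed;
}

// Pick the non-quarantined key with the most remaining requests, if any.
function pickKey(keys: KeyState[], now: number): KeyState | undefined {
  return keys
    .filter((k) => k.quarantinedUntil <= now) // filter() copies, so sort() leaves the input intact
    .sort((a, b) => b.remainingRequests - a.remainingRequests)[0];
}
```

Passing `now` in explicitly, rather than calling `Date.now()` inside each function, keeps the sketch deterministic and easy to test.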