Use case

Model Fallback Strategy

Declarative fallback chains for provider outages, rate limits, and quality regressions — defined per workflow, not coded ad-hoc per call site.

Overview

Frontier models go down, hit rate limits, or degrade in quality. Without a fallback chain, every workflow that calls them fails or degrades with them. With a fallback chain, degradation becomes routing, not an incident.

What it solves

Removes the 'our chatbot is down because Anthropic / OpenAI / Google had an issue' headline. Lets routing absorb provider degradation gracefully.

How we build it

Each workflow gets its own fallback chain: a primary route, a secondary route, and optionally a tertiary degraded-but-functional route. At the gateway, health checks and circuit-breaker logic switch traffic to the next route on timeout, rate limit, or error budget burn. Every fallback event is traced so quality impact is measurable.

  • Per-workflow fallback chains
  • Circuit breaker on timeout, rate limit, and error budget burn
  • Health-check telemetry per provider
  • Quality-impact attribution on fallback events
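
The pieces above can be sketched in a few dozen lines. This is a minimal illustration, not the production gateway: the names FallbackRouter, CircuitBreaker, and ProviderError, the "support-chat" workflow, the failure threshold of 3, and the 30-second cooldown are all assumptions made for the example. Routes are plain callables standing in for provider clients, and the circuit breaker counts consecutive failures rather than tracking a real error budget.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


class ProviderError(Exception):
    """Hypothetical error a route raises on timeout, rate limit, or failure."""


@dataclass
class CircuitBreaker:
    failure_threshold: int = 3       # consecutive failures before opening (assumed)
    cooldown_seconds: float = 30.0   # how long an open route is skipped (assumed)
    failures: int = 0
    opened_at: Optional[float] = None

    def allow(self, now: float) -> bool:
        if self.opened_at is None:
            return True
        # After the cooldown, requests are let through again;
        # another failure re-opens the breaker.
        return now - self.opened_at >= self.cooldown_seconds

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self, now: float) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = now


@dataclass
class FallbackRouter:
    chains: Dict[str, List[str]]             # workflow -> ordered route names
    routes: Dict[str, Callable[[str], str]]  # route name -> provider call
    breakers: Dict[str, CircuitBreaker] = field(default_factory=dict)
    events: List[dict] = field(default_factory=list)  # trace of fallback events
    clock: Callable[[], float] = time.monotonic

    def call(self, workflow: str, prompt: str) -> str:
        for position, name in enumerate(self.chains[workflow]):
            breaker = self.breakers.setdefault(name, CircuitBreaker())
            now = self.clock()
            if not breaker.allow(now):
                continue  # breaker is open: skip this route without calling it
            try:
                result = self.routes[name](prompt)
            except ProviderError:
                breaker.record_failure(now)
                continue  # try the next route in the chain
            breaker.record_success()
            if position > 0:
                # Record which route served the request so quality impact
                # can be attributed to the fallback later.
                self.events.append(
                    {"workflow": workflow, "served_by": name, "position": position}
                )
            return result
        raise ProviderError(f"all routes exhausted for workflow {workflow!r}")


def flaky_primary(prompt: str) -> str:
    raise ProviderError("rate limited")  # simulated provider incident


def steady_secondary(prompt: str) -> str:
    return f"secondary:{prompt}"


router = FallbackRouter(
    chains={"support-chat": ["primary", "secondary"]},
    routes={"primary": flaky_primary, "secondary": steady_secondary},
)
answer = router.call("support-chat", "hello")
```

In this sketch the chain is data (a list of route names per workflow), so changing the fallback order is a configuration change rather than a code change at each call site. Once the primary route has failed three times in a row, its breaker opens and later requests go straight to the secondary route until the cooldown elapses.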

What changes when it is in place

Provider incidents become brief routing events instead of customer-visible outages. The on-call rotation is notified; the workflow keeps running.