Apply resiliency policies to your apps
A resiliency policy declares how Catalyst recovers when an outbound call fails — without changing application code. You define named timeout, retry, and circuit-breaker behaviours in a Resiliency resource, scope it to one or more App IDs, and point each target (another app, or a component such as a state store or pub/sub broker) at the policies it should use. Catalyst enforces them inline on every matching call.
For the concepts behind retry shapes, breaker states, and how these compose with workflow retries, see Resiliency policies.
How resiliency policies work
A Resiliency resource has two parts:
- spec.policies: named timeouts, retries, and circuitBreakers definitions.
- spec.targets: binds those named policies to outbound calls, either to an app (apps.<app-id>) or a component (components.<name>).
The top-level scopes list names the App IDs the policy applies to. If a target omits a timeout, retry, or circuit breaker, Catalyst applies a default for the missing one: a 10s timeout, a constant retry of 5 attempts on HTTP 500–504, and a breaker that trips after 5 consecutive failures.
Timeouts
A timeout caps how long Catalyst waits for an outbound call before it terminates the call. Define named durations under policies.timeouts:
spec:
  policies:
    timeouts:
      fast: 2s
      relaxed: 30s
Retries
A retry policy reattempts a failed call. Use constant for a fixed wait between attempts, or exponential to back off and give an overloaded destination time to recover:
spec:
  policies:
    retries:
      quickRetry:
        policy: constant
        duration: 1s
        maxRetries: 3
      backoff:
        policy: exponential
        duration: 200ms
        maxInterval: 10s
        maxRetries: 5
        matching:
          httpStatusCodes: "503,504"
Set maxRetries: -1 to retry indefinitely. The matching block limits retries to transient status codes; without it, Catalyst retries HTTP 500–504 and gRPC status codes 4, 8, and 14 by default. Don't retry permanent errors like 400, because they never succeed.
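To make the two retry shapes concrete, the following Python sketch computes the wait before each reattempt and checks a status code against a matching list. It is an illustration only, not Catalyst's implementation: it assumes the exponential policy doubles the wait each time and caps it at maxInterval, and it ignores any jitter the runtime may add. All function names are hypothetical.

```python
from typing import List


def constant_delays(duration_ms: int, max_retries: int) -> List[int]:
    """Waits (in milliseconds) before each reattempt under a constant policy."""
    return [duration_ms] * max_retries


def exponential_delays(duration_ms: int, max_interval_ms: int,
                       max_retries: int, multiplier: int = 2) -> List[int]:
    """Waits before each reattempt, growing by `multiplier` and capped.

    Assumption: plain doubling with no jitter; the real runtime may use a
    different multiplier and add randomization.
    """
    delays = []
    wait = duration_ms
    for _ in range(max_retries):
        delays.append(min(wait, max_interval_ms))
        wait *= multiplier
    return delays


def should_retry(status: int, http_status_codes: str) -> bool:
    """True if `status` appears in a comma-separated list such as "503,504"."""
    allowed = {int(code) for code in http_status_codes.split(",") if code.strip()}
    return status in allowed


if __name__ == "__main__":
    print(constant_delays(1000, 3))             # [1000, 1000, 1000]
    print(exponential_delays(200, 10_000, 5))   # [200, 400, 800, 1600, 3200]
    print(should_retry(503, "503,504"))         # True
    print(should_retry(400, "503,504"))         # False
```

Under these assumptions the backoff policy above never reaches its 10s cap within five retries; the cap only matters for a larger maxRetries or a longer starting duration.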
Circuit breakers
A circuit breaker stops sending calls to a target that keeps failing, so retries don't pile load onto a struggling destination. It trips when a condition you define crosses a threshold:
spec:
  policies:
    circuitBreakers:
      breaker:
        maxRequests: 1
        interval: 30s
        timeout: 60s
        trip: "consecutiveFailures > 5"
trip is a CEL expression over counters such as consecutiveFailures, totalFailures, and requests. When the breaker trips (opens), calls fail fast for the duration set in timeout; after that, the breaker admits up to maxRequests probe calls to test whether the destination has recovered. interval sets how often the breaker resets its counters while it is closed.
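The open, probe, and recover cycle described above can be sketched as a small state machine. This is a simplified toy model, not Catalyst's breaker: the class and its parameters are hypothetical, it hard-codes a consecutive-failures trip condition instead of evaluating a CEL expression, it closes again on the first successful probe, and it ignores interval entirely.

```python
import time


class CircuitBreaker:
    """Toy breaker: closed -> open -> half-open -> closed."""

    def __init__(self, max_requests=1, timeout_s=60.0, trip_after=5,
                 clock=time.monotonic):
        self.max_requests = max_requests  # probes allowed while half-open
        self.timeout_s = timeout_s        # how long the breaker stays open
        self.trip_after = trip_after      # trips when failures > trip_after
        self.clock = clock                # injectable so tests can fake time
        self.state = "closed"
        self.consecutive_failures = 0
        self.opened_at = 0.0
        self.probes = 0

    def allow(self) -> bool:
        """Return True if a call may be sent right now."""
        if self.state == "open":
            if self.clock() - self.opened_at < self.timeout_s:
                return False              # still open: fail fast
            self.state = "half-open"      # timeout elapsed: start probing
            self.probes = 0
        if self.state == "half-open":
            if self.probes >= self.max_requests:
                return False              # probe budget used up
            self.probes += 1
        return True

    def record(self, success: bool) -> None:
        """Report the outcome of a call that allow() admitted."""
        if success:
            self.consecutive_failures = 0
            self.state = "closed"
            return
        self.consecutive_failures += 1
        # A failed probe reopens immediately; otherwise apply the threshold.
        if self.state == "half-open" or self.consecutive_failures > self.trip_after:
            self.state = "open"
            self.opened_at = self.clock()
```

With trip_after=5, mirroring "consecutiveFailures > 5", the sketch stays closed through five consecutive failures and opens on the sixth.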
Bind the policies and apply
Reference the named policies from spec.targets, scope the resource to the App ID making the calls, and apply it with the Diagrid CLI:
# order-app-resiliency.yaml
apiVersion: dapr.io/v1alpha1
kind: Resiliency
metadata:
  name: order-app-resiliency
scopes:
  - order-app
spec:
  policies:
    timeouts:
      fast: 2s
    retries:
      backoff:
        policy: exponential
        duration: 200ms
        maxInterval: 10s
        maxRetries: 5
        matching:
          httpStatusCodes: "503,504"
    circuitBreakers:
      breaker:
        maxRequests: 1
        interval: 30s
        timeout: 60s
        trip: "consecutiveFailures > 5"
  targets:
    apps:
      inventory-app:
        timeout: fast
        retry: backoff
        circuitBreaker: breaker
    components:
      orders-statestore:
        outbound:
          timeout: fast
          retry: backoff
diagrid resiliency create -f order-app-resiliency.yaml --project my-project
This binds the policies to order-app. Its service invocations to inventory-app and its calls to the orders-statestore component now time out after 2 seconds and retry transient failures with exponential backoff. Calls to inventory-app also go through the named breaker, which trips once consecutive failures exceed five; the orders-statestore target names no breaker, so the Catalyst default applies there. For components, outbound covers calls the app makes to the component, while inbound covers deliveries from it, such as pub/sub messages.
When the calling app runs Catalyst Workflows, a workflow activity's own retry policy stacks on top of the resiliency policy — if both retry the same failure, the effective attempts multiply. Keep resiliency retries short for infrastructure transients and let workflow activity retries handle business-level outcomes. See composition with workflow activity retries.
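As a rough worked example of how the attempts multiply, the sketch below computes a worst-case call count for one workflow activity. It assumes that maxRetries counts reattempts on top of the initial call, that the activity's retry policy is expressed as a total number of attempts, and that every call fails; the function and parameter names are hypothetical.

```python
def worst_case_calls(resiliency_max_retries: int, activity_max_attempts: int) -> int:
    """Upper bound on calls to the destination for one workflow activity.

    Assumes each activity attempt triggers one initial call plus up to
    `resiliency_max_retries` resiliency retries, and that every call fails.
    """
    if resiliency_max_retries < 0:
        # maxRetries: -1 retries indefinitely, so no finite bound exists.
        raise ValueError("unbounded: resiliency retries never stop")
    calls_per_attempt = 1 + resiliency_max_retries
    return activity_max_attempts * calls_per_attempt


if __name__ == "__main__":
    # backoff policy (maxRetries: 5) under an activity allowed 3 attempts
    print(worst_case_calls(5, 3))  # 3 * (1 + 5) = 18
```

Dropping the resiliency retry to maxRetries: 2 in the same scenario lowers the bound to 9 calls, which is one way to see why short infrastructure-level retries pair better with workflow-level retries.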
Manage policies
List the resiliency policies in a project:
diagrid resiliency list --project my-project
Inspect a policy, including its current App ID scopes:
diagrid resiliency get order-app-resiliency --project my-project -o yaml
To change a policy, edit the manifest and re-apply it — the scopes list controls which App IDs it binds to:
diagrid resiliency update -f order-app-resiliency.yaml --project my-project
Delete a policy to remove its behaviour, which returns its targets to the Catalyst defaults:
diagrid resiliency delete order-app-resiliency --project my-project
Next steps
- Lock down who can call your apps with workflow access and service invocation policies
- Read the Resiliency policies concept for retry shapes, breaker states, and composition with workflow retries
- See the Resiliency manifest reference for every field and the defaults Catalyst applies