Apply resiliency policies to your apps

A resiliency policy declares how Catalyst recovers when an outbound call fails — without changing application code. You define named timeout, retry, and circuit-breaker behaviours in a Resiliency resource, scope it to one or more App IDs, and point each target (another app, or a component such as a state store or pub/sub broker) at the policies it should use. Catalyst enforces them inline on every matching call.

For the concepts behind retry shapes, breaker states, and how these compose with workflow retries, see Resiliency policies.

How resiliency policies work

A Resiliency resource has two parts:

spec.policies — named timeouts, retries, and circuitBreakers definitions.
spec.targets — bind those named policies to outbound calls, either to an app (apps.<app-id>) or a component (components.<name>).

The top-level scopes list names the App IDs the policy applies to. If a target leaves out one of the three policy types (timeout, retry, or circuit breaker), Catalyst applies a default for it: a 10s timeout, a constant retry of 5 attempts on HTTP 500–504, and a breaker that trips after 5 consecutive failures.
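The fallback rule works per policy type. The Python sketch below illustrates it: a target keeps whatever it sets explicitly and inherits the defaults listed above for the rest. The `DEFAULTS` values restate this page's description. The function name `effective_policies` and the key names inside each default are hypothetical labels, not Catalyst's manifest schema.

```python
from typing import Any, Dict

# Defaults as described on this page. The keys inside each value are
# hypothetical labels for illustration, not Catalyst's manifest schema.
DEFAULTS: Dict[str, Any] = {
    "timeout": "10s",
    "retry": {"policy": "constant", "attempts": 5, "httpStatusCodes": "500-504"},
    "circuitBreaker": {"tripAfterConsecutiveFailures": 5},
}


def effective_policies(target: Dict[str, Any]) -> Dict[str, Any]:
    """Keep each policy type the target sets; fall back to DEFAULTS for the rest."""
    return {dim: target.get(dim, default) for dim, default in DEFAULTS.items()}


# A target that only sets a timeout and a retry inherits the default breaker.
print(effective_policies({"timeout": "2s", "retry": "backoff"}))
```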

Timeouts

A timeout caps how long Catalyst waits for an outbound call before it terminates the call. Define named durations under policies.timeouts:

spec:
  policies:
    timeouts:
      fast: 2s
      relaxed: 30s
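The Python sketch below roughly models what a named timeout does. It converts duration strings like `2s` or `200ms` into seconds and cancels a call that runs longer, using `asyncio.wait_for`. It is only an illustration and not how Catalyst enforces timeouts. The helper names and the small set of duration units it accepts are assumptions.

```python
import asyncio
import re

# Seconds per unit for the simple duration strings used on this page (assumed subset).
_UNITS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}


def parse_duration(text: str) -> float:
    """Convert a duration such as '2s', '200ms' or '1m' to seconds."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)(ms|s|m|h)", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    value, unit = match.groups()
    return float(value) * _UNITS[unit]


async def call_with_timeout(call, timeout: str):
    """Run an outbound call, cancelling it if it exceeds the timeout."""
    return await asyncio.wait_for(call(), timeout=parse_duration(timeout))


# Hypothetical outbound calls used for the usage example.
async def fast_call() -> str:
    return "ok"


async def slow_call() -> str:
    await asyncio.sleep(0.5)
    return "ok"
```

For example, `asyncio.run(call_with_timeout(slow_call, "100ms"))` raises `asyncio.TimeoutError`, and the same call with the `relaxed` duration (`30s`) completes.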

Retries

A retry policy reattempts a failed call. Use constant for a fixed wait between attempts, or exponential to back off and give an overloaded destination time to recover:

spec:
  policies:
    retries:
      quickRetry:
        policy: constant
        duration: 1s
        maxRetries: 3
      backoff:
        policy: exponential
        duration: 200ms
        maxInterval: 10s
        maxRetries: 5
        matching:
          httpStatusCodes: "503,504"

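Set maxRetries: -1 to retry indefinitely. The matching block restricts retries to the listed transient status codes. Without it, Catalyst retries HTTP 500–504 and gRPC status codes 14, 4, and 8 (UNAVAILABLE, DEADLINE_EXCEEDED, and RESOURCE_EXHAUSTED) by default. Don't retry permanent errors such as HTTP 400: a malformed request fails the same way on every attempt.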
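The Python sketch below shows what the two retry shapes mean in practice. It lists the wait before each retry and checks a status code against a matching string. It assumes a growth factor of 1.5 and ignores jitter. The multiplier and any randomization Catalyst actually applies are not stated on this page, so treat the numbers as illustrative. Support for ranges such as `500-504` in the matching string is also an assumption, and all function names are hypothetical.

```python
from typing import List, Optional, Set


def retry_waits_ms(policy: str, duration_ms: float, max_retries: int,
                   max_interval_ms: Optional[float] = None,
                   multiplier: float = 1.5) -> List[float]:
    """Wait (in ms) before each retry, ignoring jitter.

    constant: every wait equals duration.
    exponential: start at duration, grow by `multiplier` (an assumed 1.5),
    and never exceed maxInterval.
    """
    if policy not in ("constant", "exponential"):
        raise ValueError(f"unknown retry policy: {policy}")
    if max_retries < 0:
        raise ValueError("maxRetries: -1 retries indefinitely, so there is no finite schedule")
    waits: List[float] = []
    wait = duration_ms
    for _ in range(max_retries):
        waits.append(wait if max_interval_ms is None else min(wait, max_interval_ms))
        if policy == "exponential":
            wait *= multiplier
    return waits


def parse_status_codes(spec: str) -> Set[int]:
    """Parse a matching string such as "503,504" ("500-504" ranges are assumed)."""
    codes: Set[int] = set()
    for part in (p.strip() for p in spec.split(",")):
        if not part:
            continue
        if "-" in part:
            low, high = (int(x) for x in part.split("-", 1))
            codes.update(range(low, high + 1))
        else:
            codes.add(int(part))
    return codes


def should_retry(status: int, matching: str) -> bool:
    """Retry only if the status code is listed in the matching string."""
    return status in parse_status_codes(matching)
```

Under the assumed multiplier, the `backoff` policy waits 200, 300, 450, 675, and 1012.5 ms, so its 10s cap never applies within five retries. `quickRetry` waits 1s before each of its three retries.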

Circuit breakers

A circuit breaker stops sending calls to a target that keeps failing, so retries don't pile load onto a struggling destination. It trips when a condition you define crosses a threshold:

spec:
  policies:
    circuitBreakers:
      breaker:
        maxRequests: 1
        interval: 30s
        timeout: 60s
        trip: "consecutiveFailures > 5"

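trip is a Common Expression Language (CEL) expression over counters such as consecutiveFailures, totalFailures, and requests. The breaker trips (opens) when the expression evaluates to true, so consecutiveFailures > 5 opens it on the sixth consecutive failure. While the breaker is open, calls fail fast for the timeout period (60s here). The breaker then moves to half-open and allows up to maxRequests probe calls through to test whether the destination has recovered. If the probes succeed, the breaker closes again; if a probe fails, it reopens. interval is the cyclical period after which the breaker clears its counters.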
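The state transitions described above can be sketched as a small Python class. This is a toy model and not Catalyst's implementation. `trip` is a Python callable standing in for the CEL expression, and the clock is injected so the example runs instantly. The class and method names are hypothetical, and resetting the counters on every state change is an assumption of the sketch.

```python
from typing import Callable, Dict


class CircuitBreaker:
    """Toy closed -> open -> half-open breaker using the fields described above."""

    def __init__(self, max_requests: int, interval: float, timeout: float,
                 trip: Callable[[Dict[str, int]], bool],
                 clock: Callable[[], float]) -> None:
        self.max_requests = max_requests  # probes allowed while half-open
        self.interval = interval          # seconds between counter resets when closed
        self.timeout = timeout            # seconds to stay open before probing
        self.trip = trip                  # stands in for the CEL trip expression
        self.clock = clock                # returns the current time in seconds
        self.state = "closed"
        self.counts: Dict[str, int] = {}
        self._reset_counts()
        self._window_start = clock()
        self._opened_at = 0.0

    def _reset_counts(self) -> None:
        self.counts = {"requests": 0, "totalFailures": 0,
                       "consecutiveFailures": 0, "consecutiveSuccesses": 0}

    def _open(self) -> None:
        self.state = "open"
        self._opened_at = self.clock()
        self._reset_counts()

    def _close(self) -> None:
        self.state = "closed"
        self._window_start = self.clock()
        self._reset_counts()

    def allow(self) -> bool:
        """Return True if a call may go out now, counting it as a request."""
        now = self.clock()
        if self.state == "open":
            if now - self._opened_at < self.timeout:
                return False              # fail fast while open
            self.state = "half-open"      # timeout elapsed: start probing
            self._reset_counts()
        if self.state == "half-open":
            if self.counts["requests"] >= self.max_requests:
                return False              # probe budget already used
        elif self.interval > 0 and now - self._window_start >= self.interval:
            self._reset_counts()          # closed: clear counters every interval
            self._window_start = now
        self.counts["requests"] += 1
        return True

    def record(self, success: bool) -> None:
        """Report the outcome of a call that allow() let through."""
        c = self.counts
        if success:
            c["consecutiveSuccesses"] += 1
            c["consecutiveFailures"] = 0
            if self.state == "half-open" and c["consecutiveSuccesses"] >= self.max_requests:
                self._close()             # probes succeeded: destination looks healthy
        else:
            c["totalFailures"] += 1
            c["consecutiveFailures"] += 1
            c["consecutiveSuccesses"] = 0
            if self.state == "half-open" or self.trip(c):
                self._open()              # failed probe, or trip condition is now true
```

Driven with the breaker policy's values (maxRequests: 1, interval: 30s, timeout: 60s, and a callable equivalent to consecutiveFailures > 5), the model stays closed after five failures. The sixth failure opens it, and one successful probe after 60 seconds closes it again.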

Bind the policies and apply

Reference the named policies from spec.targets, scope the resource to the App ID making the calls, and apply it with the Diagrid CLI:

# order-app-resiliency.yaml
apiVersion: dapr.io/v1alpha1
kind: Resiliency
metadata:
  name: order-app-resiliency
scopes:
  - order-app
spec:
  policies:
    timeouts:
      fast: 2s
    retries:
      backoff:
        policy: exponential
        duration: 200ms
        maxInterval: 10s
        maxRetries: 5
        matching:
          httpStatusCodes: "503,504"
    circuitBreakers:
      breaker:
        maxRequests: 1
        interval: 30s
        timeout: 60s
        trip: "consecutiveFailures > 5"
  targets:
    apps:
      inventory-app:
        timeout: fast
        retry: backoff
        circuitBreaker: breaker
    components:
      orders-statestore:
        outbound:
          timeout: fast
          retry: backoff

diagrid resiliency create -f order-app-resiliency.yaml --project my-project

This binds the policies to order-app. Its service invocations to inventory-app and its calls to the orders-statestore component now time out after 2 seconds and retry transient failures with exponential backoff. Calls to inventory-app also pass through the breaker policy, which opens once more than five consecutive calls fail. For components, outbound covers calls the app makes to the component, while inbound covers deliveries from it, such as pub/sub messages.
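You can check which named policies a given call picks up by walking the manifest in the order the paragraph above describes: scope first, then target, then (for components) direction. The Python sketch below does this against a hand-transcribed copy of order-app-resiliency.yaml, so no YAML parser is needed. `resolve` and its parameters are hypothetical. The sketch does not fill in the defaults Catalyst applies for omitted policy types.

```python
from typing import Any, Dict

# order-app-resiliency.yaml, transcribed by hand into a Python dict.
MANIFEST: Dict[str, Any] = {
    "scopes": ["order-app"],
    "spec": {
        "policies": {
            "timeouts": {"fast": "2s"},
            "retries": {"backoff": {"policy": "exponential", "duration": "200ms",
                                    "maxInterval": "10s", "maxRetries": 5,
                                    "matching": {"httpStatusCodes": "503,504"}}},
            "circuitBreakers": {"breaker": {"maxRequests": 1, "interval": "30s",
                                            "timeout": "60s",
                                            "trip": "consecutiveFailures > 5"}},
        },
        "targets": {
            "apps": {"inventory-app": {"timeout": "fast", "retry": "backoff",
                                       "circuitBreaker": "breaker"}},
            "components": {"orders-statestore": {"outbound": {"timeout": "fast",
                                                              "retry": "backoff"}}},
        },
    },
}

# Which policies section each target key refers to.
_POLICY_SECTIONS = {"timeout": "timeouts", "retry": "retries",
                    "circuitBreaker": "circuitBreakers"}


def resolve(manifest: Dict[str, Any], caller: str, kind: str, name: str,
            direction: str = "outbound") -> Dict[str, Any]:
    """Return the named policies bound to one call, or {} if none are bound."""
    if caller not in manifest.get("scopes", []):
        return {}  # the resource is not scoped to this App ID
    spec = manifest["spec"]
    binding = spec["targets"].get(kind, {}).get(name, {})
    if kind == "components":
        binding = binding.get(direction, {})  # components bind per direction
    return {dim: spec["policies"][_POLICY_SECTIONS[dim]][ref]
            for dim, ref in binding.items()}
```

Here the orders-statestore lookup returns only a timeout and a retry, so its breaker behaviour comes from the default rather than from breaker. Its inbound direction has no explicit binding at all.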

Note

When the calling app runs Catalyst Workflows, a workflow activity's own retry policy stacks on top of the resiliency policy. If both layers retry the same failure, the number of attempts multiplies. Keep resiliency retries short for infrastructure transients and let workflow activity retries handle business-level outcomes. See composition with workflow activity retries.
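A quick calculation shows how fast the two layers multiply. The Python helper below is hypothetical. It assumes each workflow activity attempt makes a single outbound call and that maxRetries counts retries after the first try. Under those assumptions, three activity attempts combined with the backoff policy's maxRetries: 5 can send a persistently failing call up to 18 times.

```python
def worst_case_calls(activity_attempts: int, resiliency_max_retries: int) -> int:
    """Upper bound on outbound calls when both retry layers retry the same failure.

    Assumes each activity attempt makes one outbound call, and the resiliency
    policy adds up to `resiliency_max_retries` further tries to that call.
    """
    if activity_attempts < 1 or resiliency_max_retries < 0:
        # maxRetries: -1 (retry indefinitely) has no finite bound.
        raise ValueError("attempts must be >= 1 and maxRetries must be >= 0")
    return activity_attempts * (1 + resiliency_max_retries)


print(worst_case_calls(3, 5))  # 3 activity attempts x (1 + 5) resiliency tries
```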

Manage policies

List the resiliency policies in a project:

diagrid resiliency list --project my-project

Inspect a policy, including its current App ID scopes:

diagrid resiliency get order-app-resiliency --project my-project -o yaml

To change a policy, edit the manifest and re-apply it. The scopes list in the manifest controls which App IDs the policy binds to:

diagrid resiliency update -f order-app-resiliency.yaml --project my-project

Delete a policy to remove its behaviour, which returns its targets to the Catalyst defaults:

diagrid resiliency delete order-app-resiliency --project my-project

Next steps

Lock down who can call your apps with workflow access and service invocation policies
Read the Resiliency policies concept for retry shapes, breaker states, and composition with workflow retries
See the Resiliency manifest reference for every field and the defaults Catalyst applies
