Workflow

Catalyst Workflows are long-running orchestrations with durable state that recover automatically from failures. When a process restarts, an activity fails, or a tool call times out, the workflow resumes from the point of failure instead of starting over: completed steps are not re-run and no in-flight state is lost.

They are built on the open-source Dapr Workflow API and use the same programming model, so your existing Dapr workflow code runs on Catalyst unchanged. The API offers durable state, automatic retry policies, timers, external event handling, and activity fan-out/fan-in. Author workflows in Python, JavaScript, .NET, Java, or Go with the same orchestrator semantics across all SDKs.

With Catalyst Workflows you can:

  • Coordinate parallel work and aggregate results without managing concurrency limits.
  • Schedule durable timers that survive process restarts — for polling loops or delayed steps.
  • Pause for human approvals or external events for seconds, days, or weeks.
  • Resume business processes from the exact point of failure across crashes, deploys, and region failovers.
  • Run AI agents whose reasoning steps, tool calls, and decisions all persist and replay.
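The fan-out/fan-in shape behind the first item can be sketched in plain Python. This is an in-process illustration using only `asyncio`, not the Dapr Workflow SDK, and nothing in it is durable; `word_count` and `count_all` are hypothetical names. In a real workflow each branch would be a scheduled activity and the runtime would persist each result.

```python
import asyncio
from typing import List


async def word_count(doc: str) -> int:
    # Hypothetical stand-in for an activity; the sleep stands in for real I/O.
    await asyncio.sleep(0)
    return len(doc.split())


async def count_all(docs: List[str]) -> int:
    # Fan out: start one task per document.
    # Fan in: wait for all of them, then aggregate the results.
    counts = await asyncio.gather(*(word_count(d) for d in docs))
    return sum(counts)


total = asyncio.run(count_all(["a b c", "d e", "f"]))
```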

What Catalyst adds over Dapr Workflows

Dapr Workflow gives you the authoring model and the runtime contract. Catalyst runs that contract as a managed service and adds the operational tooling around it:

  • Managed workflow runtime — no backing store to provision or scale. Workflow history is persisted in Catalyst's managed store.
  • Workflow operations dashboard — a unified view of every execution across a project: status, App ID, and workflow name. Drill into any execution to inspect its execution graph, step-level history, inputs, and outputs. See Operate workflows.
  • Execution control from the console and CLI — pause, resume, rerun, terminate, raise event, and purge instances from the UI or the diagrid workflow CLI.
  • Workflow Composer — an AI-powered service that turns BPMN-style diagrams into runnable Dapr Workflow projects in your language. See Workflow Composer.
  • Diagrid Dev Dashboard — inspect workflow executions, inputs, outputs, and history during local development, using the same UX as the production console.

Programming model

A workflow is orchestration code that schedules work and waits on results. The orchestration itself is deterministic and side-effect free — all side effects happen in activities. The runtime replays the orchestration on restart, using the recorded history to reconstruct in-memory state.
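A toy model can make the replay idea concrete. The sketch below is not the Dapr runtime or SDK: `run`, `order_workflow`, and the three activities are hypothetical, the history is an in-memory list, and the crash is simulated. It only shows the principle that recorded results are reused on replay, so activities that already completed are not executed again.

```python
from typing import Any, Callable, Generator, List, Optional, Tuple

Call = Tuple[Callable[[Any], Any], Any]  # (activity, input)


def run(workflow: Callable[[], Generator[Call, Any, Any]],
        history: List[Any],
        crash_before_step: Optional[int] = None) -> Any:
    """Drive a generator-based workflow, replaying recorded results first."""
    gen = workflow()
    step = 0
    try:
        call = next(gen)
        while True:
            activity, arg = call
            if step < len(history):
                result = history[step]      # replay: reuse the recorded result
            else:
                if crash_before_step is not None and step >= crash_before_step:
                    raise RuntimeError("simulated process crash")
                result = activity(arg)      # first execution: run the side effect
                history.append(result)      # record it before moving on
            step += 1
            call = gen.send(result)
    except StopIteration as done:
        return done.value


executed: List[str] = []  # tracks which activities actually ran


def reserve(order: str) -> str:
    executed.append("reserve")
    return f"reserved:{order}"


def charge(reservation: str) -> str:
    executed.append("charge")
    return f"charged:{reservation}"


def ship(payment: str) -> str:
    executed.append("ship")
    return f"shipped:{payment}"


def order_workflow() -> Generator[Call, Any, str]:
    # Deterministic orchestration: it only decides what to call next.
    reservation = yield (reserve, "order-1")
    payment = yield (charge, reservation)
    shipment = yield (ship, payment)
    return shipment


history: List[Any] = []
try:
    run(order_workflow, history, crash_before_step=2)  # crashes before ship
except RuntimeError:
    pass
final = run(order_workflow, history)  # restart: replays two steps, runs ship
```

After the restart, `reserve` and `charge` each appear in `executed` only once, because the second run takes their results from `history`. This is also why orchestration code has to be deterministic: replay only works if the function asks for the same steps in the same order.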

  • Workflow — the top-level orchestration function. It runs to completion (or remains durably suspended) across process restarts. Inputs and outputs are serialized to the backing store at every yield point.
  • Activity — a unit of work that performs a side effect: a database write, an HTTP call, an LLM invocation. Activities run at-least-once and are retried by the runtime according to the workflow's retry policy.
  • Child workflow — a workflow invoked from another workflow. Use child workflows to compose larger orchestrations from reusable sub-processes, isolate retry boundaries, or fan out work across application instances.
  • Durable timer — a sleep that survives process restarts. Use timers for scheduled steps, polling loops, and approval timeouts ranging from seconds to days.
  • External event — a named signal a workflow waits on. The workflow remains durably suspended until the event is raised — by another service, by the CLI, or from the console.

The same primitives are available across the .NET, Go, Java, JavaScript, and Python SDKs.
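A common way to combine two of these primitives is to wait on an external event with a timer as the deadline, for example an approval that expires. The sketch below shows only the control flow, using `asyncio` in a single process; it is not durable and does not use the Dapr SDK, and `wait_for_approval` and `demo` are hypothetical names. In a real workflow, both the timer and the event wait would survive a restart.

```python
import asyncio
from typing import Tuple


async def wait_for_approval(event: asyncio.Event, timeout_s: float) -> str:
    # Wait for the event, but give up once the timer expires.
    try:
        await asyncio.wait_for(event.wait(), timeout=timeout_s)
        return "approved"
    except asyncio.TimeoutError:
        return "timed out"


async def demo() -> Tuple[str, str]:
    # Case 1: the event is raised shortly after the wait begins.
    approved = asyncio.Event()
    asyncio.get_running_loop().call_later(0.01, approved.set)
    first = await wait_for_approval(approved, timeout_s=1.0)

    # Case 2: nobody raises the event, so the timer wins.
    never_raised = asyncio.Event()
    second = await wait_for_approval(never_raised, timeout_s=0.05)
    return first, second


results = asyncio.run(demo())
```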

Lifecycle states

Every workflow execution moves through a small set of well-defined states. Three are terminal — once entered, the instance never transitions back. The runtime guarantees state transitions are atomic and durable: a state you observe is a state the runtime has already persisted.

  • RUNNING — the instance is actively executing or is waiting on an activity, a timer, or an external event. Most long-running workflows spend almost all their wall-clock time in this state while durably suspended on a timer or event.
  • SUSPENDED — the instance has been paused by an operator. The runtime stops dispatching new work for it; in-flight activities still run to completion and their results are recorded, but no new activities are scheduled until it is resumed.
  • COMPLETED — the orchestrator function returned a value. The output is durably stored and available to callers.
  • FAILED — the orchestrator raised an unhandled exception. Activity-level failures that are caught and handled in orchestrator code do not surface as FAILED — only an exception that propagates out of the workflow itself.
  • TERMINATED — an operator explicitly ended the instance before completion. The orchestrator was not given a chance to clean up; any in-flight activity may still complete but its result is discarded.

After reaching a terminal state, the instance's history is retained until it is purged. See Manage workflow instances for runtime operations against each state.
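The states above can be modelled as a small transition table. This is an illustrative sketch, not the runtime's implementation: `Status`, `ALLOWED`, and `can_transition` are hypothetical names, and the table assumes a suspended instance can be either resumed or terminated, which the descriptions above do not state explicitly.

```python
from enum import Enum
from typing import Dict, FrozenSet


class Status(Enum):
    RUNNING = "RUNNING"
    SUSPENDED = "SUSPENDED"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"
    TERMINATED = "TERMINATED"


# The three terminal states: once entered, no further transitions.
TERMINAL: FrozenSet[Status] = frozenset(
    {Status.COMPLETED, Status.FAILED, Status.TERMINATED}
)

# Assumed transition table; terminal states have no entry, so nothing leaves them.
ALLOWED: Dict[Status, FrozenSet[Status]] = {
    Status.RUNNING: frozenset({Status.SUSPENDED}) | TERMINAL,
    Status.SUSPENDED: frozenset({Status.RUNNING, Status.TERMINATED}),
}


def can_transition(src: Status, dst: Status) -> bool:
    """Return True if the sketch's table allows moving from src to dst."""
    return dst in ALLOWED.get(src, frozenset())
```

For example, `can_transition(Status.RUNNING, Status.SUSPENDED)` is true, while any transition out of `Status.COMPLETED` is rejected.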

Workflow vs saga

There are two common ways to coordinate multi-step work: sagas and workflows. They differ in where state lives, how failures are recovered, and how much visibility you get.

A workflow centralizes the orchestration in code: branching, retries, timeouts, and compensation all live in one function you can read top-to-bottom.

A saga pushes that logic out to the participating services, making it harder to reason about.

You can still implement the Saga pattern inside a workflow: model each step as an activity and add a compensating activity that runs on failure. The workflow gives you the durable state and central timeline a hand-rolled saga lacks. See Compensation in workflow patterns.
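The compensation logic itself is small. The following is a minimal in-memory sketch, assuming each step is paired with a compensating function; `run_with_compensation` and the booking steps are hypothetical, and the sketch has none of the durability a workflow runtime would add. When a step fails, the compensations for the steps that already succeeded run in reverse order.

```python
from typing import Callable, List, Tuple

Step = Tuple[Callable[[], None], Callable[[], None]]  # (action, compensation)

log: List[str] = []


def run_with_compensation(steps: List[Step]) -> bool:
    """Run steps in order; on failure, undo completed steps in reverse."""
    completed: List[Callable[[], None]] = []
    for action, compensate in steps:
        try:
            action()
        except Exception as exc:
            log.append(f"failed: {exc}")
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensate)
    return True


def book_flight() -> None:
    log.append("book flight")


def cancel_flight() -> None:
    log.append("cancel flight")


def book_hotel() -> None:
    log.append("book hotel")


def cancel_hotel() -> None:
    log.append("cancel hotel")


def charge_card() -> None:
    raise RuntimeError("card declined")


def refund_card() -> None:
    log.append("refund card")


ok = run_with_compensation([
    (book_flight, cancel_flight),
    (book_hotel, cancel_hotel),
    (charge_card, refund_card),
])
```

Because `charge_card` never succeeded, `refund_card` is not called; only the hotel and flight bookings are undone, most recent first.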

| Approach | State | Failure recovery | Visibility | Best for |
| --- | --- | --- | --- | --- |
| Saga (event-driven choreography) | Spread across services and topics | Each service handles its own retries and compensations | Distributed traces. Must instrument all services | Loosely coupled services that already communicate by events |
| Workflow (orchestration) | Persisted by the runtime | Replays from the last checkpoint after any restart | Execution graph, step history, inputs and outputs | Multi-step business processes, agents |

Next steps

For development concepts, tools, and language-specific guides, see Develop workflows.

For inspecting running workflows in the console, see Operate workflows.