AI Agents

Agent frameworks such as CrewAI and LangGraph give you tools, LLM orchestration, and prompt management, but they leave most production failure handling to you. Some offer basic checkpointing, yet you still need to detect failures at scale, build your own recovery mechanisms, and coordinate resumption across instances to avoid duplicate runs.

Catalyst adds the missing infrastructure: automatic failure detection, automatic recovery at scale, and multi-instance agent coordination. Your agent code stays the same — Catalyst handles the rest.

Pick your framework below to build a durable agent connected to Catalyst Cloud.

note

Catalyst Cloud is free and the fastest way to get started — no infrastructure to set up. For production or on-premises requirements, Diagrid also offers self-hosted enterprise deployments.

Prerequisites

1. Log in to Catalyst

diagrid login

Confirm your identity:

diagrid whoami

2. Clone and Navigate

git clone https://github.com/diagridio/catalyst-quickstarts.git

Navigate to the quickstart directory for your framework:

cd catalyst-quickstarts/agents/dapr-agents/durable-agent

3. Explore the Code

The example is an Invitations Manager: a durable agent that sends event invitations to guests via email and physical mail. Dapr Agents is the native AI agent framework built on Dapr, so durability, state, and pub/sub are built into the agent itself.

Open main.py. The agent uses Pydantic models for structured tool input and output:

from typing import List

from pydantic import BaseModel, Field

from dapr_agents import tool, DurableAgent
from dapr_agents.llm import DaprChatClient


class InvitationSchema(BaseModel):
    guest_count: int = Field(description="Number of guests to invite")
    event_type: str = Field(description="Type of event")


# Output model: fields inferred from the usage below; main.py holds the actual definition.
class InvitationResult(BaseModel):
    sent: int
    method: str


@tool(args_model=InvitationSchema)
def send_invitations(guest_count: int, event_type: str) -> List[InvitationResult]:
    """Send event invitations to guests."""
    # Simulated delivery: 70% by email, 30% by physical mail.
    return [
        InvitationResult(sent=int(guest_count * 0.7), method="email"),
        InvitationResult(sent=int(guest_count * 0.3), method="physical mail"),
    ]
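To check the tool's logic without Dapr or an LLM in the loop, the same 70/30 split can be reproduced with the standard library alone. The sketch below is an illustration, not the quickstart's code: it swaps the Pydantic models for a dataclass, drops the @tool decorator, and uses a hypothetical function name, split_invitations.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InvitationResult:
    """Stand-in for the quickstart's output model (fields assumed from its usage)."""
    sent: int
    method: str


def split_invitations(guest_count: int) -> List[InvitationResult]:
    """Mirror the tool body: 70% by email, 30% by physical mail, truncated to ints."""
    if guest_count < 0:
        raise ValueError("guest_count must be non-negative")
    return [
        InvitationResult(sent=int(guest_count * 0.7), method="email"),
        InvitationResult(sent=int(guest_count * 0.3), method="physical mail"),
    ]


results = split_invitations(100)
print(", ".join(f"{r.sent} via {r.method}" for r in results))
# prints: 70 via email, 30 via physical mail
```

Because int() truncates, the two counts do not always add up to guest_count. For 5 guests the split is 3 and 1, which is fine for a demo tool but worth knowing before reusing the logic.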

The DurableAgent class brings everything together — memory, state, registry, and pub/sub are all configured at the agent level:

agent = DurableAgent(
    name="invitations-manager",
    role="Invitations Manager",
    goal="Send event invitations to guests using the send_invitations tool.",
    tools=[send_invitations],
    llm=DaprChatClient(component_name="llm-provider"),
    memory=AgentMemoryConfig(
        store=ConversationDaprStateMemory(store_name="agent-workflow")
    ),
    state=AgentStateConfig(
        store=StateStoreService(store_name="agent-memory"),
    ),
    registry=AgentRegistryConfig(
        store=StateStoreService(store_name="agent-registry"),
    ),
    pubsub=AgentPubSubConfig(
        pubsub_name="agent-pubsub",
        agent_topic="events.invitations.requests",
        broadcast_topic="agents.broadcast",
    ),
)

runner = AgentRunner()
runner.serve(agent, port=8006)

The LLM is configured via DaprChatClient(component_name="llm-provider") — a Dapr component in resources/llm-provider.yaml that references your OpenAI API key:

apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: llm-provider
spec:
  type: conversation.openai
  metadata:
    - name: key
      value: "{{OPENAI_API_KEY}}"
    - name: model
      value: gpt-4.1-2025-04-14

tip

Unlike other frameworks, Dapr Agents has durability, state, pub/sub, and failure recovery built in natively. Automatic failure detection, crash recovery, and multi-instance coordination are all handled out of the box — no wrapper needed.

4. Configure API Key

This quickstart uses OpenAI as the LLM provider (or Google Gemini for ADK). Catalyst is LLM-agnostic — you're free to use any provider supported by your chosen framework.

export OPENAI_API_KEY="your-openai-api-key"

If you selected Google ADK, set the Google API key instead:

export GOOGLE_API_KEY="your-google-api-key"
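A missing key usually surfaces only when the agent first calls the LLM, so it can help to check the environment up front. The following is a minimal sketch of such a check; the missing_keys helper and the demo values are hypothetical and not part of the quickstart repository.

```python
from typing import List, Mapping, Sequence


def missing_keys(required: Sequence[str], env: Mapping[str, str]) -> List[str]:
    """Return the names in `required` that are unset or blank in `env`."""
    return [name for name in required if not env.get(name, "").strip()]


# Demo with a plain dict; the values are placeholders, not real keys.
demo_env = {"OPENAI_API_KEY": "your-openai-api-key", "GOOGLE_API_KEY": ""}
print(missing_keys(["OPENAI_API_KEY", "GOOGLE_API_KEY"], demo_env))
# prints: ['GOOGLE_API_KEY']
```

In real use, pass os.environ as the second argument and list only the variable your provider needs.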

5. Install Dependencies

uv venv
uv pip install -r requirements.txt

6. Run with Catalyst Cloud

diagrid dev run -f dev-python-durable-agent.yaml --project durable-agent-qs --approve

tip

diagrid dev run runs your code locally and connects it to the Catalyst Cloud workflow engine. Your agent code never leaves your machine — only workflow state is stored in Catalyst.

Wait for the log output indicating the runner is ready before proceeding.
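If you script this step instead of watching the logs, one rough readiness signal is whether the agent's port accepts TCP connections. The sketch below assumes that an open port 8006 (the port passed to runner.serve) means the runner is ready, which may be slightly optimistic; the wait_for_port helper is hypothetical.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.2) -> bool:
    """Poll until a TCP connection to (host, port) succeeds or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or timed out: wait briefly and try again.
            time.sleep(interval)
    return False
```

Calling wait_for_port("localhost", 8006) returns True once the port is open and False if the timeout expires first. The log output remains the authoritative signal.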

7. Interact with the Agent

Open a new terminal and trigger the agent:

curl -X POST http://localhost:8006/run \
  -H "Content-Type: application/json" \
  -d '{"task": "Send invitations to 100 guests for a corporate networking event"}'

Expected output:

== APP == Invitations sent: 70 via email, 30 via physical mail
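The same request can be sent from Python using only the standard library. This is a sketch that mirrors the curl command above (same URL, header, and JSON body); the function names are hypothetical, and the shape of the HTTP response is not specified in this quickstart, so the body is returned as raw text.

```python
import json
import urllib.request

AGENT_URL = "http://localhost:8006/run"  # port and path taken from the curl command


def build_run_request(task: str, url: str = AGENT_URL) -> urllib.request.Request:
    """Build the POST request without sending it."""
    body = json.dumps({"task": task}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def trigger_agent(task: str) -> str:
    """Send the request to the locally running agent and return the raw response body."""
    with urllib.request.urlopen(build_run_request(task), timeout=30) as resp:
        return resp.read().decode("utf-8")
```

With the agent running, trigger_agent("Send invitations to 100 guests for a corporate networking event") sends the same task as the curl example. Keeping request construction separate from sending makes the payload easy to inspect or test offline.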

8. Crash Recovery

Stop the running application with Ctrl+C.

The quickstart repository includes a crash_test.py file that demonstrates crash recovery. It defines a 3-step pipeline where step 2 crashes with os._exit(1). After the crash, you comment out the crash line and restart — the workflow resumes from step 2 without re-executing step 1.

Remember: your code runs locally throughout this test. The Catalyst Cloud workflow engine — not your machine — tracks which steps completed and stores their results. That's what makes recovery possible even after a full process crash.

Crash recovery is built into Dapr Agents natively — the DurableAgent class automatically persists each tool execution as a workflow activity. If the process crashes, it resumes from the last saved state. See the Durable Agent Quickstart for a detailed walkthrough.

tip

You do not need to curl again — the existing workflow resumes automatically when your local process reconnects to Catalyst. Because workflow state is stored remotely in Catalyst (not in your process), the engine replays saved results instead of re-executing completed steps.
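The replay idea can be illustrated with a toy model. The sketch below is a simplification under stated assumptions, not how Catalyst or Dapr Workflow is implemented: a plain dict stands in for the remote workflow state, an exception stands in for os._exit(1), and the step names and results are made up.

```python
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, Callable[[], str]]


class SimulatedCrash(Exception):
    """Stands in for the process dying; crash_test.py uses os._exit(1) instead."""


def run_pipeline(steps: List[Step], store: Dict[str, str], executed: List[str]) -> List[str]:
    """Run steps in order, replaying saved results instead of re-running completed steps."""
    results = []
    for name, fn in steps:
        if name not in store:
            executed.append(name)  # record the attempt before running the step
            store[name] = fn()     # saved only if the step finishes
        results.append(store[name])
    return results


def make_steps(crash: bool) -> List[Step]:
    def step2() -> str:
        if crash:
            raise SimulatedCrash("step 2 crashed")
        return "invitations sent"

    return [
        ("step1", lambda: "guest list loaded"),
        ("step2", step2),
        ("step3", lambda: "summary written"),
    ]


store: Dict[str, str] = {}  # outlives the "crash", like remotely stored workflow state
executed: List[str] = []    # every step body that actually started

try:
    run_pipeline(make_steps(crash=True), store, executed)
except SimulatedCrash:
    pass  # the process is gone, but the store is not

results = run_pipeline(make_steps(crash=False), store, executed)
print(executed)  # ['step1', 'step2', 'step2', 'step3']
print(results)   # ['guest list loaded', 'invitations sent', 'summary written']
```

Step 1 appears once in the execution log even though the pipeline ran twice: the second run reads its saved result from the store and picks up at step 2.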

9. View in the Catalyst Web Console

Open the Catalyst Cloud web console and navigate to the Workflows section. Select the workflow instance to inspect the full execution trace, including tool calls and state persistence.

10. Clean Up

Stop the running application with Ctrl+C, then delete the Catalyst project:

diagrid project delete durable-agent-qs

Summary

In this quickstart, you:

  • Built a Dapr Agents durable agent with structured tool schemas and Dapr-native LLM configuration
  • Ran it locally connected to Catalyst Cloud for state persistence and crash recovery
  • Triggered the agent via REST API and inspected execution in the Catalyst console

Next Steps