AI Agents

Build intelligent, autonomous AI agents that can interact with external systems, make decisions, and orchestrate complex workflows using durable execution patterns and pluggable infrastructure.

From Development to Production

While Dapr Agents provides a powerful framework for building intelligent agents, Catalyst takes you from code to production with an enterprise-grade platform for running agents, giving you full observability and security out of the box. Learn more about Running Agents on Catalyst.

Durable Agents

The Dapr Agents framework enables durable agents that persist their execution state across restarts and failures. Unlike traditional agents that lose context when they crash, durable agents can resume exactly where they left off.

[Diagram: durable agent overview]

Below is an example of an order processing agent that uses tools to process orders and makes decisions on approval workflows:

from dapr_agents import DurableAgent, DaprChatClient
# Import path for memory may vary by dapr_agents version
from dapr_agents.memory import ConversationDaprStateMemory

async def main():
    order_processor = DurableAgent(
        name="OrderProcessingAgent",
        role="Order Processing Specialist",
        goal="Process customer orders with appropriate approval workflows",
        instructions=[
            "You are an order processing specialist that handles customer orders.",
            "For orders under $1000, automatically approve them.",
            "For orders $1000 or more, escalate them for manual approval.",
            "Use the process_order tool to handle order processing.",
            "Provide clear status updates to customers."
        ],
        tools=[process_order],

        # Conversation history persistence
        memory=ConversationDaprStateMemory(
            store_name="statestore",
            session_id="customer-session-123"
        ),

        # LLM provider via Dapr Conversation API
        llm=DaprChatClient(
            component_name="openai-gpt-4o",
        ),

        # Execution state for durability and recovery
        state_store_name="statestore",
        state_key="execution-orders",
    )

Key capabilities:

  • Durable executions - Every step in the agent's reasoning and execution is automatically saved, allowing recovery from failures without losing progress or repeating expensive LLM calls
  • Built-in resiliency - Automatically retry failed operations and recover from transient failures in external systems or LLM APIs

Why Durability Matters

Traditional agents lose all context when they crash. If an agent fails after 10 LLM calls and 5 API interactions, you have to start over—repeating expensive operations and potentially getting different results. Durable agents checkpoint their state automatically, so they resume exactly where they left off.
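The checkpointing idea can be illustrated with a short sketch. This is not the dapr_agents implementation, just the underlying pattern: each step's result is persisted as it completes, so a replay after a crash reuses saved results instead of repeating expensive calls.

```python
# Illustrative sketch of checkpointed execution (not the actual
# dapr_agents implementation): step results are persisted as they
# complete, so a replay after a crash skips finished steps instead
# of repeating expensive LLM or API calls.
checkpoints: dict = {}  # in production this would live in a Dapr state store

def run_step(step_id: str, fn, *args):
    if step_id in checkpoints:
        return checkpoints[step_id]   # already completed: reuse saved result
    result = fn(*args)                # do the expensive work exactly once
    checkpoints[step_id] = result     # checkpoint before moving on
    return result

llm_calls = []

def fake_llm(prompt: str) -> str:
    llm_calls.append(prompt)
    return f"response:{prompt}"

first = run_step("classify", fake_llm, "classify the order")
# ... simulated crash and restart: the same step is replayed ...
second = run_step("classify", fake_llm, "classify the order")
```

On replay, the second `run_step` call returns the checkpointed result without invoking the LLM again, which is why a durable agent resumes where it left off rather than starting over.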

Pluggable Memory and Context

AI agents need to retain context across interactions to provide coherent and adaptive responses. Dapr Agents provides a pluggable memory architecture that allows agents to store conversation history in any Dapr state store.

memory = ConversationDaprStateMemory(
    store_name="statestore",  # Maps to a configured Dapr state store component
    session_id="customer-session-123"
)

Rather than being limited to volatile in-memory storage, agents can use any of the 28+ Dapr state store components as their persistent memory implementation, from Redis and PostgreSQL to AWS DynamoDB and Azure Cosmos DB.
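The `store_name` above refers to a Dapr component definition. As an illustration, a standard Redis-backed state store component named `statestore` looks like this (host and credentials are placeholders):

```yaml
apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: statestore        # referenced by store_name above
spec:
  type: state.redis
  version: v1
  metadata:
  - name: redisHost
    value: localhost:6379
  - name: redisPassword
    value: ""
```

Swapping to another backend means changing only `spec.type` and its metadata; the agent code is unchanged.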

Swappable LLM Providers

The Dapr Conversation API provides an abstraction layer for LLMs, enabling agents to switch between model providers without code changes. Conversation components handle the integration with different LLM providers and add capabilities like response caching, PII protection, and resilience.

from dapr_agents import DaprChatClient

llm = DaprChatClient(
    component_name="openai-gpt-4o",  # Maps to a configured Dapr conversation component
)

Key benefits:

  • Provider agnostic - Swap between OpenAI, Azure OpenAI, AWS Bedrock, Google Vertex AI, and more without changing agent code
  • Prompt caching - Reduce latency and costs for repeated calls
  • Security and PII obfuscation - Protect sensitive data automatically
  • Built-in resiliency - Retries, timeouts, and circuit breakers
  • Observability - OpenTelemetry tracing and Prometheus metrics out of the box
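For illustration, the `openai-gpt-4o` component referenced above could be defined as follows. The metadata keys (`key`, `model`, `cacheTTL`) follow the Dapr `conversation.openai` component schema; verify the exact fields against the docs for your Dapr version, and prefer a secret store reference over an inline API key:

```yaml
apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: openai-gpt-4o     # referenced by component_name above
spec:
  type: conversation.openai
  version: v1
  metadata:
  - name: key
    value: "<OPENAI_API_KEY>"
  - name: model
    value: gpt-4o
  - name: cacheTTL
    value: 10m
```

Switching providers is then a component change (e.g. a different `spec.type`), not a code change.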

Multi-Agent Workflows

For complex business processes, you can orchestrate multiple agents within deterministic workflows. Dapr Agents integrates seamlessly with Dapr Workflow, letting you combine the intelligence of LLMs with the reliability and predictability of workflow orchestration.

[Diagram: multi-agent workflow]

This is ideal for scenarios like customer support triage, multi-stage data processing, or any flow where you need conditional logic, parallelization, or human-in-the-loop between AI reasoning steps.

Here's an example of a customer support workflow that routes through two specialized agents based on business logic:

# --------- AGENTS ---------
triage_agent = Agent(
    name="Triage Agent",
    role="Customer Support Triage Assistant",
    goal="Assess entitlement and urgency.",
    instructions=[
        "Determine whether the customer has entitlement.",
        "Classify urgency as URGENT or NORMAL.",
        "Return JSON with: entitlement, urgency.",
    ]
)

expert_agent = Agent(
    name="Expert Agent",
    role="Technical Troubleshooting Specialist",
    goal="Diagnose issue and propose a resolution.",
    instructions=[
        "Use the provided customer context and issue description.",
        "Summarize the resolution in a customer-friendly message.",
        "Return JSON with: resolution, customer_message.",
    ]
)

# --------- WORKFLOW ---------
@workflow(name="customer_support_workflow")
def customer_support_workflow(ctx: DaprWorkflowContext, input_data: dict):
    triage = yield ctx.call_activity(triage_activity, input=input_data)
    if not triage.get("entitlement"):
        return {"status": "rejected", "reason": "No entitlement"}
    expert = yield ctx.call_activity(expert_activity, input=input_data)
    return {"status": "completed", "result": expert}

@activity(name="triage_activity")
@agent_activity(agent=triage_agent)
def triage_activity(ctx) -> dict:
    """Customer: {name}. Issue: {issue}."""
    pass

@activity(name="expert_activity")
@agent_activity(agent=expert_agent)
def expert_activity(ctx) -> dict:
    """Customer: {name}. Issue: {issue}."""
    pass
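The workflow branches on the triage agent's JSON output. A hypothetical helper (not part of dapr_agents) shows how that contract could be validated before the entitlement check, so malformed model output fails fast instead of silently routing the wrong way:

```python
import json

# Hypothetical helper (not part of dapr_agents): validate the JSON
# contract the Triage Agent is instructed to return, so the workflow's
# entitlement check operates on well-formed data.
def parse_triage(raw: str) -> dict:
    data = json.loads(raw)
    if not isinstance(data.get("entitlement"), bool):
        raise ValueError("triage output missing boolean 'entitlement'")
    if data.get("urgency") not in ("URGENT", "NORMAL"):
        raise ValueError("triage output missing 'urgency' classification")
    return data
```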

This architectural pattern is ideal for business-critical applications where you need the intelligence of LLMs combined with the reliability and observability of deterministic workflows.

Common Orchestration Patterns

Common scenarios for orchestrating AI agents within workflows include sequential agent chaining, fan-out/fan-in parallelization, and human-in-the-loop approval between reasoning steps.

Running AI Agents with Catalyst

Catalyst provides an enterprise-grade platform for running agents in production, giving you full observability and security out of the box.

Key capabilities:

  • Complete observability – Gain full visibility into each step of the agent's execution, including timing information, inputs, outputs, and the decision-making process at every stage
  • Agent identity – Catalyst assigns each agent a unique identity backed by an X.509 certificate, which controls its access rights to resources and establishes a security boundary around what it can accomplish
  • Secure app-to-app (or agent-to-agent) communication – Short-lived mTLS certificates ensure encrypted traffic and automatic mutual authentication between applications, with continuous rotation reducing exposure from compromised keys
  • Authorization & access control – Identity-based policies define which agents may call one another, what operations each can perform, and what infrastructure each can access, preventing unauthorized access
  • Cross-cloud identity federation – Integrates with AWS, Azure and GCP identity providers so agents can authenticate to cloud services without storing or distributing long-lived secrets
  • Centralized credentials – When required, API keys (such as LLM credentials) are stored and managed at the platform layer rather than in application code, preventing credential sprawl and enabling secure, unified access to multiple LLM providers
  • Auditing & traceability – All operations are tied to workload identity, enabling clear end-to-end audit trails for debugging, compliance, and forensic analysis

Getting Started