
Quickstart: CrewAI Durable Workflows


Run CrewAI agents with durable execution using Dapr Workflows. Each tool call becomes a checkpoint — if your agent crashes, it resumes exactly where it left off without re-executing completed work.


What You'll Build

A CrewAI agent that:

  • Survives crashes — Workflow state is checkpointed after each tool execution
  • Retries on failure — Failed LLM calls and tools retry with exponential backoff
  • Resumes from checkpoint — Restart the app and pick up where you left off

Why Durable Agent Execution?

Long-running agents are fragile. Network errors, API rate limits, or process restarts can lose hours of work. Dapr Workflows make your agents production-ready:

  • Automatic checkpointing — Every tool call is persisted before execution
  • Crash recovery — On restart, the workflow replays from the last checkpoint
  • Built-in retries — Configurable retry policies handle transient failures
  • Observability — Track workflow state and debug failures

Architecture

┌─────────────────────────────────────────────────────────────┐
│                    Dapr Workflow Runtime                     │
│  ┌───────────────────────────────────────────────────────┐  │
│  │                 CrewAI Agent Workflow                  │  │
│  │                                                        │  │
│  │    ┌──────────┐     ┌──────────┐     ┌──────────┐      │  │
│  │    │ LLM Call │────▶│Tool Exec │────▶│ LLM Call │      │  │
│  │    │(Activity)│     │(Activity)│     │(Activity)│      │  │
│  │    └──────────┘     └──────────┘     └──────────┘      │  │
│  │         │                │                │            │  │
│  │         ▼                ▼                ▼            │  │
│  │    [Checkpoint]     [Checkpoint]     [Checkpoint]      │  │
│  └───────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘

Each LLM call and tool execution runs as a workflow activity. If the process crashes after any checkpoint, the workflow resumes from that point — previous activities won't re-execute.
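
For orientation, here is a minimal hand-written sketch of the same activity-per-step model using Dapr's standard Python workflow SDK (dapr.ext.workflow). The names call_llm, run_tool, and agent_loop are illustrative; dapr-ext-crewai generates its own orchestrator for you, so treat this as a picture of the underlying mechanics rather than the extension's actual code.

import dapr.ext.workflow as wf

wfr = wf.WorkflowRuntime()


@wfr.activity(name="call_llm")
def call_llm(ctx: wf.WorkflowActivityContext, prompt: str) -> str:
    # Placeholder for a real LLM request. The return value is checkpointed
    # by the runtime as soon as the activity completes.
    return f"LLM response to: {prompt}"


@wfr.activity(name="run_tool")
def run_tool(ctx: wf.WorkflowActivityContext, tool_input: str) -> str:
    # Placeholder for a real tool execution, also checkpointed on completion.
    return f"Tool output for: {tool_input}"


@wfr.workflow(name="agent_loop")
def agent_loop(ctx: wf.DaprWorkflowContext, user_request: str):
    # Each yield is a durable step. On replay after a crash, completed
    # activities return their stored results instead of executing again.
    plan = yield ctx.call_activity(call_llm, input=user_request)
    observation = yield ctx.call_activity(run_tool, input=plan)
    answer = yield ctx.call_activity(call_llm, input=observation)
    return answer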


Prerequisites

  • Dapr CLI installed and initialized locally (dapr init)
  • Python 3.10 or later
  • An API key for your LLM provider (OpenAI by default)

Setup

1. Create Project Structure

Create a new project directory:

mkdir crewai-durable && cd crewai-durable
mkdir -p components

Your project structure:

crewai-durable/
├── components/
│   └── statestore.yaml
├── agent.py
└── requirements.txt

2. Configure Dapr State Store

Create components/statestore.yaml:

apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: statestore
spec:
  type: state.redis
  version: v1
  metadata:
  - name: redisHost
    value: localhost:6379
  - name: redisPassword
    value: ""
  - name: actorStateStore
    value: "true"

This component stores workflow state in Redis, which dapr init sets up locally by default. The actorStateStore: "true" setting is required because Dapr Workflows persist their state through the actor runtime.

3. Install Dependencies

Create requirements.txt:

dapr-ext-crewai>=0.1.0
crewai>=0.28.0

Set up a virtual environment and install:

python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt
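
Optionally, sanity-check the install before writing any code. This assumes the package exposes the dapr.ext.crewai module path used in agent.py later in this guide:

python -c "import crewai, dapr.ext.crewai; print('imports ok')"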

4. Set Your API Key

export OPENAI_API_KEY="your-api-key-here"

Or for other providers (see LiteLLM docs):

export ANTHROPIC_API_KEY="your-key"  # For Claude
export GEMINI_API_KEY="your-key" # For Gemini

How It Works

The DaprWorkflowAgentRunner wraps your CrewAI agent in a Dapr workflow:

  1. Agent configuration is serialized and stored in workflow state
  2. Each LLM call runs as a workflow activity with checkpointing
  3. Each tool execution runs as a separate activity
  4. On crash, the workflow replays from the last checkpoint — already-completed activities return cached results

This means:

  • LLM API calls that succeeded won't be repeated
  • Tool executions that completed won't re-run
  • You only pay for the work that actually needs to be done (the sketch below shows one way to inspect a run's recorded state)
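
To peek at what the runtime has recorded for a run, you can query the instance through the standard Dapr workflow client. This is a sketch, not part of the extension's documented API: it assumes the instance ID is the one reported by the runner's workflow_started event (printed by agent.py below), and uses "demo-session-001" only as a stand-in.

from dapr.ext.workflow import DaprWorkflowClient

client = DaprWorkflowClient()
# Replace the ID with the workflow ID printed by the workflow_started event.
state = client.get_workflow_state("demo-session-001", fetch_payloads=True)
if state:
    print(state.runtime_status)     # RUNNING, COMPLETED, FAILED, ...
    print(state.serialized_output)  # final output once completed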

Build the Agent

1. Create the Durable Agent

Create agent.py:

import asyncio
import os
from crewai import Agent, Task
from crewai.tools import tool
from dapr.ext.crewai import DaprWorkflowAgentRunner


# Define tools using CrewAI's @tool decorator
@tool("Get the current weather for a city")
def get_weather(city: str) -> str:
    """Get the current weather for a specified city."""
    # In production, call a real weather API
    weather_data = {
        "Tokyo": "Sunny, 22°C",
        "London": "Cloudy, 15°C",
        "New York": "Partly cloudy, 18°C",
        "Paris": "Rainy, 12°C",
    }
    return weather_data.get(city, f"Weather data not available for {city}")


@tool("Search for information on the web")
def search_web(query: str) -> str:
    """Search the web for information."""
    # In production, call a real search API
    return f"Search results for '{query}': Found relevant information about {query}."


@tool("Get the current date and time")
def get_datetime() -> str:
    """Get the current date and time."""
    from datetime import datetime
    return datetime.now().strftime("%Y-%m-%d %H:%M:%S")


async def main():
    # Create a CrewAI agent with tools
    agent = Agent(
        role="Research Assistant",
        goal="Help users find accurate and up-to-date information",
        backstory="""You are an expert research assistant with access to
        various information sources. You excel at finding and synthesizing
        information to provide comprehensive answers.""",
        tools=[get_weather, search_web, get_datetime],
        llm=os.getenv("CREWAI_LLM", "openai/gpt-4o-mini"),
        verbose=True,
    )

    # Define a task
    task = Task(
        description="""Find out the current weather in Tokyo and search for
        recent news about AI developments. Provide a brief summary.""",
        expected_output="""A summary containing:
        1. Current weather in Tokyo
        2. Key recent AI news highlights""",
        agent=agent,
    )

    # Create the Dapr Workflow runner
    runner = DaprWorkflowAgentRunner(
        agent=agent,
        max_iterations=10,
    )

    try:
        # Start the workflow runtime
        print("Starting Dapr Workflow runtime...")
        runner.start()

        # Run the agent
        session_id = "demo-session-001"
        print(f"\nExecuting agent task with session: {session_id}")
        print("=" * 60)

        async for event in runner.run_async(task=task, session_id=session_id):
            event_type = event["type"]

            if event_type == "workflow_started":
                print(f"Workflow started: {event.get('workflow_id')}")

            elif event_type == "workflow_status_changed":
                print(f"Status: {event.get('status')}")

            elif event_type == "workflow_completed":
                print("\n" + "=" * 60)
                print("AGENT COMPLETED")
                print("=" * 60)
                print(f"Iterations: {event.get('iterations')}")
                print(f"\nFinal Response:\n{event.get('final_response')}")

            elif event_type == "workflow_failed":
                print(f"Workflow FAILED: {event.get('error')}")

    finally:
        print("\nShutting down...")
        runner.shutdown()


if __name__ == "__main__":
    asyncio.run(main())

Run the Agent

1. Start with Dapr

Run the agent with Dapr:

dapr run --app-id crewai-agent \
  --dapr-grpc-port 50001 \
  --resources-path ./components \
  -- python agent.py

You'll see:

  1. Dapr initializing the workflow runtime
  2. The agent executing tool calls (each as a durable activity)
  3. The final response

2. Test Crash Recovery

To see durability in action, try these scenarios:

Scenario 1: Normal completion

  • Run the agent and let it complete
  • Note the workflow ID in the output

Scenario 2: Crash and resume

  1. Start the agent
  2. While it's running, press Ctrl+C to simulate a crash
  3. Restart the agent with the same command
  4. The workflow will resume from the last checkpoint (see the sketch below for one way to verify this)
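
A low-tech way to verify the resume behaviour is to add a log line inside one of the tools. The sketch below assumes, as described earlier, that each completed tool execution is checkpointed as an activity, so after a restart its log line should not appear a second time:

import logging
from crewai.tools import tool

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("tools")


@tool("Get the current weather for a city")
def get_weather(city: str) -> str:
    """Get the current weather for a specified city."""
    log.info("get_weather executing for %s", city)  # absent when replayed from checkpoint
    weather_data = {"Tokyo": "Sunny, 22°C", "London": "Cloudy, 15°C"}
    return weather_data.get(city, f"Weather data not available for {city}")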

Key Concepts

Concept        Description
Workflow       A durable orchestration that survives restarts
Activity       A single unit of work (an LLM call or tool execution) that can be retried
Checkpoint     Automatic state persistence after each activity completes
Retry Policy   Built-in retries with exponential backoff (3 attempts, 1s to 30s; sketched below)
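
The retry defaults listed above (3 attempts, starting at 1 second and capping at 30 seconds) correspond to a standard Dapr workflow retry policy. The snippet below shows how such a policy is expressed with the plain Dapr Python workflow SDK; it is for orientation only, since dapr-ext-crewai configures its retries internally.

from datetime import timedelta
import dapr.ext.workflow as wf

retry_policy = wf.RetryPolicy(
    max_number_of_attempts=3,                   # give up after 3 tries
    first_retry_interval=timedelta(seconds=1),  # first backoff: 1 second
    backoff_coefficient=2.0,                    # exponential backoff
    max_retry_interval=timedelta(seconds=30),   # cap backoff at 30 seconds
)

# Inside an orchestrator, the policy is applied per activity call:
#     result = yield ctx.call_activity(call_llm, input=prompt, retry_policy=retry_policy)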

Next Steps


Clean Up

Stop the Dapr application:

dapr stop --app-id crewai-agent