Troubleshooting and FAQ

Common issues and their fixes. For runtime issues with a running workflow or agent, also check the Workflows and Agents console pages — most surface the failure inline. For the conceptual model behind observability, see Observability.

Install and login

`diagrid login` doesn't open a browser

Symptom: Login appears to hang, and you see:

WARNING: error opening browser: exec: "xdg-open,x-www-browser,www-browser": executable file not found in $PATH

Cause: No default browser is available to auto-open the device-confirmation page. This is common over SSH, in CI, and inside dev containers.

Fix: The CLI prints a fallback URL and a user code. On any machine with a browser, visit https://login.diagrid.io/activate, enter the code shown in your terminal, and confirm it matches. Authentication completes as soon as the code is confirmed — no browser on the CLI host is required.

See the Diagrid CLI reference for the full command surface.

Connecting an app to Catalyst

Health check fails with an SSL certificate error

Symptom: Your app can't establish connectivity and the Dapr health check times out:

[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate
TimeoutError: Dapr health check timed out, after 60.0.

Cause: Some runtimes — Python in particular — don't read the operating-system certificate store, so they can't verify the TLS chain on your project endpoint. The 60-second health-check timeout is a downstream symptom, not the root cause.

Fix: Point the runtime at a valid CA bundle. For Python, install certifi and set the certificate environment variables to the bundle's path; python -m certifi prints the location:

export SSL_CERT_FILE="$(python -m certifi)"
export REQUESTS_CA_BUNDLE="$(python -m certifi)"
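
Before changing anything, it can help to see which CA bundle the Python process would actually consult. The following is a minimal diagnostic sketch using only the standard library; the function name describe_ca_sources is hypothetical, and it reports the same two variables exported above rather than anything Catalyst-specific.

```python
import os
import ssl


def describe_ca_sources(env=None):
    """Report which CA bundle Python's ssl module would consult (diagnostic only)."""
    env = os.environ if env is None else env
    defaults = ssl.get_default_verify_paths()
    override = env.get("SSL_CERT_FILE")
    # OpenSSL honors SSL_CERT_FILE ahead of its compiled-in default location.
    cafile = override or defaults.cafile
    return {
        "SSL_CERT_FILE": override,
        "REQUESTS_CA_BUNDLE": env.get("REQUESTS_CA_BUNDLE"),
        "effective_cafile": cafile,
        "cafile_exists": bool(cafile) and os.path.isfile(cafile),
    }


if __name__ == "__main__":
    for key, value in describe_ca_sources().items():
        print(f"{key}: {value}")
```

If cafile_exists is False, the runtime has no readable bundle at that path, which is consistent with the CERTIFICATE_VERIFY_FAILED error above.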

`dapr-api-token header is missing or invalid`

Symptom: Requests from your app are rejected with dapr-api-token header is missing or invalid.

Cause: The app isn't presenting the project API token, so Catalyst refuses the connection.

Fix: Ensure DAPR_API_TOKEN is set in the app's environment. When running locally, the token is supplied through your generated dev configuration: re-run diagrid dev scaffold so the token is injected, or set DAPR_API_TOKEN explicitly before starting the app.
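
One way to catch a missing token early is to check for it at startup, before the app makes its first request. This is a minimal sketch, not part of any Diagrid or Dapr SDK; require_dapr_api_token and MissingTokenError are hypothetical names.

```python
import os


class MissingTokenError(RuntimeError):
    """Raised when DAPR_API_TOKEN is absent or blank."""


def require_dapr_api_token(env=None):
    """Return the token from the environment, or fail with a clear message."""
    env = os.environ if env is None else env
    token = (env.get("DAPR_API_TOKEN") or "").strip()
    if not token:
        raise MissingTokenError(
            "DAPR_API_TOKEN is not set; export it or regenerate your dev configuration"
        )
    return token
```

Calling this once at startup turns a rejected request later on into an immediate, readable failure.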

No App ID exists to connect to

Symptom: You're ready to connect your app but there's nothing to point it at — diagrid appid list returns an empty list, and commands or scaffolds that expect an App ID (such as diagrid dev scaffold) have nothing to resolve connection details against.

Cause: An App ID is the identity your app authenticates as and the entry point Catalyst routes traffic through. Until you create one in your project, there's no endpoint or API token to connect with.

Fix: Create an App ID, then generate your dev configuration against it:

diagrid appid create my-app

Confirm it's ready with diagrid appid list, then run your diagrid dev scaffold so the endpoint and DAPR_API_TOKEN are injected into your app's environment. See diagrid appid create and Manage App IDs.

See Local development for the dev-loop tools.

Workflow runtime

A workflow is stuck in Running and won't progress

Symptom: A workflow sits in Running — often with an activity that never advances past its scheduled state — and console or CLI actions against it (terminate, rerun, pause, resume, raise event) seem to do nothing or return an API error.

Cause: The workflow worker application — the App ID that hosts your orchestrator and activity code — isn't running or connected. Catalyst's managed workflow engine records and schedules the work, but activities only execute while your worker is connected to pick them up, and management actions require a running worker to take effect.

Fix: Confirm the worker App ID is up and connected, and redeploy or restart it if not. The Workflows console shows the stalled step and its last recorded event. Once the worker is healthy, in-flight instances resume on their own; if one stays wedged, terminate it and start it again with diagrid workflow rerun.

A code change breaks in-flight workflows (non-determinism error)

Symptom: After deploying a new build, existing in-flight workflows fail — typically with a non-determinism error — while brand-new instances run fine.

Cause: Catalyst Workflows rebuild state by replaying history through your orchestrator code. If your change altered a call that in-flight instances have already recorded — reordering activities; adding, removing, or renaming an already-scheduled activity; changing the parameters of an existing activity call; or changing the duration of a timer already set — replay can no longer reconcile the code with the history, and the instance fails.

Fix: Deploy only replay-safe (additive) changes against running instances. For unsafe changes, use a migration strategy — a version gate or a new workflow name — and clear any stragglers with diagrid workflow terminate. See Workflow versioning for the full replay-safety rules and migration strategies.
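
To make the replay rule concrete, here is a toy model in plain Python. It is not Catalyst's engine or any SDK API: replay, call_activity, and the order_* functions are hypothetical, and history is reduced to a list of activity names. It only shows why a reordered call fails against recorded history while an appended call does not.

```python
class NonDeterminismError(Exception):
    """Raised when replayed code disagrees with recorded history."""


def replay(orchestrator, history):
    """Re-run orchestrator code against recorded activity names (toy model)."""
    recorded = iter(history)
    scheduled = []

    def call_activity(name):
        expected = next(recorded, None)
        # While history remains, each call must match what was recorded.
        if expected is not None and expected != name:
            raise NonDeterminismError(
                f"history recorded {expected!r}, code scheduled {name!r}"
            )
        scheduled.append(name)

    orchestrator(call_activity)
    leftover = next(recorded, None)
    if leftover is not None:
        # The code finished without issuing a call that history already holds.
        raise NonDeterminismError(f"code no longer schedules {leftover!r}")
    return scheduled


def order_v1(call_activity):
    call_activity("reserve_stock")
    call_activity("charge_card")


def order_v2_reordered(call_activity):
    call_activity("charge_card")  # swapped with reserve_stock: unsafe
    call_activity("reserve_stock")


def order_v2_additive(call_activity):
    call_activity("reserve_stock")
    call_activity("charge_card")
    call_activity("send_receipt")  # appended after everything already recorded
```

With a history of ["reserve_stock"], replaying order_v2_reordered raises NonDeterminismError, while order_v1 and order_v2_additive both replay cleanly.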

A long-running activity fails with `DEADLINE_EXCEEDED`

Symptom: An activity — commonly an LLM call in an agent workflow — fails with StatusCode.DEADLINE_EXCEEDED / "Deadline Exceeded", and the workflow then schedules a retry of that activity.

Cause: The activity didn't return within its execution deadline. The engine retries it per the activity's retry policy — expected durable-execution behavior — but a non-idempotent activity can repeat its side effects on each attempt.

Fix: Keep individual activities within the deadline by splitting long work into smaller activities (for agent calls, trim the prompt or context that's inflating call duration), and make activities idempotent so retries are safe. Set an appropriate retry policy on the activity call in your SDK.
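
One common way to make an activity idempotent is to derive a stable key from the workflow instance and step, and have the downstream system ignore repeats of that key. The sketch below is illustrative only: PaymentLedger stands in for an external service, and charge_activity is a hypothetical activity body, not an SDK signature.

```python
class PaymentLedger:
    """Stand-in for an external system with a side effect (hypothetical)."""

    def __init__(self):
        self.charges = {}

    def charge(self, idempotency_key, amount):
        # A repeated key returns the stored result instead of charging again.
        if idempotency_key not in self.charges:
            self.charges[idempotency_key] = amount
        return self.charges[idempotency_key]


def charge_activity(ledger, workflow_id, step, amount):
    """Activity body whose side effect is safe to retry."""
    key = f"{workflow_id}:{step}"  # identical on every retry of this step
    return ledger.charge(key, amount)
```

If the first attempt times out after the charge was recorded, the retry reuses the same key and the ledger still holds a single charge.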

A burst of workflow starts overwhelms the worker

Symptom: Many workflows start at once (for example, one per incoming request); activities queue up and throughput degrades, even though no single instance has failed.

Cause: By default, workflow and activity invocations are unbounded — a spike of starts runs to the full parallelism the worker and backing store can sustain.

Fix: Bound concurrency with a workflow Configuration policy. Set maxConcurrentWorkflowInvocations and maxConcurrentActivityInvocations (enforced per sidecar; default unbounded) in a Configuration manifest and apply it with diagrid configuration create -f <file> --project <project>. See the Policies reference for the manifest shape and a worked example.
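
The effect of a concurrency cap can be illustrated with a small simulation. This is not the Catalyst policy itself or its manifest, only a sketch of the behavior using asyncio.Semaphore; run_bounded and its parameters are hypothetical.

```python
import asyncio


async def run_bounded(num_tasks, limit):
    """Run num_tasks fake activities with at most `limit` in flight; return the peak."""
    gate = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def activity():
        nonlocal in_flight, peak
        async with gate:  # waits here once `limit` activities are running
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0)  # yield so other tasks get a chance to start
            in_flight -= 1

    await asyncio.gather(*(activity() for _ in range(num_tasks)))
    return peak


if __name__ == "__main__":
    print(asyncio.run(run_bounded(num_tasks=50, limit=5)))
```

However many tasks are started, the number in flight never exceeds the limit; the rest wait their turn instead of piling onto the worker.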

Agent runtime

An agent's LLM call times out, errors, or returns a truncated response

Symptom: An agent stalls, fails, or produces an incomplete answer, and it's unclear whether the model call is at fault.

Cause: Agent model calls go through the Catalyst Conversation API. A failed, slow, or truncated call surfaces there — not in your application logs.

Fix: Open API Logs and filter Dapr API = conversation (add Status = failure) for the agent's App ID. The detail panel shows the status, error message, end-to-end Execution time, and token counts. Sort by Execution time to surface the slowest calls; a small response paired with a large Completion tokens count points to truncation or an early stop rather than a hang. From the Agents page, an agent's Model configuration panel jumps straight to that agent's Conversation API calls.

Other symptoms, not yet documented in detail:

Agent loops on the same tool call: the tool is likely returning non-deterministic output. See Agent patterns.

Memory or session state isn't persisting: check the durable-agent configuration.

See Operate AI agents for inspecting running agents.

Components and managed services

`max number of connections reached`

Symptom: Creating or applying a component fails with:

Failed processing component "<name>": max number of connections reached, current 10 max 10

Cause: Component (infrastructure connection) limits are enforced per organization, not per project. A new or empty project still counts against the org-wide total, so you can hit the cap even when the current project has few components.

Fix: Review component usage across all projects in the organization — not just the one you're working in — and remove any that are unused, or upgrade your plan to raise the cap.

See the Components reference.

MCP

Requests to an MCP server are rejected with `401 Unauthorized`

Symptom: Calls to Catalyst's MCP endpoint (/v1.0/diagrid/mcp/<name>) return 401 Unauthorized.

Cause: The caller didn't present a valid App ID API token. Catalyst authenticates every MCP request by the dapr-api-token header before applying the access policy.

Fix: Set dapr-api-token: <App ID API token> on the request. diagrid dev run injects DAPR_API_TOKEN for each app during local development; for a hosted project, read the App ID's token from the Catalyst console. If Catalyst reaches the server but the upstream server rejects the call, that surfaces as an in-band JSON-RPC error from your MCP client — check the upstream credential configured on the MCPServer connection. See MCP authentication.
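
As a rough illustration of where the header goes, the sketch below builds (but does not send) a request to the MCP endpoint using the standard library. The base URL, server name, and build_mcp_request helper are hypothetical placeholders, and a real MCP client performs its own initialization handshake before listing tools.

```python
import json
import urllib.request


def build_mcp_request(base_url, server_name, api_token):
    """Build a POST to the MCP endpoint carrying the App ID API token."""
    if not api_token:
        raise ValueError("an App ID API token is required")
    url = f"{base_url.rstrip('/')}/v1.0/diagrid/mcp/{server_name}"
    body = json.dumps({"jsonrpc": "2.0", "id": 1, "method": "tools/list"}).encode()
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "dapr-api-token": api_token,  # missing or invalid token leads to 401
            "Content-Type": "application/json",
            "Accept": "application/json, text/event-stream",
        },
    )
```

Note that urllib normalizes header names to a capitalized form when storing them, so the header reads back as Dapr-api-token; HTTP header names are case-insensitive, so the server sees the same header.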

An MCP request fails with `403` / `ACCESS_DENIED`, or no tools are discovered

Symptom: A client can't use a tool — it isn't listed by tools/list, or calling it returns 403 Forbidden. With a full deny-all policy the MCP session itself is refused (Session terminated).

Cause: The MCP server's access policy denied the request. Every MCP server is deny-by-default, so a newly created server denies all callers and tools until you grant access. Catalyst filters discovery to authorized tools and rejects unauthorized calls at its data plane before they reach the upstream server.

Fix: Grant the calling App ID access to the tools it needs, then retry (policy changes take a few seconds to roll out):

diagrid mcpserver access grant <mcp-server> --caller <app-id> --allow-tools <tools> --wait

Confirm the verdict instantly with diagrid mcpserver access test <mcp-server> --caller <app-id> --tool <tool>. See Control tool access.

Where to get help

Diagrid Discord for community help
Plans & Support for paid-tier support channels
