Production Planning

A production installation of Catalyst Enterprise Self-Hosted runs on a dedicated Kubernetes cluster with access to external PostgreSQL and, optionally, Kafka. Use the guidance below to size your infrastructure for the expected workload.

Cluster requirements

  • A dedicated cluster running Kubernetes 1.24 or later.
  • A CNI that supports NetworkPolicy resources.
  • At least 3 worker nodes for production, to support High Availability of the system components.
  • Outbound connectivity to the Diagrid Cloud endpoints listed in Architecture.

Reference sizing profiles

The following profiles are verified minimum configurations for running Catalyst. Pick the profile that best matches your workload; individual components can be scaled independently from there.

Profile | Use case | Kubernetes worker nodes | AWS equivalent (worker node) | Workflow PostgreSQL | AWS equivalent (PostgreSQL)
Dev / PoC | Evaluation and development. Not suitable for production. | 2 × (2 vCPU, 4 GiB) | c5.large | 2 vCPU, 4 GiB | db.t3.medium
Small production | Low-volume workflows. | 3 × (8 vCPU, 16 GiB) | c5.2xlarge | 4 vCPU, 16 GiB | db.m5.xlarge
Large production | High-volume workflows with independent scheduler state. | 3 or more × (8 vCPU, 16 GiB) | c5.2xlarge | 16 vCPU, 64 GiB | db.m5.4xlarge

For the Large production profile, we recommend a separate PostgreSQL instance for the Dapr Scheduler (8 vCPU, 32 GiB; for example, AWS db.m5.2xlarge) so that scheduler state does not contend with workflow state.

Component resource footprint

The following are the default resource requests and limits shipped with the Catalyst Helm chart. Use them to validate that your node pool has sufficient capacity. All values can be tuned via Helm; refer to the Helm Reference for the full set of options.

Component | Replicas | CPU (request / limit) | Memory (request / limit)
Agent | 1 | 40m / — | 500Mi / 1200Mi
Management | 2 | 40m / — | 100Mi / 1200Mi
Gateway (Envoy) | 1 (2 with HA) | 100m / 1000m | 512Mi / 2048Mi
Gateway (Control Plane) | 1 (2 with HA) | 50m / 100m | 50Mi / 100Mi
Identity Injector | 1 | 50m / 200m | 64Mi / 128Mi
Dapr Server (per App ID) | per app | 10m / 300m | 25Mi / 256Mi
OpenTelemetry Collector (metrics) | per project | 75m / — | 500Mi / 1100Mi
OpenTelemetry Collector (logs) | per project | 50m / — | 500Mi / 750Mi
Dapr Scheduler | 1 (3 with HA) | — / — | 130Mi / 175Mi
Dapr Sentry | 1 | — / — | — / 100Mi

Dapr Servers and OpenTelemetry Collectors are provisioned per App ID and Project at runtime, so their aggregate footprint scales with the number of Apps and Projects you deploy.
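
As a rough illustration of how the aggregate footprint grows, the following Python sketch sums the default resource requests from the component table for a given number of App IDs and Projects. It is an estimate under stated assumptions, not an official sizing tool: it assumes one Dapr Server per App ID and one metrics plus one logs collector per Project, counts an unset request as zero, and applies a single hypothetical ha flag to every component whose replica count is listed as changing with HA. The function and variable names are illustrative only.

```python
# Default requests copied from the component table:
# name -> (replicas, replicas with HA, CPU request in millicores, memory request in MiB).
# An unset request (shown as a dash in the table) is counted as 0; that is an
# assumption of this sketch.
FIXED_COMPONENTS = {
    "Agent": (1, 1, 40, 500),
    "Management": (2, 2, 40, 100),
    "Gateway (Envoy)": (1, 2, 100, 512),
    "Gateway (Control Plane)": (1, 2, 50, 50),
    "Identity Injector": (1, 1, 50, 64),
    "Dapr Scheduler": (1, 3, 0, 130),
    "Dapr Sentry": (1, 1, 0, 0),
}

# Per-App ID and per-Project requests: (CPU millicores, memory MiB).
DAPR_SERVER_PER_APP = (10, 25)
OTEL_METRICS_PER_PROJECT = (75, 500)
OTEL_LOGS_PER_PROJECT = (50, 500)


def estimate_requests(apps: int, projects: int, ha: bool = False) -> tuple[int, int]:
    """Return the total (CPU millicores, memory MiB) requested by default."""
    cpu = 0
    mem = 0
    # Fixed system components, with the HA replica count when requested.
    for replicas, ha_replicas, cpu_m, mem_mi in FIXED_COMPONENTS.values():
        count = ha_replicas if ha else replicas
        cpu += count * cpu_m
        mem += count * mem_mi
    # Components provisioned at runtime, one set per App ID or Project.
    cpu += apps * DAPR_SERVER_PER_APP[0]
    mem += apps * DAPR_SERVER_PER_APP[1]
    cpu += projects * (OTEL_METRICS_PER_PROJECT[0] + OTEL_LOGS_PER_PROJECT[0])
    mem += projects * (OTEL_METRICS_PER_PROJECT[1] + OTEL_LOGS_PER_PROJECT[1])
    return cpu, mem


if __name__ == "__main__":
    cpu_m, mem_mi = estimate_requests(apps=20, projects=5, ha=True)
    print(f"Requested: {cpu_m}m CPU, {mem_mi}Mi memory")
```

These totals cover requests only. Limits, other workloads on the cluster, and per-node system overhead are not included, so treat the result as a lower bound when comparing against node pool capacity.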

Scale limits

Each Catalyst installation enforces the following internal limits to prevent resource exhaustion:

Limit | Default | Helm value
Projects per installation | 50 | agent.config.placement.max_project_count
App IDs per installation | 300 | agent.config.placement.max_appid_count
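
When planning capacity, it can help to check early whether your expected counts fit within these defaults. The Python sketch below compares planned Project and App ID counts with the default limits and reports which Helm values would need a higher setting. The function name is hypothetical, and the sketch does not assume any particular upper bound for these values; consult the Helm Reference before changing them.

```python
# Default limits and their Helm value names, copied from the table of scale limits.
DEFAULT_LIMITS = {
    "agent.config.placement.max_project_count": 50,
    "agent.config.placement.max_appid_count": 300,
}


def limits_to_raise(projects: int, app_ids: int) -> dict[str, int]:
    """Return the Helm values whose defaults are below the planned counts."""
    planned = {
        "agent.config.placement.max_project_count": projects,
        "agent.config.placement.max_appid_count": app_ids,
    }
    # Keep only the entries where the plan exceeds the shipped default.
    return {
        key: count
        for key, count in planned.items()
        if count > DEFAULT_LIMITS[key]
    }


if __name__ == "__main__":
    for key, needed in limits_to_raise(projects=60, app_ids=100).items():
        print(f"{key} must be at least {needed}")
```

An empty result means the planned installation fits within the default limits.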

High availability

We recommend enabling High Availability for production installations. Set gateway.ha.enabled: true in your Helm values to run two replicas each of the Gateway Envoy and the Gateway Control Plane. The Management service runs two replicas by default. See the Helm Reference for the full set of HA-related options.

We also recommend running external dependencies in a Multi-AZ configuration:

  • Workflow PostgreSQL.
  • Dapr Scheduler PostgreSQL, when using the Large production profile.
  • Kafka, when Managed Pub/Sub is enabled.

Refer to the Helm Reference for configuration of external PostgreSQL and Kafka instances.

Private container images

If your environment cannot reach public container registries, or you prefer to use your own registry, pull the artifacts from our public registry, re-tag them, and push them to your private registry. Then set the corresponding Helm chart values so that the chart pulls from your private registry. See the Helm Reference for the chart values.

The Catalyst Enterprise Self-Hosted Helm Chart repository provides a script and documentation for this process.

Next steps

  • AWS Deployment — reference architecture on AWS (VPC, EKS, RDS, Bastion host).
  • Azure Deployment — reference architecture on Azure (VNet, AKS, Azure Firewall, management VM).