Production Planning
A production installation of Catalyst Enterprise Self-Hosted runs on a dedicated Kubernetes cluster with access to an external PostgreSQL instance and, optionally, Kafka. Use the guidance below to size your infrastructure for the expected workload.
Cluster requirements
- Kubernetes 1.24 or later, installed in a dedicated cluster.
- A CNI that supports NetworkPolicy resources.
- At least 3 worker nodes for production, to support High Availability of the system components.
- Outbound connectivity to the Diagrid Cloud endpoints listed in Architecture.
Reference sizing profiles
The following profiles are verified minimum configurations for running Catalyst. Pick the profile that best matches your workload; individual components can be scaled independently from there.
| Profile | Use case | Kubernetes worker nodes | AWS equivalent (worker node) | Workflow PostgreSQL | AWS equivalent (PostgreSQL) |
|---|---|---|---|---|---|
| Dev / PoC | Evaluation and development. Not suitable for production. | 2 × (2 vCPU, 4 GiB) | c5.large | 2 vCPU, 4 GiB | db.t3.medium |
| Small production | Low-volume workflows. | 3 × (8 vCPU, 16 GiB) | c5.2xlarge | 4 vCPU, 16 GiB | db.m5.xlarge |
| Large production | High-volume workflows with independent scheduler state. | 3 or more × (8 vCPU, 16 GiB) | c5.2xlarge | 16 vCPU, 64 GiB | db.m5.4xlarge |
For the Large profile, we recommend a separate PostgreSQL instance for the Dapr Scheduler (8 vCPU / 32 GiB, e.g. AWS db.m5.2xlarge) so that scheduler state does not contend with workflow state.
Component resource footprint
The following are the default resource requests and limits shipped with the Catalyst Helm chart. Use them to validate that your node pool has sufficient capacity. All values can be tuned via Helm; refer to the Helm Reference for the full set of options.
| Component | Replicas | CPU (request / limit) | Memory (request / limit) |
|---|---|---|---|
| Agent | 1 | 40m / — | 500Mi / 1200Mi |
| Management | 2 | 40m / — | 100Mi / 1200Mi |
| Gateway (Envoy) | 1 (2 with HA) | 100m / 1000m | 512Mi / 2048Mi |
| Gateway (Control Plane) | 1 (2 with HA) | 50m / 100m | 50Mi / 100Mi |
| Identity Injector | 1 | 50m / 200m | 64Mi / 128Mi |
| Dapr Server (per App ID) | per app | 10m / 300m | 25Mi / 256Mi |
| OpenTelemetry Collector (metrics) | per project | 75m / — | 500Mi / 1100Mi |
| OpenTelemetry Collector (logs) | per project | 50m / — | 500Mi / 750Mi |
| Dapr Scheduler | 1 (3 with HA) | — / — | 130Mi / 175Mi |
| Dapr Sentry | 1 | — / — | — / 100Mi |
Dapr Servers are provisioned per App ID, and OpenTelemetry Collectors per Project, at runtime, so their aggregate footprint scales with the number of App IDs and Projects you deploy.
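As a rough illustration of how the aggregate footprint grows, the sketch below sums the default resource requests from the table for a given number of App IDs and Projects and compares the result with the raw capacity of a node pool. It is a planning aid under stated assumptions, not a tool shipped with Catalyst: the names `Request`, `total_requests`, and `fits` are hypothetical, it assumes HA replica counts, one Dapr Server per App ID, and one metrics collector plus one logs collector per Project, and it treats a missing request as zero.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    cpu_m: int   # CPU request in millicores (0 where the chart sets none)
    mem_mi: int  # memory request in MiB (0 where the chart sets none)


# (replicas, per-replica request), copied from the table above.
# Assumption: HA replica counts are used for the Gateway and Dapr Scheduler.
FIXED = {
    "agent": (1, Request(40, 500)),
    "management": (2, Request(40, 100)),
    "gateway-envoy": (2, Request(100, 512)),
    "gateway-control-plane": (2, Request(50, 50)),
    "identity-injector": (1, Request(50, 64)),
    "dapr-scheduler": (3, Request(0, 130)),
    "dapr-sentry": (1, Request(0, 0)),
}

# Assumption: one Dapr Server per App ID.
PER_APP_ID = Request(10, 25)
# Assumption: one metrics collector and one logs collector per Project.
PER_PROJECT = Request(75 + 50, 500 + 500)


def total_requests(app_ids: int, projects: int) -> Request:
    """Sum default requests for the fixed components plus per-App/Project pods."""
    cpu = sum(n * r.cpu_m for n, r in FIXED.values())
    mem = sum(n * r.mem_mi for n, r in FIXED.values())
    cpu += app_ids * PER_APP_ID.cpu_m + projects * PER_PROJECT.cpu_m
    mem += app_ids * PER_APP_ID.mem_mi + projects * PER_PROJECT.mem_mi
    return Request(cpu, mem)


def fits(total: Request, nodes: int, vcpu: int, mem_gib: int) -> bool:
    """Compare against raw node capacity; allocatable capacity is lower in practice."""
    return (total.cpu_m <= nodes * vcpu * 1000
            and total.mem_mi <= nodes * mem_gib * 1024)


if __name__ == "__main__":
    total = total_requests(app_ids=20, projects=5)
    print(total)
    print(fits(total, nodes=3, vcpu=8, mem_gib=16))
```

With 20 App IDs and 5 Projects, this sketch yields 1295m CPU and 7778Mi memory in requests, which fits within the raw capacity of the Small production node pool. The calculation covers requests only; limits, system daemons, and any other workloads on the cluster need separate headroom.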
Scale limits
Each Catalyst installation enforces the following internal limits to prevent resource exhaustion:
| Limit | Default | Helm value |
|---|---|---|
| Projects per installation | 50 | agent.config.placement.max_project_count |
| App IDs per installation | 300 | agent.config.placement.max_appid_count |
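Before rollout, it can help to check planned counts against these limits. The sketch below is a hypothetical pre-flight check, not part of Catalyst: the function name `limit_violations` and the shape of the `planned` dictionary are assumptions, while the Helm value names and default numbers are taken from the table above.

```python
# Default limits from the table above, keyed by their Helm value names.
DEFAULT_LIMITS = {
    "agent.config.placement.max_project_count": 50,
    "agent.config.placement.max_appid_count": 300,
}


def limit_violations(planned: dict, limits: dict = DEFAULT_LIMITS) -> list:
    """Return one message per planned count that exceeds its configured limit."""
    return [
        f"{key}: planned {planned[key]} exceeds limit {limit}"
        for key, limit in limits.items()
        if planned.get(key, 0) > limit
    ]


if __name__ == "__main__":
    plan = {
        "agent.config.placement.max_project_count": 60,
        "agent.config.placement.max_appid_count": 120,
    }
    for message in limit_violations(plan):
        print(message)
```

If a limit has been raised through Helm, pass the overridden values as `limits` so the check reflects the installation's actual configuration.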
High availability
We recommend enabling High Availability for production installations. Set gateway.ha.enabled: true in your Helm values to run two replicas of the Gateway Envoy and Control Plane. The Management service runs two replicas by default. See the Helm Reference for the full set of HA-related options.
We also recommend running external dependencies in a Multi-AZ configuration:
- Workflow PostgreSQL.
- Dapr Scheduler PostgreSQL, when using the Large profile.
- Kafka, when Managed Pub/Sub is enabled.
Refer to the Helm Reference for configuration of external PostgreSQL and Kafka instances.
Private container images
If your environment has no access to public container registries, or you prefer to use your own registry, pull the artifacts from our public registry, re-tag them, and push them to your private registry. Then configure the Helm chart to use the private registry by setting the corresponding chart values; see the Helm Reference for details.
We have provided a script and documentation on how to achieve this in the Catalyst Enterprise Self-Hosted Helm Chart repository.
Next steps
- AWS Deployment — reference architecture on AWS (VPC, EKS, RDS, Bastion host).
- Azure Deployment — reference architecture on Azure (VNet, AKS, Azure Firewall, management VM).