Manage Clusters
exalsius uses clusters to orchestrate the compute resources of your nodes. Each cluster is a managed Kubernetes cluster, consisting of a control plane that is typically managed by the exalsius backend and a set of worker nodes that execute AI workloads. exalsius allows you to deploy clusters composed of nodes from your node pool (see Managing Nodes).
Deploying New Clusters
exalsius offers two primary methods for deploying clusters: an interactive command-line interface (CLI) flow that guides you through the process, and a single CLI command with options for direct, non-interactive configuration and automation.
Option 1 - Interactive Deployment
For a guided cluster setup, use the interactive deployment flow. It prompts you for all necessary information step by step.
To begin, run:
exls clusters deploy
Step 1 - Choose a Cluster Name
You’ll first be asked to enter a cluster name.
Step 2 - Select Available Nodes
Next, you are shown the nodes you previously added to your node pool (see Managing Nodes). You can select one or more of them as worker nodes for your cluster.
Step 3 - Setup Cluster for Multi-Node AI Model Training
You can optionally enable multi-node AI training support, which prepares the cluster for distributed training workloads.
Tip
Enabling multi-node training ensures your cluster is configured with all necessary services for communication and workload scheduling.
Step 4 - Enable VPN between the Cluster Nodes
exalsius automatically sets up a P2P mesh VPN between all cluster nodes using WireGuard. This ensures secure, encrypted communication between nodes within and across different datacenters or networks.
Required Firewall Configuration
All nodes must have UDP port 51871 open to establish peer-to-peer VPN connections. Cluster deployment will fail if nodes cannot communicate through this port. Refer to the Firewall Configuration section for configuration details.
Option 2 - Manual Deployment
For automation scenarios, clusters can be created non-interactively by specifying all required parameters directly as CLI options. This approach is ideal for CI/CD pipelines, scripted deployments, or when you already know the exact configuration.
Example
To create a cluster, run the following command:
exls clusters deploy \
--name "my-exalsius-cluster" \
--enable-multinode-training \
--enable-vpn \
--worker-nodes "a959a49e-185c-4b05-94cb-179768fd6b07" "aec275e9-0ac9-48cd-9364-1019308077fd"
This command deploys a cluster with two worker nodes, configured to enable multi-node training and VPN.
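For CI/CD pipelines, the command above can be wrapped in a small shell function so that the cluster name and node IDs come from script arguments. The following is a minimal sketch, not part of the exls CLI: `deploy_cluster` and the `DRY_RUN` variable are hypothetical names introduced for illustration, and the function uses only the flags shown in the example above.

```shell
# Hypothetical helper: deploy_cluster <name> <node-id> [<node-id> ...]
# Assumption: every cluster created this way should enable
# multi-node training and the VPN, as in the example above.
deploy_cluster() {
  if [ "$#" -lt 2 ]; then
    echo "usage: deploy_cluster <name> <node-id> [<node-id> ...]" >&2
    return 2
  fi
  name="$1"
  shift

  # Rebuild the positional parameters as the full command line.
  # The remaining arguments ("$@") are the worker node IDs.
  set -- exls clusters deploy \
    --name "$name" \
    --enable-multinode-training \
    --enable-vpn \
    --worker-nodes "$@"

  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the command instead of running it.
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

Calling the function with `DRY_RUN=1` set prints the assembled command without contacting exalsius, which can help when reviewing a pipeline change before it creates real clusters.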
Cluster Deployment Lifecycle
After you create a cluster, it goes through the following states:
- PENDING - The cluster request has been received and is waiting to be processed.
- DEPLOYING - exalsius started the deployment and is currently preparing the cluster nodes.
- READY - The cluster is ready to use.
During the deployment phase, exalsius provisions a Kubernetes stack on the nodes, configures GPU drivers, and installs all necessary dependencies. Depending on the speed and bandwidth of your network, this process can take 10 to 15 minutes.
To check the current status of your clusters, run:
exls clusters list
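In a script, you may want to wait until a cluster reaches the READY state before continuing. The sketch below polls the list command until that happens or a time limit is reached. It rests on an assumption that this page does not confirm: that `exls clusters list` prints one line per cluster containing both the cluster name and its state. `wait_for_cluster_ready` is a hypothetical helper, not an exls command, so adjust the matching to the output you actually see.

```shell
# Hypothetical helper:
#   wait_for_cluster_ready <cluster-name> [max-wait-seconds] [interval-seconds]
# Assumption: the cluster's line in `exls clusters list` contains the
# cluster name and the word READY once deployment has finished.
wait_for_cluster_ready() {
  cluster="$1"
  max_wait="${2:-1200}"  # default 20 minutes; deployment can take 10 to 15
  interval="${3:-30}"
  elapsed=0

  while [ "$elapsed" -lt "$max_wait" ]; do
    # Keep only the line(s) for this cluster, then look for READY as a word.
    if exls clusters list | grep -F -- "$cluster" | grep -qw "READY"; then
      echo "cluster $cluster is READY"
      return 0
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done

  echo "timed out waiting for cluster $cluster" >&2
  return 1
}
```

The default limit of 20 minutes leaves some margin over the 10 to 15 minutes mentioned above; a non-zero return code lets a pipeline fail early instead of running workloads against a cluster that is not ready.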
Command Reference
To explore all available options for cluster creation, run the help command:
exls clusters deploy --help
Listing All Clusters
To list all your clusters, run:
exls clusters list
Check the Available Resources of a Cluster
To see the available resources of your cluster, run:
exls clusters show-available-resources <CLUSTER-ID>
Get the kubeconfig
Clusters are managed Kubernetes clusters.
You can use kubectl to directly interact with your cluster.
To retrieve your kubeconfig, run:
exls clusters import-kubeconfig --kubeconfig-path <file-path-to-store-the-config> <CLUSTER-ID>
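Once the kubeconfig is stored, you can point kubectl at it to confirm that the cluster is reachable. The sketch below combines both steps. `fetch_and_check_kubeconfig` and the default file location are hypothetical choices for illustration, and the sketch assumes that kubectl is installed and that the import command writes a kubeconfig file to the given path.

```shell
# Hypothetical helper: fetch_and_check_kubeconfig <cluster-id> [kubeconfig-path]
# Assumption: the import command writes a usable kubeconfig to the given path.
fetch_and_check_kubeconfig() {
  if [ "$#" -lt 1 ]; then
    echo "usage: fetch_and_check_kubeconfig <cluster-id> [kubeconfig-path]" >&2
    return 2
  fi
  cluster_id="$1"
  # Example location only; any writable path works.
  kubeconfig_path="${2:-$HOME/.kube/exalsius-$cluster_id.yaml}"

  # Make sure the target directory exists before storing the file.
  mkdir -p "$(dirname "$kubeconfig_path")" || return 1

  exls clusters import-kubeconfig \
    --kubeconfig-path "$kubeconfig_path" "$cluster_id" || return 1

  # List the cluster nodes using only this kubeconfig.
  kubectl --kubeconfig "$kubeconfig_path" get nodes
}
```

Passing `--kubeconfig` explicitly keeps the check independent of any `KUBECONFIG` environment variable or default configuration already present on the machine.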
Deleting Clusters
To delete a cluster, run:
exls clusters delete <CLUSTER-ID>
Warning
- Deleting a cluster permanently removes it.
- All workloads, pods, and workspaces running on it will be terminated.
- Make sure to back up any data you need before deletion.
Next Steps
Once your cluster is ready, you can deploy Workspaces (e.g., Jupyter, Marimo, Developer Pods, or Multi-Node Training).