
Manage Clusters

exalsius uses clusters to orchestrate the compute resources of your nodes. Each cluster is a managed Kubernetes cluster consisting of a control plane, typically managed by the exalsius backend, and a set of worker nodes that run your AI workloads. You build clusters from the nodes in your node pool (see Managing Nodes).


Deploying New Clusters

exalsius offers two ways to deploy clusters: an interactive command-line interface (CLI) flow that guides you through each step, and a non-interactive mode in which you pass the full configuration as CLI options, suited to automation.

Option 1 - Interactive Deployment

For a guided cluster setup, use the interactive deployment flow. It prompts you for all necessary information step-by-step.

To begin, run:

exls clusters deploy

Step 1 - Choose a Cluster Name

You’ll first be asked to enter a cluster name.

Step 2 - Select Available Nodes

Next, select one or more worker nodes for your cluster. The available nodes are those you previously added to your node pool (see Managing Nodes).

Step 3 - Set Up the Cluster for Multi-Node AI Model Training

You can optionally enable multi-node AI training support, which prepares the cluster for distributed training workloads.

Tip

Enabling multi-node training ensures your cluster is configured with all necessary services for communication and workload scheduling.

Step 4 - Enable VPN between the Cluster Nodes

If you enable this option, exalsius automatically sets up a peer-to-peer (P2P) mesh VPN between all cluster nodes using WireGuard. This encrypts communication between nodes, whether they sit in the same datacenter or network or in different ones.

Required Firewall Configuration

All nodes must have UDP port 51871 open to establish peer-to-peer VPN connections. Cluster deployment fails if the nodes cannot communicate through this port. See the Firewall Configuration section for details.
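Before deploying, you may want to confirm that UDP traffic on port 51871 actually reaches each node. The Python sketch below is a hypothetical helper of our own, not part of the exls CLI: run `open_listener()` and `wait_for_probe()` on one node and `send_probe()` on another, then swap roles. The function names and the probe payload are illustrative assumptions. Because UDP has no handshake, a received datagram confirms the path is open, while a timeout only suggests that a firewall may be dropping the traffic.

```python
import socket

PROBE_PORT = 51871  # UDP port exalsius requires for the VPN
PROBE_PAYLOAD = b"exls-vpn-port-probe"  # arbitrary marker (hypothetical, not an exalsius value)


def open_listener(port: int = PROBE_PORT, bind_addr: str = "0.0.0.0") -> socket.socket:
    """Bind a UDP socket on the given port; run this on the receiving node."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((bind_addr, port))
    return sock


def send_probe(host: str, port: int = PROBE_PORT) -> None:
    """Send one probe datagram to a peer node; run this on the sending node."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(PROBE_PAYLOAD, (host, port))


def wait_for_probe(sock: socket.socket, timeout: float = 5.0) -> bool:
    """Return True if the probe datagram arrives before the timeout expires."""
    sock.settimeout(timeout)
    try:
        data, _sender = sock.recvfrom(1024)
    except socket.timeout:
        # Nothing arrived: the port may be blocked, or the sender never ran.
        return False
    return data == PROBE_PAYLOAD
```

For example, on node A run `wait_for_probe(open_listener(), timeout=60)` and, while it waits, run `send_probe("<node-A-address>")` on node B. Run the check while the port is still free, that is, before deploying the cluster; binding fails if another process already holds port 51871.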


Option 2 - Manual Deployment

For automation scenarios, clusters can be created non-interactively by specifying all required parameters directly as CLI options. This approach is ideal for CI/CD pipelines, scripted deployments, or when you already know the exact configuration.

Example

To create a cluster, run the following command:

exls clusters deploy \
  --name "my-exalsius-cluster" \
  --enable-multinode-training \
  --enable-vpn \
  --worker-nodes "a959a49e-185c-4b05-94cb-179768fd6b07" "aec275e9-0ac9-48cd-9364-1019308077fd"

This command deploys a cluster with two worker nodes and enables both multi-node training and the VPN.
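In a CI/CD pipeline it can help to assemble this command from configuration rather than hard-coding it. The following Python sketch builds the argument list using only the flags shown in the example above and runs it with `subprocess`. The helper names are hypothetical, and omitting a boolean flag to leave that feature disabled is an assumption about the CLI's defaults; confirm it with `exls clusters deploy --help`.

```python
import shlex
import subprocess
from typing import Sequence


def build_deploy_command(
    name: str,
    worker_nodes: Sequence[str],
    enable_multinode_training: bool = False,
    enable_vpn: bool = False,
) -> list[str]:
    """Assemble the argument list for a non-interactive `exls clusters deploy`."""
    if not worker_nodes:
        raise ValueError("at least one worker node ID is required")
    cmd = ["exls", "clusters", "deploy", "--name", name]
    # Assumption: leaving a flag out keeps the corresponding feature disabled.
    if enable_multinode_training:
        cmd.append("--enable-multinode-training")
    if enable_vpn:
        cmd.append("--enable-vpn")
    # All node IDs follow a single --worker-nodes flag, as in the example above.
    cmd += ["--worker-nodes", *worker_nodes]
    return cmd


def deploy_cluster(cmd: list[str]) -> None:
    """Echo the command for the CI log, then run it; raises on a non-zero exit."""
    print("+", shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

Calling `deploy_cluster(build_deploy_command("my-exalsius-cluster", node_ids, enable_multinode_training=True, enable_vpn=True))` with the two node IDs from the example reproduces the command above. Passing an argument list instead of a single shell string avoids quoting problems with names that contain spaces.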


Cluster Deployment Lifecycle

After you create a cluster, it goes through the following states:

  1. PENDING - The cluster request has been received and is waiting to be processed.
  2. DEPLOYING - exalsius has started the deployment and is preparing the cluster nodes.
  3. READY - The cluster is ready to use.

During the deployment phase, exalsius provisions a Kubernetes stack on the nodes, configures GPU drivers, and installs all necessary dependencies. Depending on the speed and bandwidth of your network, this process can take 10 to 15 minutes.
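If a script needs to reason about where a cluster is in this lifecycle, the three documented states can be modelled as an ordered enum. This Python sketch is illustrative only: the names `ClusterState` and `is_valid_progression` are hypothetical, and any other status your CLI version may report (for example, on a failed deployment) is not covered here and is rejected with a `KeyError`.

```python
from enum import Enum


class ClusterState(Enum):
    """The three documented cluster lifecycle states, in order."""
    PENDING = 1    # request received, waiting to be processed
    DEPLOYING = 2  # deployment started, nodes are being prepared
    READY = 3      # cluster is ready to use


def is_valid_progression(observed: list[str]) -> bool:
    """Check that a sequence of observed status strings never moves backwards.

    Repeated values are allowed (the same state seen on consecutive polls).
    Unknown strings raise KeyError because they are not documented states.
    """
    states = [ClusterState[s] for s in observed]
    return all(a.value <= b.value for a, b in zip(states, states[1:]))
```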

To check the current status of your clusters, run:

exls clusters list
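To wait for a new cluster in a script, you can poll its status until it reports `READY`. The sketch below leaves the actual lookup to a caller-supplied `get_status` function (hypothetical), because the output format of `exls clusters list` is not described here; you would implement it by running that command and extracting your cluster's status. The default 20-minute timeout is our own choice, leaving headroom above the 10 to 15 minutes a deployment can take.

```python
import time
from typing import Callable


def wait_until_ready(
    get_status: Callable[[], str],
    timeout_s: float = 20 * 60,
    interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll get_status() until it returns "READY" or the timeout elapses.

    Returns True once the cluster is READY, False if the deadline passes first.
    """
    deadline = clock() + timeout_s
    while True:
        if get_status() == "READY":
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

The `sleep` and `clock` parameters exist so the loop can be tested without real waiting; in normal use, leave them at their defaults.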


Command Reference

To explore all available options for cluster creation, run the help command:

exls clusters deploy --help

Listing All Clusters

To list all your clusters, run:

exls clusters list

Check the Available Resources of a Cluster

To see the available resources of your cluster, run:

exls clusters show-available-resources <CLUSTER-ID>

Get the kubeconfig

Because each cluster is a managed Kubernetes cluster, you can interact with it directly using kubectl. To retrieve its kubeconfig, run:

exls clusters import-kubeconfig --kubeconfig-path <file-path-to-store-the-config> <CLUSTER-ID>
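The two steps can be combined in a script: fetch the kubeconfig with exls, then point kubectl at that file using kubectl's standard `--kubeconfig` flag. This Python sketch, with hypothetical helper names, lists the cluster's nodes as a quick connectivity check. It assumes both `exls` and `kubectl` are installed and on your `PATH`.

```python
import subprocess


def import_kubeconfig_cmd(cluster_id: str, kubeconfig_path: str) -> list[str]:
    """Argument list for the documented `exls clusters import-kubeconfig` call."""
    return ["exls", "clusters", "import-kubeconfig",
            "--kubeconfig-path", kubeconfig_path, cluster_id]


def kubectl_cmd(kubeconfig_path: str, *args: str) -> list[str]:
    """Argument list for kubectl, pinned to the given kubeconfig file."""
    return ["kubectl", "--kubeconfig", kubeconfig_path, *args]


def list_cluster_nodes(cluster_id: str, kubeconfig_path: str) -> str:
    """Fetch the kubeconfig, then return the output of `kubectl get nodes`."""
    subprocess.run(import_kubeconfig_cmd(cluster_id, kubeconfig_path), check=True)
    result = subprocess.run(kubectl_cmd(kubeconfig_path, "get", "nodes"),
                            check=True, capture_output=True, text=True)
    return result.stdout
```

Instead of passing `--kubeconfig` on every call, you can also set the `KUBECONFIG` environment variable to the file path.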

Deleting Clusters

To delete a cluster:

exls clusters delete <CLUSTER-ID>

Warning

  • Deleting a cluster permanently removes it.
  • All workloads, pods, and workspaces running on it will be terminated.
  • Make sure to back up any data you need before deletion.
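As a minimal precaution, you could export the Kubernetes manifests of your workloads and delete the cluster only if that export succeeds. This Python sketch is an example under assumptions, not an exalsius feature: the helper names and the list of resource kinds are hypothetical choices. Manifests describe your workloads but do not contain the data stored in volumes or inside pods, so copy that data separately.

```python
import subprocess
from pathlib import Path

# Resource kinds to export (hypothetical selection; adjust to your workloads).
# Secrets are deliberately left out so credentials are not written to disk.
DEFAULT_KINDS = "deployments,statefulsets,jobs,services,configmaps,persistentvolumeclaims"


def backup_manifests_cmd(kubeconfig_path: str, kinds: str = DEFAULT_KINDS) -> list[str]:
    """kubectl call that dumps the given resource kinds from all namespaces as YAML."""
    return ["kubectl", "--kubeconfig", kubeconfig_path,
            "get", kinds, "--all-namespaces", "-o", "yaml"]


def backup_then_delete(cluster_id: str, kubeconfig_path: str, out_file: Path) -> None:
    """Export manifests to out_file; only if that succeeds, delete the cluster."""
    result = subprocess.run(backup_manifests_cmd(kubeconfig_path),
                            check=True, capture_output=True, text=True)
    out_file.write_text(result.stdout)
    # check=True above raises on failure, so this line runs only after a good export.
    subprocess.run(["exls", "clusters", "delete", cluster_id], check=True)
```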

Next Steps

Once your cluster is ready, you can deploy Workspaces (e.g., Jupyter, Marimo, Developer Pods, or Multi-Node Training).