Starting Workspaces
A workspace is a workload or service (e.g., an application like Jupyter or Marimo, or a distributed training run) that can be started on clusters. Unlike a reservation of entire GPU nodes, a workspace reserves specific resources (including GPUs) from the available cluster capacity. For instance, if a node in a cluster has 2 GPUs, two different workspaces that each use 1 GPU can be started on that same node.
exalsius provides several workspace templates that serve as blueprints with pre-configured but customizable settings.
Prerequisites
- You have a cluster deployed and ready (see Managing Clusters).
- The cluster has available resources (GPUs, CPU, memory) for your workspace.
Deploying Workspaces
exalsius offers a set of prepared workspaces that you can deploy on one of your clusters.
To deploy workspaces, run
exls workspaces deploy <workspace-name>
The workspace-name argument can be one of:
- jupyter for starting Jupyter notebooks
- marimo for starting Marimo notebooks
- dev-pod for starting a DevPod that can be used with e.g. VSCode, Cursor, or PyCharm
- distributed-training for starting a distributed model training run
Each workspace requires different parameters to be set via CLI options.
Use the --help flag to see which options need to be set.
exls workspaces deploy dev-pod --help
Workspaces are represented by configuration files that define all required variables for a workspace. exalsius preconfigures the variables with defaults that work for most cases. However, you have the option to view and modify these variables. The CLI workspace deployment process will ask you if you want to modify the configuration file, and will open it with your default CLI editor if you choose to do so.
Final Editing is Optional
Editing the configuration file is optional. If you decline, the exalsius defaults will be used. Exiting the editor without saving will also result in the defaults being used.
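If you want the file to open in a specific editor, a common convention is that CLI tools honor the EDITOR environment variable; whether exalsius does is an assumption here, not a documented option:
# Assumption: the CLI honors $EDITOR, as most CLI tools do
export EDITOR=vim
exls workspaces deploy jupyter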
Example - Development Pod Workspace Deployment
The dev-pod workspace requires you to either set an SSH password (--ssh-password) or upload a public key (--ssh-public-key).
The number of GPUs for your workspace can be configured via the --num-gpus option.
Note that if no node with at least the configured number of GPUs is available, the workspace will not be deployed.
If you have more than one cluster deployed, exalsius will ask you which cluster you want to use for your workspace.
exls workspaces deploy dev-pod --num-gpus 1 --ssh-password mysecurepassword
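If you prefer key-based access, the documented --ssh-public-key option can be used instead; the sketch below assumes the option accepts a path to your public key file and that the key lives at the standard ed25519 location:
# Assumption: --ssh-public-key takes a path to a public key file
exls workspaces deploy dev-pod --num-gpus 1 --ssh-public-key ~/.ssh/id_ed25519.pub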
Access the Workspace
Access to a workspace depends on the workspace type. Jupyter and Marimo workspaces can be accessed via a URL that you can open in your browser. There, you will need to enter the password that you provided during the deployment process.
DevPod environments can be accessed via SSH, using password-based authentication, key-based authentication, or both, depending on your configuration.
Distributed training workspaces are started across all available GPUs of a cluster. Progress can be tracked via Weights & Biases.
To list all your workspaces, run
exls workspaces list
To get details (e.g. the access information) about a workspace, run
exls workspaces get <WORKSPACE-ID>
The output will include connection details, i.e. the URL endpoint or the SSH connection information (if applicable).
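For a DevPod workspace, you can then connect over SSH using those details. In the sketch below, <USER>, <NODE-IP>, and <NODE-PORT> are placeholders for the values reported by exls workspaces get, not fixed defaults:
# Placeholders: substitute the user, address, and port from the workspace details
ssh -p <NODE-PORT> <USER>@<NODE-IP>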
Inbound Port Access is Required
To reach the endpoint via your browser or connect via SSH, your nodes need to allow inbound TCP connections on that port.
The following describes the firewall configuration requirements for different workloads.
Firewall Configuration
Based on your deployment scenario, configure your firewall rules to meet the following requirements. If your nodes are VMs from cloud providers, use the cloud provider's firewall configuration features to open the necessary ports and attach the resulting firewall rules to the nodes imported into exalsius.
Distributed Multi-Node Model Training
For multi-node training, exalsius uses a secure WireGuard VPN connection to establish encrypted communication between nodes. To enable direct peer-to-peer connections, you must open the WireGuard port on all nodes:
| Port | Protocol | Purpose |
|---|---|---|
| 51871 | UDP | WireGuard VPN for direct peer-to-peer connections |
Required Port
Opening UDP port 51871 on all nodes is required for multi-node training to enable direct encrypted peer-to-peer connections.
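As a sketch, opening the WireGuard port could look like the following; this assumes your nodes either run ufw or sit behind an AWS security group (<SG-ID> and <SOURCE-CIDR> are placeholders), so adapt it to your actual firewall:
# Assumption: nodes managed with ufw
sudo ufw allow 51871/udp
# Assumption: AWS EC2 nodes; <SG-ID> and <SOURCE-CIDR> are placeholders
aws ec2 authorize-security-group-ingress --group-id <SG-ID> --protocol udp --port 51871 --cidr <SOURCE-CIDR>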
Development Workspace Deployments
When deploying workspaces that expose service endpoints (e.g., Jupyter notebooks, Marimo notebooks, or DevPod environments), specific ports need to be accessible. There are two options:
| Option | Ports | Description |
|---|---|---|
| Open full NodePort range | 30000-32767 (TCP) | Opens all Kubernetes NodePorts. Convenient but less secure. |
| Open specific ports | Varies | Wait for the workspace deployment to complete, then open only the specific port assigned to your workspace. |
Recommended Approach
For better security, we recommend waiting until your workspace is deployed and then opening only the specific port it uses. The assigned port will be displayed in the workspace details after deployment.
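For example, with ufw (assuming your nodes use it), the two options from the table map to the following commands; 31234 stands in for whatever port your workspace is actually assigned:
# Option 1: open the full Kubernetes NodePort range (convenient, less secure)
sudo ufw allow 30000:32767/tcp
# Option 2: open only the assigned port (31234 is a placeholder)
sudo ufw allow 31234/tcp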
Delete a Workspace
To remove a workspace and free up its resources:
exls workspaces delete <WORKSPACE-ID>
Example:
exls workspaces delete 3f8a9b2c-1d4e-5f6a-7b8c-9d0e1f2a3b4c
Warning
- Deleting a workspace permanently removes it and terminates all running processes.
- Any unsaved work or data in the workspace will be lost.
- Make sure to save or export any important data before deletion.
CLI Reference
To explore all available options for workspace management:
exls workspaces --help
exls workspaces deploy --help
exls workspaces list --help
exls workspaces get --help
exls workspaces delete --help
Next Steps
exalsius supports scaling experiments across multiple, possibly geo-distributed nodes. To learn about running geo-distributed experiments, check the advanced tutorial on Geo-Distributed Training.