Starting Workspaces
A workspace is a workload or service (e.g., an application like Jupyter or Marimo, or a distributed training run) that can be started on clusters. Unlike a reservation of entire GPU nodes, a workspace reserves specific resources (including GPUs) from the available cluster capacity. For instance, if a node in a cluster has 2 GPUs, two different workspaces that each use 1 GPU can be started on that same node.
exalsius provides several workspace templates that serve as blueprints with pre-configured but customizable settings.
Prerequisites
- You have a cluster deployed and ready (see Managing Clusters).
- The cluster has available resources (GPUs, CPU, memory) for your workspace.
Deploying Workspaces
exalsius offers a set of prepared workspaces that you can deploy on one of your clusters.
To deploy workspaces, run
exls workspaces deploy <workspace-name>
The workspace-name argument can be one of:
- jupyter for starting Jupyter notebooks
- marimo for starting Marimo notebooks
- dev-pod for starting a DevPod that can be used with e.g. VSCode, Cursor, or PyCharm
- distributed-training for starting a distributed model training run
Each workspace requires different parameters to be set via CLI options.
Use the --help flag to see which options need to be set.
exls workspaces deploy dev-pod --help
Workspaces are represented by configuration files that define all required variables for a workspace. exalsius preconfigures the variables with defaults that work for most cases. However, you have the option to view and modify these variables. The CLI workspace deployment process will ask you if you want to modify the configuration file, and will open it with your default CLI editor if you choose to do so.
Final Editing is Optional
Editing the configuration file is optional. If you decline, the exalsius defaults will be used. Exiting the editor without saving will also result in the defaults being used.
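If you want the file to open in a specific editor, a common convention is that CLI tools honor the EDITOR environment variable; whether exalsius does is an assumption here, not a documented option:
# Assumption: the CLI honors $EDITOR, as most CLI tools do
export EDITOR=vim
exls workspaces deploy jupyter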
Example - Development Pod Workspace Deployment
The dev-pod workspace requires you to either set an SSH password (--ssh-password) or upload a public key (--ssh-public-key).
The number of GPUs for your workspace can be configured via the --num-gpus option.
Note that if no node with at least the configured number of GPUs is available, the workspace will not be deployed.
If you have more than one cluster deployed, exalsius will ask you which cluster you want to use for your workspace.
exls workspaces deploy dev-pod --num-gpus 1 --ssh-password mysecurepassword
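If you prefer key-based access, the documented --ssh-public-key option can be used instead; the sketch below assumes the option accepts a path to your public key file and that the key lives at the standard ed25519 location:
# Assumption: --ssh-public-key takes a path to a public key file
exls workspaces deploy dev-pod --num-gpus 1 --ssh-public-key ~/.ssh/id_ed25519.pub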
Access the Workspace
Access to a workspace depends on the workspace type. Jupyter and Marimo workspaces can be accessed via a URL that you can open in your browser. There, you will need to enter the password that you provided during the deployment process.
DevPod environments can be accessed via SSH, using password-based authentication, key-based authentication, or both, depending on your configuration.
Distributed training workspaces are started across all available GPUs of a cluster. Progress can be tracked via Weights & Biases.
To list all your workspaces, run
exls workspaces list
To get details (e.g. the access information) about a workspace, run
exls workspaces get <WORKSPACE-ID>
The output will include connection details, i.e. the URL endpoint or the SSH connection information (if applicable).
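For a DevPod workspace, you can then connect over SSH using those details. In the sketch below, <USER>, <NODE-IP>, and <NODE-PORT> are placeholders for the values reported by exls workspaces get, not fixed defaults:
# Placeholders: substitute the user, address, and port from the workspace details
ssh -p <NODE-PORT> <USER>@<NODE-IP>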
Inbound Port Access is Required
To reach the endpoint via your browser or connect via SSH, your nodes need to allow inbound TCP connections on that port.
The following describes the firewall configuration requirements for different workloads.
Firewall Configuration
Based on your deployment scenario, configure your firewall rules to meet the following requirements. If your nodes are VMs from cloud providers, use the cloud provider's firewall configuration features to open the necessary ports and attach the resulting firewall rules to the nodes imported into exalsius.
Distributed Multi-Node Model Training
For multi-node training, exalsius uses a secure WireGuard VPN connection to establish encrypted communication between nodes. To enable direct peer-to-peer connections, you must open the WireGuard port on all nodes:
| Port | Protocol | Purpose |
|---|---|---|
| 51871 | UDP | WireGuard VPN for direct peer-to-peer connections |
Required Port
Opening UDP port 51871 on all nodes is required for multi-node training to enable direct encrypted peer-to-peer connections.
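As a sketch, opening the WireGuard port could look like the following; this assumes your nodes either run ufw or sit behind an AWS security group (<SG-ID> and <SOURCE-CIDR> are placeholders), so adapt it to your actual firewall:
# Assumption: nodes managed with ufw
sudo ufw allow 51871/udp
# Assumption: AWS EC2 nodes; <SG-ID> and <SOURCE-CIDR> are placeholders
aws ec2 authorize-security-group-ingress --group-id <SG-ID> --protocol udp --port 51871 --cidr <SOURCE-CIDR>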
Development Workspace Deployments
When deploying workspaces that expose service endpoints (e.g., Jupyter notebooks, Marimo notebooks, or DevPod environments), specific ports need to be accessible. There are two options:
| Option | Ports | Description |
|---|---|---|
| Open full NodePort range | 30000-32767 (TCP) | Opens all Kubernetes NodePorts. Convenient but less secure. |
| Open specific ports | Varies | Wait for the workspace deployment to complete, then open only the specific port assigned to your workspace. |
Recommended Approach
For better security, we recommend waiting until your workspace is deployed and then opening only the specific port it uses. The assigned port will be displayed in the workspace details after deployment.
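For example, with ufw (assuming your nodes use it), the two options from the table map to the following commands; 31234 stands in for whatever port your workspace is actually assigned:
# Option 1: open the full Kubernetes NodePort range (convenient, less secure)
sudo ufw allow 30000:32767/tcp
# Option 2: open only the assigned port (31234 is a placeholder)
sudo ufw allow 31234/tcp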
Delete a Workspace
To remove a workspace and free up its resources:
exls workspaces delete <WORKSPACE-ID>
Example:
exls workspaces delete 3f8a9b2c-1d4e-5f6a-7b8c-9d0e1f2a3b4c
Warning
- Deleting a workspace permanently removes it and terminates all running processes.
- Any unsaved work or data in the workspace will be lost.
- Make sure to save or export any important data before deletion.
CLI Reference
To explore all available options for workspace management:
exls workspaces --help
exls workspaces deploy --help
exls workspaces list --help
exls workspaces get --help
exls workspaces delete --help
Next Steps
exalsius supports scaling experiments across multiple, possibly geo-distributed nodes. To learn about running geo-distributed experiments, check the advanced tutorial on Geo-Distributed Training.