# Start workspaces
A workspace is a workload or service that runs on a cluster — for example, a Jupyter notebook or an LLM inference server. Unlike reserving entire nodes, a workspace reserves specific resources (including GPUs) from the available cluster capacity. If a node has 2 GPUs, two single-GPU workspaces can run on the same node.
exalsius provides workspace templates — pre-configured blueprints with customizable settings for each workspace type.
## Prerequisites
- A cluster in `READY` status (see deploy clusters)
- Available resources (GPUs, CPU, memory) on the cluster
## Deploy a workspace

```shell
exls workspaces deploy <workspace-type> [CLUSTER-ID-or-NAME] [OPTIONS]
```
Available workspace types:
| Type | Description |
|---|---|
| `jupyter` | Jupyter notebook with GPU access |
| `marimo` | Marimo reactive notebook with GPU access |
| `dev-pod` | SSH-accessible development environment for VS Code, Cursor, or PyCharm |
| `llm-inference` | LLM inference server powered by vLLM with tensor parallelism |
All deploy commands share this behavior:
- Cluster selection — pass a cluster ID or name as an argument. If omitted, exalsius auto-selects your only cluster or prompts you to choose from multiple clusters.
- Configuration editing — before deploying, the CLI asks if you want to review and edit the workspace configuration in your default editor. The defaults work for most use cases.
- Confirmation — the CLI displays a deployment summary and asks for confirmation before proceeding.
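The cluster argument is optional in all of these commands. As a sketch (the cluster name `gpu-cluster-1` is illustrative):

```shell
# Omit the cluster: exalsius auto-selects your only cluster,
# or prompts you to choose when there are several.
exls workspaces deploy jupyter

# Or target a cluster explicitly by name (or ID):
exls workspaces deploy jupyter gpu-cluster-1
```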
### Jupyter

```shell
exls workspaces deploy jupyter [CLUSTER-ID-or-NAME] [OPTIONS]
```
| Flag | Short | Default | Description |
|---|---|---|---|
| `--password` | `-p` | prompted | Password for the Jupyter notebook (min. 6 characters) |
| `--num-gpus` | `-g` | `1` | Number of GPUs to allocate |
| `--name` | `-n` | auto-generated | Workspace name |
| `--wait-for-ready` | `-w` | `false` | Wait until the workspace is ready before returning |
The CLI prompts for the password if `--password` is not provided. Access the notebook via the URL shown in the `Access` field.
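Putting the flags together, a Jupyter deployment might look like this (the cluster name, password, and workspace name are illustrative):

```shell
# Two GPUs, an explicit workspace name, and block until ready:
exls workspaces deploy jupyter gpu-cluster-1 \
  --password "s3cret-pass" \
  --num-gpus 2 \
  --name jupyter-demo \
  --wait-for-ready
```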
### Marimo

```shell
exls workspaces deploy marimo [CLUSTER-ID-or-NAME] [OPTIONS]
```
| Flag | Short | Default | Description |
|---|---|---|---|
| `--password` | `-p` | prompted | Password for the Marimo notebook (min. 6 characters) |
| `--num-gpus` | `-g` | `1` | Number of GPUs to allocate |
| `--name` | `-n` | auto-generated | Workspace name |
| `--wait-for-ready` | `-w` | `false` | Wait until the workspace is ready before returning |
Same as Jupyter — the CLI prompts for a password if not provided. Access via the URL in the `Access` field.
### Dev pod

```shell
exls workspaces deploy dev-pod [CLUSTER-ID-or-NAME] [OPTIONS]
```
| Flag | Short | Default | Description |
|---|---|---|---|
| `--ssh-password` | `-p` | — | Password for SSH access (min. 6 characters) |
| `--ssh-public-key` | `-k` | — | Path to a public key file for SSH access |
| `--num-gpus` | `-g` | `1` | Number of GPUs to allocate |
| `--name` | `-n` | auto-generated | Workspace name |
| `--wait-for-ready` | `-w` | `false` | Wait until the workspace is ready before returning |
You must provide at least one of `--ssh-password` or `--ssh-public-key`. Both can be used together. Connect via SSH using the address in the `Access` field.
Compatible with VS Code, Cursor, and PyCharm remote development.
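As a sketch, deploying a dev pod with key-based SSH and then connecting might look like this (the cluster and workspace names are illustrative, and the user, host, and port come from the `Access` field of your deployment):

```shell
# Deploy with a public key instead of a password:
exls workspaces deploy dev-pod gpu-cluster-1 \
  --ssh-public-key ~/.ssh/id_ed25519.pub \
  --name dev-pod-demo

# Connect with the matching private key, substituting the
# user, host, and port shown in the Access field:
ssh -i ~/.ssh/id_ed25519 -p <PORT> <USER>@<HOST>
```

The same host/port/key details can be placed in an `~/.ssh/config` entry so that VS Code, Cursor, or PyCharm can pick up the connection by name.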
#### Pod storage
Dev pods use two kinds of storage:
- Ephemeral storage for the running pod. By default, this is 50 GB. If the pod exceeds this limit, Kubernetes may evict or restart it, and data stored there can be lost.
- Persistent storage mounted at `/workspaces`. Data stored there survives pod eviction.
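Inside the pod, keeping anything you care about under the persistent mount is the safest habit — for example (the project directory name is illustrative):

```shell
# Work in the persistent volume so data survives pod eviction:
mkdir -p /workspaces/my-project
cd /workspaces/my-project

# Check remaining space on the persistent mount:
df -h /workspaces
```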
### LLM inference

```shell
exls workspaces deploy llm-inference [CLUSTER-ID-or-NAME] [OPTIONS]
```
| Flag | Short | Default | Env var | Description |
|---|---|---|---|---|
| `--huggingface-token` | `-t` | prompted | `HUGGINGFACE_TOKEN`, `HF_TOKEN` | HuggingFace API token |
| `--model-name` | `-m` | prompted | — | HuggingFace model in `<repo>/<model>` format (e.g., `Qwen/Qwen3-1.7B`) |
| `--num-gpus` | `-g` | `1` | — | Number of GPUs (sets vLLM tensor parallelism) |
| `--name` | `-n` | auto-generated | — | Workspace name |
| `--wait-for-ready` | `-w` | `false` | — | Wait until the workspace is ready before returning |
The HuggingFace token can also be provided as `--hf-token`. Both `--huggingface-token` and `--model-name` are prompted if not provided.
> **LLM inference environment required**
>
> The cluster must have the LLM inference environment enabled. Enable it during cluster deployment with `--prepare-llm-inference-environment` or via the interactive deployment flow.
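With the token exported in the environment, a deployment might look like this (the cluster name and token value are illustrative; the model is the example from the table above):

```shell
# Token picked up from the environment instead of --huggingface-token:
export HF_TOKEN=hf_xxxxxxxxxxxxxxxx

exls workspaces deploy llm-inference gpu-cluster-1 \
  --model-name Qwen/Qwen3-1.7B \
  --num-gpus 1 \
  --wait-for-ready
```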
## Access a workspace
How you connect depends on the workspace type:
- Jupyter, Marimo — open the URL from the `Access` field in your browser and enter the password you set during deployment.
- Dev pod — connect via SSH using the address from the `Access` field.
- LLM inference — send requests to the endpoint shown in the `Access` field.
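Since the inference server is powered by vLLM, the endpoint typically speaks an OpenAI-compatible API; a hedged sketch of a chat completion request (the host, port, and exact path come from the `Access` field and may differ in your deployment):

```shell
# Query the deployed model via the OpenAI-compatible API:
curl http://<HOST>:<PORT>/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Qwen/Qwen3-1.7B",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'
```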
## Manage workspaces

List all workspaces, optionally filtered by cluster:

```shell
exls workspaces list [CLUSTER-ID-or-NAME]
```

Get details and access information for a specific workspace:

```shell
exls workspaces get <WORKSPACE-ID-or-NAME>
```

Delete a workspace:

```shell
exls workspaces delete <WORKSPACE-ID-or-NAME>
```
> **Warning**
>
> - Deleting a workspace permanently removes it and terminates all running processes.
> - Any unsaved work or data will be lost.
> - Save or export important data before deleting.
> **Inbound port access**
>
> Your nodes must allow inbound TCP connections on the port assigned to your workspace. See firewall configuration below.
## Firewall configuration
Configure firewall rules based on your deployment scenario. If your nodes are cloud VMs, use the provider's firewall settings to open the required ports.
exalsius uses WireGuard for encrypted node-to-node communication:
| Port | Protocol | Purpose |
|---|---|---|
| 51871 | UDP | WireGuard VPN for peer-to-peer connections |
> **Required port**
>
> UDP port 51871 must be open on all nodes for cluster communication.
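On Linux nodes that happen to use `ufw`, for example, the WireGuard port and a workspace's assigned TCP port could be opened like this (`ufw` is an assumption about your firewall, and the TCP port number is illustrative — use the one shown for your workspace):

```shell
# WireGuard for node-to-node communication (required on all nodes):
sudo ufw allow 51871/udp

# Inbound TCP for the port assigned to a workspace (example port):
sudo ufw allow 30080/tcp
```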
## Next steps
- Serving a large language model — step-by-step LLM inference tutorial
- Cluster observability — monitor metrics, logs, and traces