Cluster observability
This tutorial explains how to access and use the built-in observability stack on your exalsius clusters. You'll learn how to view metrics, logs, and traces in Grafana, and how to collect custom telemetry from your own applications.
exalsius automatically deploys observability tooling for every cluster:
- Metrics — Prometheus-compatible metrics stored in VictoriaMetrics
- Logs — application and system logs stored in VictoriaLogs
- Traces — distributed tracing data stored in VictoriaTraces
All telemetry is automatically collected, annotated with cluster metadata, and persisted in a secure, cluster-scoped manner.
Prerequisites
- The exalsius CLI installed and configured
- A cluster in READY status (see deploy clusters)
What gets deployed automatically
As part of the initial onboarding process, exalsius deploys a dedicated monitoring cluster for your organization once, with:
- VictoriaMetrics — time-series database for metrics
- VictoriaLogs — log storage and indexing
- VictoriaTraces — distributed tracing backend
- Grafana — visualization and querying platform
Each new cluster created by members of your organization then has its telemetry data ingested into this monitoring cluster.
This means that by the time a new cluster reaches READY status, exalsius has already provisioned:
- OpenTelemetry collectors — deployed as DaemonSets and Deployments to automatically discover and scrape metrics, logs, and traces from your cluster
No manual setup is needed for standard Kubernetes and system-level telemetry.
Access the Grafana dashboards
Retrieve the Grafana URL for your organization within exalsius:
exls management get-dashboard-url
Add --open to open it directly in your browser:
exls management get-dashboard-url --open
Log in with your exalsius credentials. Telemetry from all clusters in your organization is accessible from this platform.
Pre-configured dashboards
Once logged in, you'll find several ready-to-use dashboards:
- Kubernetes overview — cluster-level metrics including node status, pod counts, and resource usage
- Node exporter — detailed system metrics per node (CPU, memory, disk, network)
- Kubelet metrics — container and pod-level resource consumption
- Application logs — searchable logs from all pods
- Distributed traces — request traces across services via the trace explorer
You can explore these dashboards, create custom queries using PromQL (metrics), LogsQL (logs), or trace queries (traces), and build your own dashboards.
Query telemetry via REST API
Grafana is great for interactive exploration, but you may want programmatic access for automation, integrations, or ad-hoc scripts. exalsius exposes REST endpoints for that.
Retrieve API credentials
REST API access is protected by basic authentication. The required username/password is stored in the kof-vmuser-creds-<hash> secret in the kof namespace of your cluster.
# First, get the kubeconfig of the cluster
exls clusters import-kubeconfig <CLUSTER-ID-or-NAME> --kubeconfig-path kube.conf
# Set the environment variable
export KUBECONFIG=kube.conf
# Retrieve username and password
SECRET_NAME=$(kubectl get secret -n kof | grep kof-vmuser-creds | cut -d ' ' -f 1)
USERNAME=$(kubectl get secret $SECRET_NAME -n kof -o jsonpath='{.data.username}' | base64 -d)
PASSWORD=$(kubectl get secret $SECRET_NAME -n kof -o jsonpath='{.data.password}' | base64 -d)
echo "username: $USERNAME"
echo "password: $PASSWORD"
Prepare a Basic Auth header
BASIC_AUTH=$(echo -n "$USERNAME:$PASSWORD" | base64)
echo "basic-auth: $BASIC_AUTH"
Use this value as Authorization: Basic ... in your requests.
Info
The following sections show some example queries. They must be sent to the domain associated with your organization within exalsius, i.e. https://vmauth.<org-name>.ex.ls.
# In this example, let us assume the organization name is "mycompany"
ORG_NAME=mycompany
# i.e. this will resolve to https://vmauth.mycompany.ex.ls
Query metrics (PromQL)
The metrics endpoint can be used to retrieve time-series measurements such as CPU/memory utilization, pod and node health, request rates/latencies, and any custom Prometheus metrics you expose.
curl \
-H "Authorization: Basic $BASIC_AUTH" \
-H "Content-Type: application/x-www-form-urlencoded" \
"https://vmauth.$ORG_NAME.ex.ls/vm/select/0/prometheus/api/v1/query" \
-d 'query=up'
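Besides instant queries, the Prometheus-compatible API also supports range queries via the standard query_range endpoint. A minimal sketch, sampling the `up` metric over the last hour (the metric and window are illustrative):

```shell
# Range query: sample the `up` metric over the last hour at 60s resolution.
END=$(date -u +%s)
START=$((END - 3600))
curl \
  -H "Authorization: Basic $BASIC_AUTH" \
  -H "Content-Type: application/x-www-form-urlencoded" \
  "https://vmauth.$ORG_NAME.ex.ls/vm/select/0/prometheus/api/v1/query_range" \
  -d "query=up" \
  -d "start=$START" \
  -d "end=$END" \
  -d "step=60"
```

The response contains one series of timestamped values per matching metric, instead of a single sample per series.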
Query logs (LogsQL)
The logs endpoint can be used to retrieve application and system logs across your cluster, filter by text or structured fields, and narrow down results by time range and limits.
curl \
-H "Authorization: Basic $BASIC_AUTH" \
-H "Content-Type: application/x-www-form-urlencoded" \
"https://vmauth.$ORG_NAME.ex.ls/vls/select/logsql/query" \
-d 'query=error | limit 10'
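LogsQL can also restrict results by time directly in the query via the `_time` filter. A sketch, assuming a 15-minute window and a limit of 20 (both illustrative):

```shell
# Fetch up to 20 log lines containing "error" from the last 15 minutes.
QUERY='_time:15m error | limit 20'
curl \
  -H "Authorization: Basic $BASIC_AUTH" \
  "https://vmauth.$ORG_NAME.ex.ls/vls/select/logsql/query" \
  --data-urlencode "query=$QUERY"
```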
Query traces (Jaeger HTTP API)
The traces endpoint can be used to retrieve distributed traces for instrumented services, search by service/operation/tags, and fetch full traces by trace ID for end-to-end request debugging.
List traces for a given service:
curl \
-H "Authorization: Basic $BASIC_AUTH" \
"https://vmauth.$ORG_NAME.ex.ls/vt/select/0/jaeger/api/traces?service=my-service&limit=10"
Filter traces by operation, tags and duration:
curl \
-H "Authorization: Basic $BASIC_AUTH" \
"https://vmauth.$ORG_NAME.ex.ls/vt/select/0/jaeger/api/traces?service=my-service&operation=my-operation&tags=%7B%22error%22%3A%22true%22%7D&minDuration=1ms&maxDuration=10ms&limit=20"
The tags parameter is JSON (URL-encoded in the example above). You can filter by span attributes, resource attributes (prefix resource_attr:), or instrumentation scope attributes (prefix scope_attr:).
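To build the encoded tags value yourself, any URL-encoder will do; a sketch using python3 (assumed to be available on your machine) for illustration:

```shell
# URL-encode the tags JSON for use in the `tags` query parameter.
TAGS_JSON='{"error":"true"}'
TAGS_ENCODED=$(python3 -c 'import sys, urllib.parse; print(urllib.parse.quote(sys.argv[1]))' "$TAGS_JSON")
echo "$TAGS_ENCODED"
# prints %7B%22error%22%3A%22true%22%7D
```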
Fetch a trace by ID:
curl \
-H "Authorization: Basic $BASIC_AUTH" \
"https://vmauth.$ORG_NAME.ex.ls/vt/select/0/jaeger/api/traces/<TRACE-ID>"
Further query documentation
- PromQL — query language for metrics
- VictoriaMetrics API — HTTP API examples for metrics
- VictoriaLogs LogsQL — log queries
- VictoriaTraces querying — Jaeger HTTP API and LogsQL for traces
Collect custom metrics
The OpenTelemetry collectors automatically scrape standard Kubernetes metrics. To monitor your own applications, expose a Prometheus-format /metrics endpoint and create a PodMonitor or ServiceMonitor resource.
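For reference, here is a minimal sketch of what a Prometheus text-format /metrics response looks like (the metric names and values are illustrative, not something exalsius provides):

```shell
# Print an example of the Prometheus text exposition format.
METRICS=$(cat <<'EOF'
# HELP myapp_requests_total Total HTTP requests handled.
# TYPE myapp_requests_total counter
myapp_requests_total{method="GET",code="200"} 1027
# HELP myapp_queue_depth Current number of queued jobs.
# TYPE myapp_queue_depth gauge
myapp_queue_depth 3
EOF
)
echo "$METRICS"
```

Each metric carries a HELP and TYPE comment followed by one sample line per label combination.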
PodMonitor
Scrapes metrics directly from pods matched by label:
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
name: my-custom-exporter
namespace: default
spec:
selector:
matchLabels:
app: my-exporter
podMetricsEndpoints:
- port: metrics
path: /metrics
interval: 30s
ServiceMonitor
Scrapes metrics through a Kubernetes service:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: my-service-monitor
namespace: default
spec:
selector:
matchLabels:
app: my-service
endpoints:
- port: metrics
path: /metrics
interval: 30s
Note
These are Kubernetes manifests. Export your cluster's kubeconfig first (exls clusters import-kubeconfig <CLUSTER-ID-or-NAME>), then apply with kubectl apply -f <manifest-file>.
Best practices
- Use named ports in your pod/service definitions — monitors reference ports by name, not number.
- Follow the Prometheus exposition format.
- Set scraping intervals between 30s and 1m for a good balance of freshness and resource usage.
- Create monitors in the same namespace as your workloads.
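To illustrate the named-port practice, a sketch of a Service whose port a ServiceMonitor could reference by name (all names here are illustrative):

```yaml
# Illustrative Service with a named port; a ServiceMonitor's `port: metrics`
# matches this name, not the port number.
apiVersion: v1
kind: Service
metadata:
  name: my-service
  namespace: default
  labels:
    app: my-service
spec:
  selector:
    app: my-service
  ports:
    - name: metrics   # referenced by name in the ServiceMonitor
      port: 9090
      targetPort: 9090
```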
Collect custom traces
To emit distributed traces from your applications, instrument them with an OpenTelemetry SDK. Configure the SDK to export traces via OTLP to the collector service running in your cluster (typically otel-collector in the kof namespace).
Once instrumented, traces appear automatically in Grafana's trace explorer.
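Most OpenTelemetry SDKs honor the standard OTLP environment variables, so export configuration can often stay out of application code. A sketch, assuming the collector is reachable as otel-collector in the kof namespace (the service DNS name may differ in your cluster):

```shell
# Point the OpenTelemetry SDK at the in-cluster collector via OTLP/gRPC.
export OTEL_EXPORTER_OTLP_ENDPOINT="http://otel-collector.kof.svc.cluster.local:4317"
export OTEL_EXPORTER_OTLP_PROTOCOL="grpc"
# Name under which your traces appear in the trace explorer.
export OTEL_SERVICE_NAME="my-service"
```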
Architecture
flowchart TB
subgraph YourCluster["Your cluster"]
Apps["Applications"]
System["System components"]
Collectors["OpenTelemetry collectors"]
end
subgraph MonitoringCluster["Monitoring cluster"]
VM["VictoriaMetrics"]
VL["VictoriaLogs"]
VT["VictoriaTraces"]
Grafana["Grafana"]
end
Users["Your queries"]
Apps -->|"metrics, logs, traces"| Collectors
System -->|"metrics, logs"| Collectors
Collectors -->|"authenticated"| VM
Collectors -->|"authenticated"| VL
Collectors -->|"authenticated"| VT
Users -->|"queries"| Grafana
Grafana -->|"read"| VM
Grafana -->|"read"| VL
Grafana -->|"read"| VT
style YourCluster stroke-width:2px
style MonitoringCluster stroke-width:2px
Key properties:
- Centralized storage — all telemetry is stored in a dedicated monitoring cluster
- Automatic discovery — no manual setup for standard Kubernetes metrics
- Cluster isolation — data is tagged and filtered per cluster
- Standards-based — uses OpenTelemetry and Prometheus formats