
Cluster observability

This tutorial explains how to access and use the built-in observability stack on your exalsius clusters. You'll learn how to view metrics, logs, and traces in Grafana, and how to collect custom telemetry from your own applications.

exalsius automatically deploys observability tooling for every cluster:

  • Metrics — Prometheus-compatible metrics stored in VictoriaMetrics
  • Logs — application and system logs stored in VictoriaLogs
  • Traces — distributed tracing data stored in VictoriaTraces

All telemetry is automatically collected, annotated with cluster metadata, and persisted in a secure, cluster-scoped manner.

Prerequisites

What gets deployed automatically

As part of the initial onboarding process, exalsius deploys a dedicated monitoring cluster for your organization with:

  • VictoriaMetrics — time-series database for metrics
  • VictoriaLogs — log storage and indexing
  • VictoriaTraces — distributed tracing backend
  • Grafana — visualization and querying platform

Each new cluster created by members of your organization then has its telemetry data ingested into this monitoring cluster. This means that by the time a new cluster reaches READY status, exalsius has already provisioned:

  • OpenTelemetry collectors — deployed as DaemonSets and Deployments to automatically discover and scrape metrics, logs, and traces from your cluster

No manual setup is needed for standard Kubernetes and system-level telemetry.

Access the Grafana dashboards

Retrieve the Grafana URL for your organization within exalsius:

exls management get-dashboard-url

Add --open to open it directly in your browser:

exls management get-dashboard-url --open

Log in with your exalsius credentials. Telemetry from all clusters in your organization is accessible from this platform.

Pre-configured dashboards

Once logged in, you'll find several ready-to-use dashboards:

  • Kubernetes overview — cluster-level metrics including node status, pod counts, and resource usage
  • Node exporter — detailed system metrics per node (CPU, memory, disk, network)
  • Kubelet metrics — container and pod-level resource consumption
  • Application logs — searchable logs from all pods
  • Distributed traces — request traces across services via the trace explorer

You can explore these dashboards, create custom queries using PromQL (metrics), LogsQL (logs), or trace queries (traces), and build your own dashboards.

Query telemetry via REST API

Grafana is great for interactive exploration, but you may want programmatic access for automation, integrations, or ad-hoc scripts. exalsius exposes REST endpoints for that.

Retrieve API credentials

REST API access is protected by basic authentication. The required username/password is stored in the kof-vmuser-creds-<hash> secret in the kof namespace of your cluster.

# First, get the kubeconfig of the cluster
exls clusters import-kubeconfig <CLUSTER-ID-or-NAME> --kubeconfig-path kube.conf

# Set the environment variable
export KUBECONFIG=kube.conf

# Retrieve username and password
SECRET_NAME=$(kubectl get secret -n kof | grep kof-vmuser-creds | cut -d ' ' -f 1)
USERNAME=$(kubectl get secret $SECRET_NAME -n kof -o jsonpath='{.data.username}' | base64 -d)
PASSWORD=$(kubectl get secret $SECRET_NAME -n kof -o jsonpath='{.data.password}' | base64 -d)

echo "username: $USERNAME"
echo "password: $PASSWORD"

Prepare a Basic Auth header

BASIC_AUTH=$(echo -n "$USERNAME:$PASSWORD" | base64)
echo "basic-auth: $BASIC_AUTH"

Use this value as Authorization: Basic ... in your requests.
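
If you script against these endpoints, the same header can be built programmatically. Here is a minimal sketch using Python's standard library; the credentials and organization name are placeholders, not real values:

```python
import base64
import urllib.request

def basic_auth_header(username: str, password: str) -> str:
    """Build the value of the Authorization header from raw credentials."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return f"Basic {token}"

# Placeholder credentials -- substitute the values read from the
# kof-vmuser-creds secret above.
header = basic_auth_header("myuser", "mypassword")
print(header)  # Basic bXl1c2VyOm15cGFzc3dvcmQ=

# Attach the header to any request against the vmauth endpoints, e.g.:
req = urllib.request.Request(
    "https://vmauth.mycompany.ex.ls/vm/select/0/prometheus/api/v1/query",
    data=b"query=up",
    headers={"Authorization": header},
)
# urllib.request.urlopen(req) would send the query; not executed here.
```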

Info

The following sections show some example queries. They must be sent to the domain associated with your organization within exalsius, i.e. https://vmauth.<org-name>.ex.ls.

# In this example, let us assume the organization name is "mycompany"
ORG_NAME=mycompany
# i.e. this will resolve to https://vmauth.mycompany.ex.ls

Query metrics (PromQL)

The metrics endpoint can be used to retrieve time-series measurements such as CPU/memory utilization, pod and node health, request rates/latencies, and any custom Prometheus metrics you expose.

curl \
    -H "Authorization: Basic $BASIC_AUTH" \
    -H "Content-Type: application/x-www-form-urlencoded" \
    "https://vmauth.$ORG_NAME.ex.ls/vm/select/0/prometheus/api/v1/query" \
    -d 'query=up'
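
The endpoint returns standard Prometheus-style JSON. For automation, a response can be consumed like this; the sample payload below is illustrative, not real output:

```python
import json

def parse_vector(payload: str):
    """Extract (labels, timestamp, value) tuples from a Prometheus-style
    instant-query response, as returned by the /api/v1/query endpoint."""
    doc = json.loads(payload)
    if doc["status"] != "success":
        raise RuntimeError(f"query failed: {doc}")
    return [
        (series["metric"], series["value"][0], series["value"][1])
        for series in doc["data"]["result"]
    ]

# Illustrative payload for `query=up`; real responses carry your
# clusters' actual label sets and values.
sample = (
    '{"status":"success","data":{"resultType":"vector","result":'
    '[{"metric":{"__name__":"up","job":"kubelet"},"value":[1700000000,"1"]}]}}'
)
for labels, ts, value in parse_vector(sample):
    print(labels["__name__"], value)  # up 1
```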

Query logs (LogsQL)

The logs endpoint can be used to retrieve application and system logs across your cluster, filter by text or structured fields, and narrow down results by time range and limits.

curl \
    -H "Authorization: Basic $BASIC_AUTH" \
    -H "Content-Type: application/x-www-form-urlencoded" \
    "https://vmauth.$ORG_NAME.ex.ls/vls/select/logsql/query" \
    -d 'query=error | limit 10'
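
The logs endpoint streams results as newline-delimited JSON, one log entry per line. A minimal sketch for consuming such a stream (field names like _time and _msg follow VictoriaLogs conventions; the sample lines are illustrative):

```python
import json

def parse_log_stream(body: str):
    """Parse a newline-delimited JSON log response into a list of dicts."""
    return [json.loads(line) for line in body.splitlines() if line.strip()]

# Illustrative response lines, not real output.
sample = "\n".join([
    '{"_time":"2024-01-01T12:00:00Z","_msg":"error: connection refused","kubernetes.pod_name":"my-app-abc"}',
    '{"_time":"2024-01-01T12:00:01Z","_msg":"error: retrying","kubernetes.pod_name":"my-app-abc"}',
])
for entry in parse_log_stream(sample):
    print(entry["_time"], entry["_msg"])
```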

Query traces (Jaeger HTTP API)

The traces endpoint can be used to retrieve distributed traces for instrumented services, search by service/operation/tags, and fetch full traces by trace ID for end-to-end request debugging.

List traces for a given service:

curl \
    -H "Authorization: Basic $BASIC_AUTH" \
    "https://vmauth.$ORG_NAME.ex.ls/vt/select/0/jaeger/api/traces?service=my-service&limit=10"

Filter traces by operation, tags and duration:

curl \
    -H "Authorization: Basic $BASIC_AUTH" \
    "https://vmauth.$ORG_NAME.ex.ls/vt/select/0/jaeger/api/traces?service=my-service&operation=my-operation&tags=%7B%22error%22%3A%22true%22%7D&minDuration=1ms&maxDuration=10ms&limit=20"

The tags parameter is JSON (URL-encoded in the example above). You can filter by span attributes, resource attributes (prefix resource_attr:), or instrumentation scope attributes (prefix scope_attr:).
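
Building the URL-encoded tags value by hand is error-prone. A small sketch using Python's standard library (the service name, tags, and helper name are placeholders for illustration):

```python
import json
from urllib.parse import urlencode, quote

def jaeger_search_params(service: str, tags: dict, **extra) -> str:
    """Build the query string for the Jaeger /api/traces search endpoint.
    `tags` is JSON-encoded first, then URL-encoded along with the rest."""
    params = {"service": service, "tags": json.dumps(tags, separators=(",", ":"))}
    params.update(extra)
    # quote (rather than the default quote_plus) percent-encodes every
    # reserved character, matching the curl example above.
    return urlencode(params, quote_via=quote)

qs = jaeger_search_params("my-service", {"error": "true"}, minDuration="1ms", limit=20)
print(qs)
# service=my-service&tags=%7B%22error%22%3A%22true%22%7D&minDuration=1ms&limit=20
```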

Fetch a trace by ID:

curl \
    -H "Authorization: Basic $BASIC_AUTH" \
    "https://vmauth.$ORG_NAME.ex.ls/vt/select/0/jaeger/api/traces/<TRACE-ID>"

Further query documentation

Collect custom metrics

The OpenTelemetry collectors automatically scrape standard Kubernetes metrics. To monitor your own applications, expose a Prometheus-format /metrics endpoint and create a PodMonitor or ServiceMonitor resource.

PodMonitor

Scrapes metrics directly from pods matched by label:

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: my-custom-exporter
  namespace: default
spec:
  selector:
    matchLabels:
      app: my-exporter
  podMetricsEndpoints:
  - port: metrics
    path: /metrics
    interval: 30s
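
For the monitor above to discover anything, pods must carry the matching label and expose a named port. A minimal illustrative Deployment (names and image are placeholders matching the PodMonitor above):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-exporter
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-exporter
  template:
    metadata:
      labels:
        app: my-exporter        # matched by the PodMonitor's selector
    spec:
      containers:
      - name: exporter
        image: my-exporter:latest   # placeholder image
        ports:
        - name: metrics             # referenced by name in podMetricsEndpoints
          containerPort: 9100
```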

ServiceMonitor

Scrapes metrics through a Kubernetes service:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-service-monitor
  namespace: default
spec:
  selector:
    matchLabels:
      app: my-service
  endpoints:
  - port: metrics
    path: /metrics
    interval: 30s
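
Note that a ServiceMonitor selects Services by their labels, so the Service itself must carry the matching label and expose a named port. A minimal illustrative Service (names and ports are placeholders matching the ServiceMonitor above):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service
  namespace: default
  labels:
    app: my-service        # matched by the ServiceMonitor's selector
spec:
  selector:
    app: my-service        # pods backing the service
  ports:
  - name: metrics          # referenced by name in the ServiceMonitor endpoints
    port: 9100
    targetPort: 9100
```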

Note

These are Kubernetes manifests. Export your cluster's kubeconfig first (exls clusters import-kubeconfig <CLUSTER-ID-or-NAME>), then apply with kubectl apply -f <manifest-file>.

Best practices

  • Use named ports in your pod/service definitions — monitors reference ports by name, not number.
  • Follow the Prometheus exposition format.
  • Set scraping intervals between 30s and 1m for a good balance of freshness and resource usage.
  • Create monitors in the same namespace as your workloads.

Collect custom traces

To emit distributed traces from your applications, instrument them with an OpenTelemetry SDK. Configure the SDK to export traces via OTLP to the collector service running in your cluster (typically otel-collector in the kof namespace).

Once instrumented, traces appear automatically in Grafana's trace explorer.

Architecture

flowchart TB
    subgraph YourCluster["Your cluster"]
        Apps["Applications"]
        System["System components"]
        Collectors["OpenTelemetry collectors"]
    end

    subgraph MonitoringCluster["Monitoring cluster"]
        VM["VictoriaMetrics"]
        VL["VictoriaLogs"]
        VT["VictoriaTraces"]
        Grafana["Grafana"]
    end

    Users["Your queries"]

    Apps -->|"metrics, logs, traces"| Collectors
    System -->|"metrics, logs"| Collectors
    Collectors -->|"authenticated"| VM
    Collectors -->|"authenticated"| VL
    Collectors -->|"authenticated"| VT
    Users -->|"queries"| Grafana
    Grafana -->|"read"| VM
    Grafana -->|"read"| VL
    Grafana -->|"read"| VT

    style YourCluster stroke-width:2px
    style MonitoringCluster stroke-width:2px

Key properties:

  • Centralized storage — all telemetry is stored in a dedicated monitoring cluster
  • Automatic discovery — no manual setup for standard Kubernetes metrics
  • Cluster isolation — data is tagged and filtered per cluster
  • Standards-based — uses OpenTelemetry and Prometheus formats

Further reading