Configuring and Leveraging Cluster Observability
This tutorial explains how to configure, access, and query telemetry data (metrics, logs, and traces) from your exalsius cluster. You'll learn how to use the built-in Grafana dashboards, query data programmatically via REST APIs, and extend observability to collect custom metrics and traces from your applications.
When observability is enabled on your cluster, exalsius automatically configures secure access to:
- Metrics: Prometheus-compatible metrics stored in VictoriaMetrics
- Logs: Application and system logs stored in VictoriaLogs
- Traces: Distributed tracing data stored in VictoriaTraces
All telemetry data is automatically collected, annotated with cluster-specific metadata, and persisted in a secure, cluster-scoped manner.
Before You Begin
Before continuing, ensure you have:
- Installed and configured the exalsius CLI
- Logged in with your exalsius account
- At least one active cluster (Working with Clusters)
- A cluster with observability enabled (see Step 1 below)
Note
You can verify your setup using:
exls --version
exls clusters list
Step 1 — Enable Observability on Your Cluster
Observability must be enabled when creating a cluster. If you haven't created a cluster yet, or if you need to create a new cluster with observability, use the --enable-telemetry flag during cluster creation:
exls clusters deploy --name <CLUSTER-NAME> --enable-telemetry
Tip
If you're unsure whether observability is enabled on an existing cluster, you can check for Grafana dashboard access (see Step 2 below).
Once observability is enabled, exalsius automatically deploys and configures the observability stack, which includes:
- OpenTelemetry Collectors: Automatically discover and scrape metrics, logs, and traces from your cluster
- VictoriaMetrics: Stores time-series metrics data
- VictoriaLogs: Stores log data
- VictoriaTraces: Stores distributed tracing data
- Grafana: Provides visualization and querying capabilities
The setup process typically takes a few minutes. Once complete, you can start accessing your telemetry data.
Step 2 — Access the Grafana Dashboard
The easiest way to explore your cluster's telemetry data is through the Grafana web interface. Grafana provides pre-configured dashboards for common Kubernetes metrics, system performance, and application logs.
Getting Your Dashboard URL
To obtain a cluster-scoped login link to Grafana:
exls clusters get-dashboard-url <CLUSTER-ID>
This command returns a unique URL that provides:
- Passwordless authentication: No credentials needed—the link includes authentication tokens
- Cluster-scoped access: You can only view data from the specified cluster
- Read-only mode: All data is read-only for security
Using Grafana Dashboards
Once you open the dashboard URL, you'll have access to several pre-configured dashboards:
- Kubernetes Overview: Cluster-level metrics including node status, pod counts, and resource usage
- Node Exporter: Detailed system metrics from each node (CPU, memory, disk, network)
- Kubelet Metrics: Container and pod-level resource consumption
- Application Logs: Searchable logs from all pods in your cluster
- Distributed Traces: View and analyze request traces across services using the integrated trace explorer
You can explore these dashboards, create custom queries using PromQL (for metrics), LogsQL (for logs), or trace queries (for traces), and even create your own custom dashboards. All queries are automatically scoped to your cluster's data.
Tip
The URL remains valid as long as your cluster exists, but you'll need to regenerate it if your authentication expires.
Step 3 — Query Data via REST API
While Grafana is excellent for interactive exploration, you may need programmatic access to telemetry data for automation, integration with external systems, or custom analysis. The observability stack exposes REST APIs compatible with Prometheus (for metrics), VictoriaLogs (for logs), and VictoriaTraces (for traces).
Retrieving Authentication Credentials
To authenticate against the REST APIs, you'll need credentials stored in a Kubernetes secret. Let's retrieve them:
# First, get the kubeconfig for your cluster
exls clusters import-kubeconfig <CLUSTER-ID> --kubeconfig-path kube.conf
# Set the KUBECONFIG environment variable
export KUBECONFIG=kube.conf
# Retrieve the username from the secret
USERNAME=$(kubectl get secret storage-vmuser-credentials -n kof -o jsonpath='{.data.username}' | base64 -d)
# Retrieve the password from the secret
PASSWORD=$(kubectl get secret storage-vmuser-credentials -n kof -o jsonpath='{.data.password}' | base64 -d)
# Verify the credentials were retrieved
echo "Username: $USERNAME"
echo "Password: $PASSWORD"
The credentials are stored in the storage-vmuser-credentials secret in the kof namespace. These credentials are automatically created by exalsius and are unique to your cluster.
Preparing Basic Authentication
Next, we'll encode these credentials for HTTP Basic Authentication:
# Create the Basic Auth header value (tr strips the newlines that some
# base64 implementations insert when wrapping long input)
BASIC_AUTH=$(echo -n "$USERNAME:$PASSWORD" | base64 | tr -d '\n')
echo "Basic Auth Header: Basic $BASIC_AUTH"
This encoded string will be used in the Authorization header of your API requests.
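If you are scripting against the APIs, the same header can be built with the Python standard library. This is a minimal sketch equivalent to the shell pipeline above; the credentials shown are placeholders, not real values:

```python
import base64

def basic_auth_header(username: str, password: str) -> str:
    """Build an HTTP Basic Auth header value, equivalent to
    echo -n "$USERNAME:$PASSWORD" | base64 in the shell example above."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return f"Basic {token}"

# Placeholder credentials; your real values come from the
# storage-vmuser-credentials secret retrieved above.
print(basic_auth_header("demo-user", "demo-pass"))
```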
Querying Metrics
Metrics are stored in VictoriaMetrics and can be queried using PromQL (Prometheus Query Language). Here's an example query to check if all targets are up:
curl \
-H "Authorization: Basic $BASIC_AUTH" \
-H "Content-Type: application/x-www-form-urlencoded" \
"https://vmauth-de1.exalsius.ai/vm/select/0/prometheus/api/v1/query" \
-d 'query=up'
This query returns a JSON response with the current value of the up metric, which indicates whether monitoring targets are reachable.
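When automating health checks, you typically want to extract the unreachable targets from that response. The sketch below assumes the standard Prometheus HTTP API response shape (`status`/`data`/`result`); the instance names are illustrative:

```python
import json

# Sample response in the standard Prometheus instant-query format.
raw = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"job": "node-exporter", "instance": "node-1"},
             "value": [1700000000, "1"]},
            {"metric": {"job": "node-exporter", "instance": "node-2"},
             "value": [1700000000, "0"]},
        ],
    },
})

def down_targets(response_text: str) -> list[str]:
    """Return the instances whose `up` value is 0 (unreachable)."""
    body = json.loads(response_text)
    return [
        sample["metric"].get("instance", "<unknown>")
        for sample in body["data"]["result"]
        if sample["value"][1] == "0"
    ]

print(down_targets(raw))  # → ['node-2']
```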
Querying Logs
Logs are stored in VictoriaLogs and can be queried using LogsQL. Here's an example to search for error logs:
curl \
-H "Authorization: Basic $BASIC_AUTH" \
-H "Content-Type: application/x-www-form-urlencoded" \
"https://vmauth-de1.exalsius.ai/vls/select/logsql/query" \
-d 'query=error | limit 10'
This query searches for log entries containing "error" and limits the results to 10 entries.
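VictoriaLogs streams query results as newline-delimited JSON, one log entry per line with fields such as `_time` and `_msg`. A minimal parsing sketch, assuming that response shape (the sample entries below are illustrative):

```python
import json

# Illustrative newline-delimited JSON response, one entry per line.
response_text = "\n".join([
    json.dumps({"_time": "2024-01-01T00:00:00Z",
                "_msg": "error: connection refused"}),
    json.dumps({"_time": "2024-01-01T00:00:05Z",
                "_msg": "error: timeout waiting for pod"}),
])

def parse_log_lines(text: str) -> list[dict]:
    """Parse a newline-delimited JSON response into a list of entries."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

for entry in parse_log_lines(response_text):
    print(entry["_time"], entry["_msg"])
```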
Querying Traces
Traces are stored in VictoriaTraces and can be queried via the Jaeger HTTP API. Here's an example to search for traces from a specific service:
curl \
-H "Authorization: Basic $BASIC_AUTH" \
"https://vmauth-de1.exalsius.ai/vt/select/0/jaeger/api/traces?service=my-service&limit=10"
You can filter traces by multiple criteria. Here's an example querying traces with errors, filtered by operation and duration:
curl \
-H "Authorization: Basic $BASIC_AUTH" \
"https://vmauth-de1.exalsius.ai/vt/select/0/jaeger/api/traces?service=my-service&operation=my-operation&tags=%7B%22error%22%3A%22true%22%7D&minDuration=1ms&maxDuration=10ms&limit=20"
The tags parameter uses JSON format (URL-encoded in the example above). You can filter by span attributes, resource attributes (with resource_attr: prefix), or instrumentation scope attributes (with scope_attr: prefix).
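If you build these URLs in a script, the standard-library `urlencode` handles the percent-encoding of the JSON tags value for you. A sketch using the same illustrative parameter values as the curl example above:

```python
from urllib.parse import urlencode

# Query parameters for the Jaeger trace search endpoint; urlencode
# percent-encodes the JSON tags value automatically.
params = {
    "service": "my-service",
    "operation": "my-operation",
    "tags": '{"error":"true"}',
    "minDuration": "1ms",
    "maxDuration": "10ms",
    "limit": 20,
}
query = urlencode(params)
print(query)
# The tags portion encodes to tags=%7B%22error%22%3A%22true%22%7D,
# matching the hand-encoded URL above.
```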
You can also retrieve a specific trace by its trace ID:
curl \
-H "Authorization: Basic $BASIC_AUTH" \
"https://vmauth-de1.exalsius.ai/vt/select/0/jaeger/api/traces/<TRACE-ID>"
Note
The most convenient way to explore traces is through the Grafana trace explorer, which provides an interactive interface for trace visualization and analysis. The REST API is useful for programmatic access and integration with external tools. VictoriaTraces also supports LogsQL queries for advanced trace filtering.
Advanced Querying
For more complex queries, refer to the official documentation:
- PromQL: Query language for metrics (CPU usage, memory, custom metrics, etc.)
- VictoriaMetrics API: Additional query endpoints and features
- VictoriaLogs LogsQL: Query language for log data
- VictoriaTraces API: Jaeger HTTP API and LogsQL for querying distributed traces
Note
The API endpoints (vmauth-de1.exalsius.ai) may vary depending on your cluster's region. Check your cluster configuration or contact support if you encounter connection issues.
Step 4 — Collect Custom Metrics
While exalsius automatically collects standard Kubernetes and system metrics, you may need to monitor custom application metrics, third-party services, or specialized workloads. The observability stack supports collecting metrics from custom exporters and application endpoints.
Understanding Metric Collection
The OpenTelemetry collectors deployed by exalsius automatically discover and scrape metrics from:
- Kubernetes system components (kubelet, kube-proxy, etc.)
- Node exporters (CPU, memory, disk, network metrics)
- Pods and Services configured via PodMonitor and ServiceMonitor custom resources
To add your own metrics, you need to:
- Expose metrics in Prometheus format (typically at a /metrics endpoint)
- Create a PodMonitor or ServiceMonitor resource to configure scraping
Using PodMonitor and ServiceMonitor
PodMonitor and ServiceMonitor custom resources provide fine-grained control over metric collection, allowing you to configure:
- Custom scraping intervals
- Relabeling rules for metric transformation
- TLS/authentication configuration
- Complex label selectors
- Multiple endpoints per resource
Using PodMonitor
PodMonitor is used to scrape metrics directly from Pods based on label selectors. This is ideal when you want to monitor multiple pods with a single configuration:
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: my-custom-exporter
  namespace: default
spec:
  selector:
    matchLabels:
      app: my-exporter
  podMetricsEndpoints:
    - port: metrics    # Must match a named port in the Pod
      path: /metrics   # Metrics endpoint path
      interval: 30s    # Scraping interval
      scheme: http     # http or https
Key points:
- The selector.matchLabels must match labels on your Pods
- The port must be a named port in your Pod specification (not a number)
- The namespace should match where your Pods are deployed
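For reference, here is a minimal Deployment whose pod template the PodMonitor above would select. The names and image are illustrative; note the named metrics port:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-exporter
  namespace: default
spec:
  selector:
    matchLabels:
      app: my-exporter
  template:
    metadata:
      labels:
        app: my-exporter            # Matched by the PodMonitor's selector
    spec:
      containers:
        - name: exporter
          image: example/exporter:latest   # Illustrative image
          ports:
            - name: metrics         # Named port referenced by the PodMonitor
              containerPort: 9100
```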
Using ServiceMonitor
ServiceMonitor is used to scrape metrics from Services, which is particularly useful for monitoring services that may have multiple backend pods or when you want to scrape through a Service rather than individual Pods:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-service-monitor
  namespace: default
spec:
  selector:
    matchLabels:
      app: my-service
  endpoints:
    - port: metrics    # Must match a named port in the Service
      path: /metrics
      interval: 30s
      scheme: https    # Can use https
      tlsConfig:
        insecureSkipVerify: false  # TLS configuration
Key points:
- The selector.matchLabels must match labels on your Service
- The port must be a named port in your Service specification
- You can configure TLS settings for secure endpoints
- Multiple endpoints can be defined for different ports or paths
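A minimal Service that the ServiceMonitor above would select might look like the following (names and ports are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service
  namespace: default
  labels:
    app: my-service        # Matched by the ServiceMonitor's selector
spec:
  selector:
    app: my-service
  ports:
    - name: metrics        # Named port referenced by the ServiceMonitor
      port: 443
      targetPort: 8443
```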
Best Practices
When collecting custom metrics:
- Use named ports: Always use named ports in your Pod/Service definitions when using PodMonitor/ServiceMonitor
- Follow Prometheus format: Ensure your metrics endpoint follows the Prometheus exposition format
- Use appropriate intervals: Balance between data freshness and resource usage (30s-1m is typical)
- Label your metrics: Use meaningful labels in your metrics for better querying and filtering
- Namespace considerations: Create monitor resources in the same namespace as your workloads, or ensure proper RBAC permissions
Once your custom metrics are being collected, they'll appear in Grafana alongside standard metrics and can be queried via the REST API using PromQL.
Step 5 — Collect Custom Traces
While exalsius automatically collects standard Kubernetes and system telemetry, you may need to instrument your applications to emit distributed traces. Distributed tracing helps you understand request flows across services, identify performance bottlenecks, and debug issues in microservices architectures.
To collect traces from your applications, you need to configure OpenTelemetry SDKs in your application code. The OpenTelemetry collectors deployed by exalsius automatically receive traces from instrumented applications via OTLP (OpenTelemetry Protocol). Configure your OpenTelemetry SDK to export traces to the collector's OTLP endpoint, which is typically accessible at the otel-collector service in the observability or kof namespace.
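As a sketch, the OpenTelemetry SDK can be pointed at the collector through its standard environment variables, set in your container spec. The endpoint below is an assumption based on the service and namespace names mentioned above; verify the actual collector service address in your cluster:

```yaml
# Fragment of a container spec; names below are assumptions.
env:
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: "http://otel-collector.kof.svc.cluster.local:4317"  # Assumed service/namespace
  - name: OTEL_EXPORTER_OTLP_PROTOCOL
    value: "grpc"
  - name: OTEL_SERVICE_NAME
    value: "my-service"      # Shown as the service name in trace queries
```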
For detailed instructions on instrumenting your application, refer to the OpenTelemetry documentation for your specific programming language. Once your application is instrumented and sending traces, they'll appear in Grafana's trace explorer and can be queried via the REST API using the Jaeger HTTP API or LogsQL (see Step 3).
Understanding the Architecture
To help you make the most of observability, here's what happens under the hood when observability is enabled on your cluster.
Component Overview
exalsius deploys a comprehensive observability stack based primarily on OpenTelemetry, an industry-standard observability framework. The stack includes:
- OpenTelemetry Collectors: Deployed as DaemonSets and Deployments, these collectors:
  - Automatically discover Kubernetes resources (Pods, Services, etc.)
  - Scrape metrics from annotated resources and system components
  - Collect logs from pods and system components
  - Receive distributed traces from instrumented applications via OTLP
  - Annotate all telemetry data with cluster-specific metadata (cluster ID, cluster name, region, etc.)
- Telemetry Storage: Collected data is securely transmitted to a dedicated monitoring cluster that runs:
  - VictoriaMetrics: High-performance time-series database for metrics
  - VictoriaLogs: Efficient log storage and indexing system
  - VictoriaTraces: Distributed tracing backend for storing and querying trace data
  - Grafana: Visualization and querying platform with integrated trace explorer
- Security and Access Control:
  - Outbound authentication: Credentials are automatically created and stored in your cluster's kof namespace, allowing collectors to authenticate when sending data
  - Inbound authentication: The monitoring cluster only accepts data from authorized clusters
  - Query-time filtering: When you query data (via Grafana or API), results are automatically filtered to show only data from your cluster
  - Read-only access: All user-facing access is read-only for security
Data Flow
flowchart TB
    subgraph YourCluster["Your Cluster"]
        Apps["Applications<br/>(Pods)"]
        System["System Components<br/>(kubelet, etc.)"]
        Collectors["OpenTelemetry<br/>Collectors"]
    end
    subgraph MonitoringCluster["Monitoring Cluster"]
        VM["VictoriaMetrics<br/>(Metrics Storage)"]
        VL["VictoriaLogs<br/>(Log Storage)"]
        VT["VictoriaTraces<br/>(Trace Storage)"]
        Grafana["Grafana<br/>(Visualization)"]
    end
    Users["Your Queries<br/>(Grafana / REST API)"]
    Apps -->|"metrics, logs, traces"| Collectors
    System -->|"metrics, logs"| Collectors
    Collectors -->|"authenticated<br/>transmission"| VM
    Collectors -->|"authenticated<br/>transmission"| VL
    Collectors -->|"authenticated<br/>transmission"| VT
    Users -->|"queries"| Grafana
    Grafana -->|"read"| VM
    Grafana -->|"read"| VL
    Grafana -->|"read"| VT
    style YourCluster fill:#e1f5ff,stroke:#01579b,stroke-width:2px
    style MonitoringCluster fill:#fff3e0,stroke:#e65100,stroke-width:2px
    style Collectors fill:#f3e5f5,stroke:#4a148c,stroke-width:2px
    style Grafana fill:#e8f5e9,stroke:#1b5e20,stroke-width:2px
Key Benefits
This architecture provides several advantages:
- Centralized storage: All telemetry data is stored in a dedicated, optimized monitoring cluster
- Automatic discovery: No manual configuration needed for standard Kubernetes metrics
- Cluster isolation: Data is automatically tagged and filtered by cluster
- Scalability: The monitoring infrastructure scales independently from your workloads
- Security: Multi-layer authentication ensures only authorized access
- Standards-based: Uses OpenTelemetry and Prometheus standards for compatibility
Understanding this architecture helps you make informed decisions about what to monitor, how to structure your custom metrics, and how to optimize your observability setup.
Next Steps
Now that you understand how to use observability on your cluster, you can:
- Explore the pre-configured Grafana dashboards to understand your cluster's behavior
- Create custom dashboards for metrics specific to your workloads
- Integrate REST API queries into your automation and monitoring scripts
- Add custom metrics from your applications using annotations or PodMonitor/ServiceMonitor
- Instrument your applications to emit distributed traces and analyze them in Grafana
- Set up alerts in Grafana based on your metrics, logs, and traces
For more information, refer to the Prometheus documentation, VictoriaMetrics documentation, VictoriaTraces documentation, OpenTelemetry documentation, and Grafana documentation.