Monitoring Stash

Stash has native support for monitoring via Prometheus. You can use either the builtin Prometheus scraper or the CoreOS Prometheus Operator to monitor Stash. This tutorial shows how monitoring works in Stash and how to enable it.

Overview

Stash uses Prometheus PushGateway to export the metrics for backup and restore operations. The following diagram shows the logical structure of the Stash monitoring flow.

Fig: Monitoring process in Stash

The Stash operator runs two containers. The operator container runs the controller and other necessary components, and the pushgateway container runs the prom/pushgateway image. The Stash sidecar in each workload pushes its metrics to this pushgateway. The Prometheus server then scrapes these metrics through the stash-operator service. The Stash operator itself also exposes some metrics at the /metrics path on port 8443.
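To make the push half of this flow concrete, here is a minimal sketch of what pushing one sample to a Pushgateway looks like. It only renders a sample in the Prometheus text format and prints the endpoint it would be sent to. The address pushgateway.example:9091, the job name demo-backup, and the helper names render_sample and push_url are assumptions for illustration, not Stash defaults. Stash sidecars push their own metrics, so you would not normally do this by hand.

```shell
#!/bin/sh
# Sketch only: the address below is a placeholder, not a Stash default.
PUSHGATEWAY_URL="${PUSHGATEWAY_URL:-http://pushgateway.example:9091}"

# Render one sample line in the Prometheus text exposition format.
# $1 = metric name, $2 = value
render_sample() {
  printf '%s %s\n' "$1" "$2"
}

# Build the Pushgateway endpoint for a job (grouping key: job=<name>).
# $1 = job name
push_url() {
  printf '%s/metrics/job/%s\n' "$PUSHGATEWAY_URL" "$1"
}

render_sample stash_backup_session_success 1
push_url demo-backup

# To actually push, pipe the sample to curl (needs a reachable Pushgateway):
#   render_sample stash_backup_session_success 1 |
#     curl --data-binary @- "$(push_url demo-backup)"
```

Prometheus never talks to the workloads directly in this model. It scrapes whatever the pushgateway last received, which is why short-lived backup jobs can still be monitored.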

Backup Metrics

The following metrics are available for the backup process:

| Metric | Uses |
| ------ | ---- |
| stash_backup_setup_success | Indicates whether backup was successfully set up for the target |
| stash_backup_session_success | Indicates whether the current backup session succeeded or not |
| stash_backup_session_duration_total_seconds | Total time taken to complete the backup session |
| stash_backup_data_size_bytes | Total size of the target data to back up (in bytes) |
| stash_backup_data_uploaded_bytes | Amount of data uploaded to the repository in this session (in bytes) |
| stash_backup_data_processing_time_seconds | Total time taken to back up the target data |
| stash_backup_files_total | Total number of files that have been backed up |
| stash_backup_files_new | Total number of new files that have been created since the last backup |
| stash_backup_files_modified | Total number of files that have been modified since the last backup |
| stash_backup_files_unmodified | Total number of files that have not been changed since the last backup |
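As a rough illustration of how these metrics can be consumed, the sketch below scans scraped metrics text and lists every stash_backup_session_success series whose value is 0. The sample data, the job label values, and the helper name failed_backup_sessions are made up for this example, and the parsing assumes label values contain no spaces. The real label set depends on your setup.

```shell
#!/bin/sh
# Print each stash_backup_session_success series whose value is 0 (failed).
# Reads Prometheus text-format metrics on stdin.
failed_backup_sessions() {
  awk '$1 ~ /^stash_backup_session_success([{]|$)/ && $NF == 0 { print $1 }'
}

# Made-up sample input; label names and values are illustrative only.
backup_sample='# TYPE stash_backup_session_success gauge
stash_backup_session_success{job="demo-a"} 1
stash_backup_session_success{job="demo-b"} 0
stash_backup_files_total{job="demo-b"} 0'

printf '%s\n' "$backup_sample" | failed_backup_sessions
# prints: stash_backup_session_success{job="demo-b"}
```

In practice the same check is usually written as a Prometheus query or alert rule. The shell version is only meant to show what the metric values mean.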

Repository Metrics

The following metrics are available for the backup repository:

| Metric | Uses |
| ------ | ---- |
| stash_repository_integrity | Result of the repository integrity check after the last backup |
| stash_repository_size_bytes | Size of the repository after the last backup (in bytes) |
| stash_repository_snapshot_count | Number of snapshots stored in the repository |
| stash_repository_snapshot_cleaned | Number of old snapshots cleaned up according to the retention policy in the last backup session |

Restore Metrics

The following metrics are available for the restore process:

| Metric | Uses |
| ------ | ---- |
| stash_restore_session_success | Indicates whether the current restore session succeeded or not |
| stash_restore_session_duration_total_seconds | Total time taken to complete the restore session |

Operator Metrics

The following metrics are available for the Stash operator. These metrics are accessible through the api endpoint of the stash-operator service.

API Server Metrics:

| Metric Name | Uses |
| ----------- | ---- |
| apiserver_audit_event_total | Counter of audit events generated and sent to the audit backend. |
| apiserver_client_certificate_expiration_seconds | Distribution of the remaining lifetime on the certificate used to authenticate a request. |
| apiserver_current_inflight_requests | Maximal number of currently used inflight request limit of this apiserver per request kind in last second. |
| apiserver_request_count | Counter of apiserver requests broken out for each verb, API resource, client, and HTTP response contentType and code. |
| apiserver_request_latencies | Response latency distribution in microseconds for each verb, resource and subresource. |
| apiserver_request_latencies_summary | Response latency summary in microseconds for each verb, resource and subresource. |
| authenticated_user_requests | Counter of authenticated requests broken out by username. |

Go Metrics:

| Metric Name | Uses |
| ----------- | ---- |
| go_gc_duration_seconds | A summary of the GC invocation durations. |
| go_goroutines | Number of goroutines that currently exist. |
| go_memstats_alloc_bytes | Number of bytes allocated and still in use. |
| go_memstats_alloc_bytes_total | Total number of bytes allocated, even if freed. |
| go_memstats_buck_hash_sys_bytes | Number of bytes used by the profiling bucket hash table. |
| go_memstats_frees_total | Total number of frees. |
| go_memstats_gc_sys_bytes | Number of bytes used for garbage collection system metadata. |
| go_memstats_heap_alloc_bytes | Number of heap bytes allocated and still in use. |
| go_memstats_heap_idle_bytes | Number of heap bytes waiting to be used. |
| go_memstats_heap_inuse_bytes | Number of heap bytes that are in use. |
| go_memstats_heap_objects | Number of allocated objects. |
| go_memstats_heap_released_bytes_total | Total number of heap bytes released to OS. |
| go_memstats_heap_sys_bytes | Number of heap bytes obtained from system. |
| go_memstats_last_gc_time_seconds | Number of seconds since 1970 of last garbage collection. |
| go_memstats_lookups_total | Total number of pointer lookups. |
| go_memstats_mallocs_total | Total number of mallocs. |
| go_memstats_mcache_inuse_bytes | Number of bytes in use by mcache structures. |
| go_memstats_mcache_sys_bytes | Number of bytes used for mcache structures obtained from system. |
| go_memstats_mspan_inuse_bytes | Number of bytes in use by mspan structures. |
| go_memstats_mspan_sys_bytes | Number of bytes used for mspan structures obtained from system. |
| go_memstats_next_gc_bytes | Number of heap bytes when next garbage collection will take place. |
| go_memstats_other_sys_bytes | Number of bytes used for other system allocations. |
| go_memstats_stack_inuse_bytes | Number of bytes in use by the stack allocator. |
| go_memstats_stack_sys_bytes | Number of bytes obtained from system for stack allocator. |
| go_memstats_sys_bytes | Number of bytes obtained from system. Sum of all system allocations. |

HTTP Metrics:

| Metric Name | Uses |
| ----------- | ---- |
| http_request_duration_microseconds | The HTTP request latencies in microseconds. |
| http_request_size_bytes | The HTTP request sizes in bytes. |
| http_requests_total | Total number of HTTP requests made. |
| http_response_size_bytes | The HTTP response sizes in bytes. |

Process Metrics:

| Metric Name | Uses |
| ----------- | ---- |
| process_cpu_seconds_total | Total user and system CPU time spent in seconds. |
| process_max_fds | Maximum number of open file descriptors. |
| process_open_fds | Number of open file descriptors. |
| process_resident_memory_bytes | Resident memory size in bytes. |
| process_start_time_seconds | Start time of the process since unix epoch in seconds. |
| process_virtual_memory_bytes | Virtual memory size in bytes. |
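When checking an operator metrics dump by hand, it can help to see which of the four families above are actually present. The sketch below counts samples per family prefix, taken as the text before the first underscore of the metric name. The sample input and the helper name count_by_prefix are assumptions for illustration. Fetching the real dump from the /metrics path on port 8443 may require credentials that depend on your cluster, so that step is left out.

```shell
#!/bin/sh
# Count samples per metric family prefix (text before the first underscore).
# Reads Prometheus text-format metrics on stdin; skips comments and blank lines.
count_by_prefix() {
  awk '!/^#/ && NF { split($1, p, "_"); n[p[1]]++ }
       END { for (k in n) print k, n[k] }' | sort
}

# Made-up sample input; values and labels are illustrative only.
operator_sample='# HELP go_goroutines Number of goroutines that currently exist.
apiserver_request_count{verb="GET"} 3
go_goroutines 42
go_memstats_alloc_bytes 1024
http_requests_total{code="200"} 5
process_open_fds 9'

printf '%s\n' "$operator_sample" | count_by_prefix
```

A family that is missing from the output usually means the corresponding endpoint was not scraped, rather than that the metric value is zero.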

How to Enable Monitoring

You can enable monitoring with a few flags while installing, upgrading, or updating Stash, via either the script or Helm. You can also choose which monitoring agent to use. Stash will configure the respective resources accordingly. The available flags and their uses are listed below:

| Script Flag | Helm Values | Acceptable Values | Default | Uses |
| ----------- | ----------- | ----------------- | ------- | ---- |
| --monitoring-agent | monitoring.agent | prometheus.io/builtin or prometheus.io/coreos-operator | none | Specify which monitoring agent to use for monitoring Stash. |
| --monitoring-backup | monitoring.backup | true or false | false | Specify whether to monitor Stash backup and restore. |
| --monitoring-operator | monitoring.operator | true or false | false | Specify whether to monitor Stash operator. |
| --prometheus-namespace | monitoring.prometheus.namespace | any namespace | same namespace as Stash operator | Specify the namespace where Prometheus server is running or will be deployed. |
| --servicemonitor-label | monitoring.serviceMonitor.labels | any label | For Helm installation, app: <generated app name> and release: <release name>. For script installation, app: stash | Specify the labels for ServiceMonitor. Prometheus crd will select ServiceMonitor using these labels. Only usable when monitoring agent is prometheus.io/coreos-operator. |

You have to provide these flags while installing, upgrading, or updating Stash. The following examples enable monitoring of backup, restore, and operator metrics with a prometheus.io/coreos-operator Prometheus server.

Helm 3:

$ helm install stash-operator appscode/stash --version v0.9.0-rc.6 \
  --namespace kube-system \
  --set monitoring.agent=prometheus.io/coreos-operator \
  --set monitoring.backup=true \
  --set monitoring.operator=true \
  --set monitoring.prometheus.namespace=monitoring \
  --set monitoring.serviceMonitor.labels.k8s-app=prometheus

Helm 2:

$ helm install appscode/stash --name stash-operator --version v0.9.0-rc.6 \
  --namespace kube-system \
  --set monitoring.agent=prometheus.io/coreos-operator \
  --set monitoring.backup=true \
  --set monitoring.operator=true \
  --set monitoring.prometheus.namespace=monitoring \
  --set monitoring.serviceMonitor.labels.k8s-app=prometheus

YAML (with Helm 3):

$ helm template stash-operator appscode/stash --version v0.9.0-rc.6 \
  --namespace kube-system \
  --no-hooks \
  --set monitoring.agent=prometheus.io/coreos-operator \
  --set monitoring.backup=true \
  --set monitoring.operator=true \
  --set monitoring.prometheus.namespace=monitoring \
  --set monitoring.serviceMonitor.labels.k8s-app=prometheus | kubectl apply -f -
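For a script-based installation, the same settings map onto the script flags from the table above. The sketch below only prints the equivalent flag list. The --flag=value spelling, the local script path ./stash.sh, and the helper name monitoring_flags are assumptions for illustration, so check them against the installer script you actually use.

```shell
#!/bin/sh
# Print the monitoring flags for the Stash installer script, one per line.
# Values mirror the Helm examples above.
monitoring_flags() {
  printf '%s\n' \
    "--monitoring-agent=prometheus.io/coreos-operator" \
    "--monitoring-backup=true" \
    "--monitoring-operator=true" \
    "--prometheus-namespace=monitoring" \
    "--servicemonitor-label=k8s-app=prometheus"
}

monitoring_flags

# Hypothetical invocation, assuming the installer was saved as ./stash.sh:
#   ./stash.sh $(monitoring_flags)
```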

Next Steps

  • Learn how to monitor Stash using built-in Prometheus from here.
  • Learn how to monitor Stash using Prometheus operator from here.