New to Stash? Please start here.

BackupSession

What is BackupSession

A BackupSession is a Kubernetes CustomResourceDefinition (CRD) that represents a backup run of the target(s) referenced by a BackupConfiguration/BackupBatch in a Kubernetes native way.

The Stash operator creates a Kubernetes CronJob according to the schedule defined in a BackupConfiguration/BackupBatch. On each scheduled run, this CronJob creates a BackupSession object that points to the respective BackupConfiguration/BackupBatch. The controller that runs inside the backup sidecar (for job-based backups, the Stash operator itself) watches this BackupSession object and starts taking the backup immediately.

You can also create a BackupSession object manually to trigger a backup at any time.

BackupSession CRD Specification

Like any official Kubernetes resource, a BackupSession has TypeMeta, ObjectMeta, Spec, and Status sections.

A sample BackupSession created for backing up a WordPress application and its components is shown below:

apiVersion: stash.appscode.com/v1beta1
kind: BackupSession
metadata:
  creationTimestamp: "2020-01-08T04:39:36Z"
  generation: 1
  labels:
    app.kubernetes.io/component: stash-backup
    app.kubernetes.io/managed-by: stash.appscode.com
    stash.appscode.com/invoker-name: deploy-backup-batch
    stash.appscode.com/invoker-type: BackupBatch
  name: deploy-backup-batch-1578458376
  namespace: demo
  ownerReferences:
  - apiVersion: stash.appscode.com/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: BackupBatch
    name: deploy-backup-batch
    uid: f5b9a1ce-238f-432a-86ac-287e2a85ef26
  resourceVersion: "7332"
  selfLink: /apis/stash.appscode.com/v1beta1/namespaces/demo/backupsessions/deploy-backup-batch-1578458376
  uid: 4bc5607b-04cd-4aeb-8f61-7dd21483ebb4
spec:
  invoker:
    apiGroup: stash.appscode.com
    kind: BackupBatch
    name: deploy-backup-batch
status:
  phase: Succeeded
  sessionDuration: 2m6.273902333s
  targets:
  - phase: Succeeded
    ref:
      kind: AppBinding
      name: sample-mysql
    stats:
    - duration: 28.449428155s
      hostname: host-0
      phase: Succeeded
      snapshots:
      - fileStats:
          modifiedFiles: 0
          newFiles: 1
          totalFiles: 1
          unmodifiedFiles: 0
        name: 597602f9
        path: dumpfile.sql
        processingTime: "0:04"
        uploaded: 3.407 MiB
    totalHosts: 1
  - phase: Succeeded
    ref:
      kind: Deployment
      name: wordpress
    stats:
    - duration: 50.781377951s
      hostname: host-0
      phase: Succeeded
      snapshots:
      - fileStats:
          modifiedFiles: 0
          newFiles: 1932
          totalFiles: 1932
          unmodifiedFiles: 0
        name: ce1c2487
        path: /var/www/html
        processingTime: "0:24"
        totalSize: 42.702 MiB
        uploaded: 42.645 MiB
    totalHosts: 1

Here, we are going to describe the various sections of a BackupSession object.

BackupSession Metadata

metadata.name

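metadata.name indicates the name of the BackupSession. This name is generated automatically by the respective CronJob and follows the pattern <BackupConfiguration/BackupBatch name>-<creation timestamp in Unix epoch seconds>. For example, the sample object above is named deploy-backup-batch-1578458376, where 1578458376 is its creation time (2020-01-08T04:39:36Z) in Unix epoch seconds.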

metadata.namespace

metadata.namespace indicates the namespace of the BackupSession. It is the same as the namespace of the respective BackupConfiguration/BackupBatch object.

metadata.labels

metadata.labels holds the kind and name of the respective BackupConfiguration/BackupBatch as labels. The Stash backup sidecar container uses these labels to watch only the BackupSessions of that BackupConfiguration/BackupBatch.

If you create a BackupSession manually to trigger a backup instantly, make sure that you have added the stash.appscode.com/invoker-type: <BackupConfiguration/BackupBatch kind> and stash.appscode.com/invoker-name: <BackupConfiguration/BackupBatch name> labels to your BackupSession. Otherwise, it will not trigger a backup for workloads (the resources that are backed up using a sidecar).
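As a minimal sketch, a manually created BackupSession for a BackupConfiguration might look like the following. The field layout mirrors the sample object shown earlier on this page. The object name manual-backup-001 and the invoker name deployment-backup are hypothetical placeholders, not values taken from Stash itself.

```yaml
apiVersion: stash.appscode.com/v1beta1
kind: BackupSession
metadata:
  # Hypothetical name; assumed to only need to be unique within the namespace.
  name: manual-backup-001
  # Same namespace as the BackupConfiguration that this session points to.
  namespace: demo
  labels:
    # The backup sidecar uses these two labels to find this session.
    stash.appscode.com/invoker-type: BackupConfiguration
    stash.appscode.com/invoker-name: deployment-backup  # hypothetical invoker name
spec:
  invoker:
    apiGroup: stash.appscode.com
    kind: BackupConfiguration
    name: deployment-backup  # should agree with the invoker-name label above
```

Applying a manifest like this, for example with kubectl apply -f, should create the session in the demo namespace.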

BackupSession Spec

A BackupSession object has the following fields in the spec section:

spec.invoker

spec.invoker specifies the apiGroup, kind, and name of the respective object which is responsible for invoking this backup session.

BackupSession Status

The .status section of a BackupSession shows the stats and progress of the backup process in this session. A backup sidecar container or job updates the respective fields under the .status section after it completes its task. The .status section consists of the following fields:

status.phase

status.phase indicates the overall phase of the backup process for this BackupSession. status.phase will be Succeeded only if the phase of every target is Succeeded. If any target fails to complete its backup, status.phase will be Failed.

status.sessionDuration

status.sessionDuration indicates the total time taken to complete backup of all targets in this session.

status.targets

The status.targets field contains an array holding the status of each individual target in a backup run. Each target's status consists of the following sub-fields:

  • totalHosts : Not every pod or replica of a target is subject to backup. We refer to the entities that are backed up as hosts. totalHosts specifies the total number of hosts of the target that will be backed up for this BackupSession. For details on how many hosts are backed up for each type of workload, see the Hosts of a backup process section below.

  • ref : ref refers to the target whose backup stats are presented by this array entry.

  • phase : phase indicates the backup phase of the target. phase will be Succeeded only if the phase of every host is Succeeded. If any host fails to complete its backup, phase will be Failed.

  • stats : The stats section is an array of backup statistics about the individual hosts of the target. Each host adds its statistics to this array after completing its backup process. Each stats entry consists of the following fields:

    • hostname: hostname indicates the name of the host.
    • phase: phase indicates the backup phase of this host.
    • duration: duration indicates the total time taken to complete backup for this host.
    • snapshots: Stash creates one snapshot for each targeted file path specified in the spec.target.paths field of the BackupConfiguration object. The snapshots field holds the statistics of each of these individual snapshots. Each snapshot's statistics have the following fields:
      • name: name indicates the name of the snapshot.
      • path: path indicates the file path that was backed up in this snapshot.
      • totalSize: totalSize indicates the size of the data to back up from this path.
      • uploaded: uploaded indicates the size of the data that was uploaded to the backend for this snapshot. This could be much smaller than totalSize if some data was already uploaded to the backend in previous backup sessions.
      • processingTime: processingTime indicates the time taken to process the data of the target path.
      • fileStats: The fileStats field shows statistics of the files that were backed up in this snapshot.
        • totalFiles: totalFiles shows the total number of files that were backed up in this snapshot.
        • newFiles: newFiles shows the number of new files that were backed up in this snapshot.
        • modifiedFiles: modifiedFiles shows the number of files that were modified since the last backup of this path.
        • unmodifiedFiles: unmodifiedFiles shows the number of files that haven't changed since the last backup of this path.
    • error: error shows the reason for the failure if the backup process failed for this host.
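To illustrate how a failure might surface in these fields, here is a hypothetical status fragment for a single-host target whose backup failed. It is a sketch built only from the field descriptions above: the error text is a placeholder rather than real Stash output, and the fragment assumes that error sits alongside hostname and phase in the host's stats entry.

```yaml
status:
  phase: Failed              # overall phase is Failed because a target failed
  targets:
  - phase: Failed            # target phase is Failed because a host failed
    ref:
      kind: Deployment
      name: wordpress
    stats:
    - hostname: host-0
      phase: Failed
      # Placeholder only; the actual message comes from the backup process.
      error: "<reason reported by the backup process>"
    totalHosts: 1
```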

Hosts of a backup process

Stash uses two different backup models depending on the target type. It uses the sidecar model for Kubernetes workloads and the job model for the rest of the targets. In the sidecar model, Stash injects a sidecar into the targeted workload, and the sidecar is responsible for taking the backup. In the job model, Stash launches a job to take a backup of the target.

Stash uses an identifier called host to separate the backed up data of different subjects in the backend. This host identification process depends on the backup model and the target type. The backup strategy and host identification strategy for the different types of targets are explained below.

Kubernetes Workloads:

Stash uses the sidecar model to back up Kubernetes workloads. However, not every sidecar takes a backup. How many sidecars take a backup depends on the type of the workload. We can divide them into the following categories:

  • Deployment, ReplicaSet and ReplicationController: For these types of stateless workloads, all the replicas mount the same volumes. So, taking a backup from only one replica is enough. In this case, Stash uses leader election to elect a leader pod. Only the sidecar of the leader pod takes the backup. This leader pod is identified as host-0. The total number of hosts for these types of workloads is 1.
  • StatefulSet: Every replica of a StatefulSet mounts different volumes. So, taking a backup from each replica is necessary. In this case, the sidecar inside each replica takes a backup. Stash identifies pod-0 as host-0, pod-1 as host-1, pod-2 as host-2 and so on. Hence, the total number of hosts for a StatefulSet is the number of replicas.
  • DaemonSet: Daemon replicas on different nodes may contain different data. So, taking a backup of each daemon pod is necessary. In this case, the sidecar inside each daemon pod takes a backup. Stash considers each daemon pod a separate host, and the name of the node where the daemon pod is running acts as its host identifier. The total number of hosts for a DaemonSet is the number of daemon pods running in the cluster.
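As an illustration of the host identification described above, the status of a BackupSession for a StatefulSet with three replicas might contain a target entry shaped roughly like the following. This is a trimmed, hypothetical fragment: the StatefulSet name sample-sts is made up, and the duration and snapshots fields are omitted for brevity.

```yaml
status:
  targets:
  - phase: Succeeded
    ref:
      kind: StatefulSet
      name: sample-sts       # hypothetical StatefulSet name
    stats:
    - hostname: host-0       # backed up by the sidecar in pod-0
      phase: Succeeded
    - hostname: host-1       # backed up by the sidecar in pod-1
      phase: Succeeded
    - hostname: host-2       # backed up by the sidecar in pod-2
      phase: Succeeded
    totalHosts: 3            # equals the number of replicas
```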

Stand-alone PVC:

Stash uses the job model to back up a stand-alone PVC. Stash launches a job to back up the targeted PVC. This job is identified as host-0. In this case, the total number of hosts is 1.

Databases:

Stash uses the job model to back up a database. Stash launches a job to back up the targeted database. In this case, the number of hosts depends on the database type.

  • Stand-alone database: For a stand-alone database, the backup target is identified as host-0 and the total number of hosts is 1.
  • Replicated cluster: For a replicated database cluster such as a MongoDB ReplicaSet, all the replicas contain the same data. In this case, taking a backup of only one replica is enough. This replica is identified as host-0 and the total number of hosts is 1.
  • Sharded cluster: For a sharded database cluster, Stash takes a backup of all shards. Hence, the number of hosts for a sharded database is the number of shards, and they are identified as host-0, host-1, host-2, etc. However, the number of hosts may increase based on the database type.

VolumeSnapshot:

Stash uses the job model for taking volume snapshots. Each volume is considered a separate host, identified by its name. Hence, the total number of hosts for VolumeSnapshot is the number of targeted volumes. However, since VolumeSnapshot is handled by the respective CSI driver, the host identifier plays no role in separating their data.

Next Steps

  • Learn how backup of workload data works from here.
  • Learn how backup of databases works from here.
  • Learn how backup of a stand-alone PVC works from here.