BackupSession
What is BackupSession
A BackupSession is a Kubernetes CustomResourceDefinition (CRD) that represents a backup run of the respective target(s) referenced by a BackupConfiguration/BackupBatch in a Kubernetes-native way.
The Stash operator creates a Kubernetes CronJob according to the schedule defined in a BackupConfiguration/BackupBatch. On each scheduled run, this CronJob creates a BackupSession object that points to the respective BackupConfiguration/BackupBatch. The controller that runs inside the backup sidecar (for backups taken via jobs, it is the Stash operator itself) watches this BackupSession object and starts taking the backup immediately.
You can also create a BackupSession object manually to trigger a backup at any time.
BackupSession CRD Specification
Like any official Kubernetes resource, a BackupSession has TypeMeta, ObjectMeta, Spec, and Status sections.
A sample BackupSession created for backing up a WordPress application and its components is shown below:
apiVersion: stash.appscode.com/v1beta1
kind: BackupSession
metadata:
  creationTimestamp: "2020-07-25T17:41:28Z"
  labels:
    app: stash
    stash.appscode.com/invoker-name: wordpress-backup
    stash.appscode.com/invoker-type: BackupBatch
  name: wordpress-backup-1578458376
  namespace: demo
spec:
  invoker:
    apiGroup: stash.appscode.com
    kind: BackupBatch
    name: wordpress-backup
  retryLeft: 2
status:
  conditions:
  - lastTransitionTime: "2020-07-25T17:41:31Z"
    message: Repository exist in the backend.
    reason: BackendRepositoryFound
    status: "True"
    type: BackendRepositoryInitialized
  - lastTransitionTime: "2020-07-25T17:41:48Z"
    message: Successfully applied retention policy.
    reason: SuccessfullyAppliedRetentionPolicy
    status: "True"
    type: RetentionPolicyApplied
  - lastTransitionTime: "2020-07-25T17:41:50Z"
    message: Repository integrity verification succeeded.
    reason: SuccessfullyVerifiedRepositoryIntegrity
    status: "True"
    type: RepositoryIntegrityVerified
  - lastTransitionTime: "2020-07-25T17:41:50Z"
    message: Successfully pushed repository metrics.
    reason: SuccessfullyPushedRepositoryMetrics
    status: "True"
    type: RepositoryMetricsPushed
  phase: Succeeded
  sessionDuration: 22.575920065s
  sessionDeadline: "2020-07-25T17:46:28Z"
  targets:
  - phase: Succeeded
    preBackupActions:
    - InitializeBackendRepository
    ref:
      apiVersion: apps/v1
      kind: Deployment
      name: wordpress
    stats:
    - duration: 831.018039ms
      hostname: app
      phase: Succeeded
      snapshots:
      - fileStats:
          modifiedFiles: 0
          newFiles: 1
          totalFiles: 1
          unmodifiedFiles: 0
        name: b54ee4a0
        path: /var/www/html
        processingTime: "0:00"
        totalSize: 0 B
        uploaded: 711 B
    totalHosts: 1
  - phase: Succeeded
    postBackupActions:
    - ApplyRetentionPolicy
    - VerifyRepositoryIntegrity
    - SendRepositoryMetrics
    ref:
      apiVersion: appcatalog.appscode.com/v1alpha1
      kind: AppBinding
      name: wordpress-mysql
    stats:
    - duration: 1.147010638s
      hostname: db
      phase: Succeeded
      snapshots:
      - fileStats:
          modifiedFiles: 0
          newFiles: 1
          totalFiles: 1
          unmodifiedFiles: 0
        name: b30beb44
        path: dumpfile.sql
        processingTime: "0:00"
        totalSize: 0 B
        uploaded: 3.408 MiB
    totalHosts: 1
Here, we are going to describe the various sections of a BackupSession object.
BackupSession Metadata
metadata.name
metadata.name indicates the name of the BackupSession. This name is automatically generated by the respective CronJob and follows this pattern: <BackupConfiguration/BackupBatch name>-<creation timestamp in Unix epoch seconds>.
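The naming pattern above can be illustrated with a minimal sketch. The helper name backup_session_name is hypothetical and is not part of Stash; the real name is generated by the CronJob, and this only mirrors the documented pattern.

```python
from datetime import datetime, timezone


def backup_session_name(invoker_name: str, created_at: datetime) -> str:
    """Return '<invoker name>-<creation timestamp in Unix epoch seconds>'.

    Hypothetical helper for illustration only; Stash generates the real name.
    """
    epoch_seconds = int(created_at.timestamp())
    return f"{invoker_name}-{epoch_seconds}"


# Rebuild the name used in the sample object from its epoch suffix.
created = datetime.fromtimestamp(1578458376, tz=timezone.utc)
print(backup_session_name("wordpress-backup", created))
# prints: wordpress-backup-1578458376
```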
metadata.namespace
metadata.namespace indicates the namespace of the BackupSession. It is the same as the namespace of the respective BackupConfiguration/BackupBatch object.
metadata.labels
metadata.labels holds the kind and name of the respective BackupConfiguration/BackupBatch as labels. The Stash backup sidecar container uses these labels to watch only the BackupSessions of that BackupConfiguration/BackupBatch.
If you create a BackupSession manually to trigger a backup instantly, make sure that you have added the stash.appscode.com/invoker-type: <BackupConfiguration/BackupBatch kind> and stash.appscode.com/invoker-name: <BackupConfiguration/BackupBatch name> labels to your BackupSession. Otherwise, it will not trigger a backup for workloads (the resources that are backed up using a sidecar).
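As a rough illustration of the labelling requirement, the sketch below assembles a manual BackupSession manifest as a Python dictionary and prints it as JSON. The function name manual_backup_session and the object name wordpress-backup-manual are assumptions made for this example; the apiVersion, label keys, and spec.invoker fields are taken from the sample object shown earlier.

```python
import json


def manual_backup_session(name, namespace, invoker_kind, invoker_name):
    """Build a BackupSession manifest carrying both invoker labels.

    Hypothetical helper; field names follow the sample object in this page.
    """
    return {
        "apiVersion": "stash.appscode.com/v1beta1",
        "kind": "BackupSession",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "labels": {
                # Without these two labels, sidecar-based backups are not triggered.
                "stash.appscode.com/invoker-type": invoker_kind,
                "stash.appscode.com/invoker-name": invoker_name,
            },
        },
        "spec": {
            "invoker": {
                "apiGroup": "stash.appscode.com",
                "kind": invoker_kind,
                "name": invoker_name,
            }
        },
    }


manifest = manual_backup_session(
    "wordpress-backup-manual", "demo", "BackupBatch", "wordpress-backup"
)
print(json.dumps(manifest, indent=2))
```

Since Kubernetes manifests can be written in JSON as well as YAML, the printed output could be saved to a file and applied like any other manifest.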
BackupSession Spec
A BackupSession object has the following fields in the spec section:
spec.invoker
spec.invoker specifies the apiGroup, kind, and name of the respective object which is responsible for invoking this backup session.
spec.retryLeft
spec.retryLeft specifies the number of retry attempts left for this backup session.
BackupSession Status
The .status section of a BackupSession shows the stats and progress of the backup process in this session. A backup sidecar container or job updates the respective fields under the .status section after it completes its task. The .status section consists of the following fields:
status.phase
status.phase indicates the overall phase of the backup process for this BackupSession. status.phase will be Succeeded only if the phase of all targets is Succeeded. If any of the targets fails to complete its backup, status.phase will be Failed.
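The rule above can be sketched as a small aggregation function. This is an illustration of the documented rule only, not Stash's implementation; the name aggregate_phase is hypothetical, and returning None for any situation the text does not cover (for example, targets that are still in progress) is an assumption.

```python
from typing import Iterable, Optional


def aggregate_phase(target_phases: Iterable[str]) -> Optional[str]:
    """Derive the overall phase from the phases of the individual targets.

    Hypothetical helper: 'Failed' if any target failed, 'Succeeded' only if
    every target succeeded, otherwise None (case not described in the text).
    """
    phases = list(target_phases)
    if any(phase == "Failed" for phase in phases):
        return "Failed"
    if phases and all(phase == "Succeeded" for phase in phases):
        return "Succeeded"
    return None


print(aggregate_phase(["Succeeded", "Succeeded"]))  # prints: Succeeded
print(aggregate_phase(["Succeeded", "Failed"]))     # prints: Failed
```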
status.sessionDuration
status.sessionDuration indicates the total time taken to complete the backup of all targets in this session.
status.sessionDeadline
status.sessionDeadline indicates the deadline of the backup process. The BackupSession will be considered Failed if the backup does not complete within this deadline.
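A minimal sketch of how such a deadline could be checked from outside the cluster is shown below. It assumes timestamps in the format used by the sample object; the helper names parse_timestamp and deadline_exceeded are hypothetical, and treating "strictly later than the deadline" as exceeded is an assumption about the boundary case.

```python
from datetime import datetime, timezone


def parse_timestamp(value: str) -> datetime:
    """Parse a UTC timestamp such as '2020-07-25T17:46:28Z'."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def deadline_exceeded(session_deadline: str, now: datetime) -> bool:
    """Return True if 'now' is later than the session deadline (assumed rule)."""
    return now > parse_timestamp(session_deadline)


# Values taken from the sample object: creation time and session deadline.
created = parse_timestamp("2020-07-25T17:41:28Z")
deadline = parse_timestamp("2020-07-25T17:46:28Z")
print(deadline - created)  # prints: 0:05:00
print(deadline_exceeded("2020-07-25T17:46:28Z", created))  # prints: False
```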
status.retried
status.retried is a boolean field which specifies whether this session was retried or not in case of a failed backup.
status.nextRetry
status.nextRetry specifies the timestamp when this backup will be retried if it has failed.
status.conditions
status.conditions shows the conditions of different operations/steps of the backup process. The following conditions are set by the Stash operator on a BackupSession.
Condition Type | Usage |
---|---|
BackendRepositoryInitialized | Indicates whether the backend repository was initialized or not. |
RetentionPolicyApplied | Indicates whether the retention policies were applied or not. |
RepositoryIntegrityVerified | Indicates whether the repository integrity check succeeded or not. |
RepositoryMetricsPushed | Indicates whether the Repository metrics for this backup session were pushed or not. |
GlobalPreBackupHookSucceeded | Indicates whether the global PreBackupHook was executed successfully or not. Only available during backup using BackupBatch. |
GlobalPostBackupHookSucceeded | Indicates whether the global PostBackupHook was executed successfully or not. Only available during backup using BackupBatch. |
DeadlineExceeded | Indicates whether the session deadline was exceeded or not. |
status.targets
status.targets field contains an array of the statuses of the individual targets for a backup run. Each target’s status consists of the following sub-fields:
- totalHosts: Not every pod or replica of a target is subject to backup. Thus, we refer to the entities that are subject to backup as hosts. totalHosts specifies the total number of hosts of the target that will be backed up for this BackupSession. For more details on how many hosts will be backed up for which types of workload, see the "Hosts of a backup process" section below.
- preBackupActions: Specifies a list of actions that the backup process should execute before taking the backup. For example, the backend repository must be initialized by one of the targets before taking the backup. Stash automatically assigns which target should execute this action. The preBackupActions should not be confused with the preBackup hook. The hooks are meant to be configured by the users, whereas the preBackupActions are meant to be configured by Stash itself.
- postBackupActions: Similar to preBackupActions, it specifies a list of actions that a backup process should execute after taking the backup. For example, when all the targets complete their backup, one target must apply the retention policy to the repository. Stash automatically selects which target should execute these postBackupActions.
- ref: ref refers to the target whose backup stats are presented by this array entry.
- phase: phase indicates the backup phase of the target. phase will be Succeeded only if the phases of all hosts are Succeeded. If any of the hosts fails to complete its backup, phase will be Failed.
- stats: stats section is an array of backup statistics about individual hosts of the target. Each host adds its statistics to this array after completing its backup process. Each stats entry consists of the following fields:
  - hostname: hostname indicates the name of the host.
  - phase: phase indicates the backup phase of this host.
  - duration: duration indicates the total time taken to complete the backup for this host.
  - snapshots: Stash creates one snapshot for each targeted file path specified in the spec.target.paths field of the BackupConfiguration object. The snapshots field holds the statistics of each of these individual snapshots. Each snapshot's statistics have the following fields:
    - name: name indicates the name of the snapshot.
    - path: path indicates the file path that was backed up in this snapshot.
    - totalSize: totalSize indicates the size of the data to back up from this path.
    - uploaded: uploaded indicates the size of the data that was uploaded to the backend for this snapshot. This could be much smaller than totalSize if some data was already uploaded to the backend in previous backup sessions.
    - processingTime: processingTime indicates the time taken to process the data of the target path.
    - fileStats: fileStats field shows statistics of the files that were backed up in this snapshot.
      - totalFiles: totalFiles shows the total number of files that were backed up in this snapshot.
      - newFiles: newFiles shows the number of new files that were backed up in this snapshot.
      - modifiedFiles: modifiedFiles shows the number of files that were modified since the last backup of this directory.
      - unmodifiedFiles: unmodifiedFiles shows the number of files that haven’t changed since the last backup of this path.
  - error: error shows the reason for failure if the backup process failed for this host.
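To make the nesting of status.targets concrete, here is a minimal sketch that walks targets, then hosts, then snapshots, and collects one row per snapshot. The function name summarize_targets is hypothetical, and the input is a trimmed copy of the sample status from this page, loaded as a plain dictionary.

```python
def summarize_targets(status: dict) -> list:
    """Return (target, hostname, path, uploaded) for every snapshot.

    Hypothetical helper; it only reads fields described in this section.
    """
    rows = []
    for target in status.get("targets", []):
        ref = target["ref"]
        target_id = f'{ref["kind"]}/{ref["name"]}'
        for host in target.get("stats", []):
            for snapshot in host.get("snapshots", []):
                rows.append(
                    (target_id, host["hostname"], snapshot["path"], snapshot["uploaded"])
                )
    return rows


# Trimmed copy of the sample status shown earlier.
sample_status = {
    "targets": [
        {
            "ref": {"apiVersion": "apps/v1", "kind": "Deployment", "name": "wordpress"},
            "stats": [
                {
                    "hostname": "app",
                    "snapshots": [{"path": "/var/www/html", "uploaded": "711 B"}],
                }
            ],
        },
        {
            "ref": {
                "apiVersion": "appcatalog.appscode.com/v1alpha1",
                "kind": "AppBinding",
                "name": "wordpress-mysql",
            },
            "stats": [
                {
                    "hostname": "db",
                    "snapshots": [{"path": "dumpfile.sql", "uploaded": "3.408 MiB"}],
                }
            ],
        },
    ]
}

for row in summarize_targets(sample_status):
    print(row)
```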
Hosts of a backup process
Stash uses two different models for backup depending on the target type. It uses the sidecar model for Kubernetes workloads and the job model for the rest of the targets. In the sidecar model, Stash injects a sidecar inside the targeted workload and the sidecar is responsible for taking the backup. In the job model, Stash launches a job to take a backup of the target.
Stash uses an identifier called host to separate the backed-up data of different subjects in the backend. This host identification process depends on the backup model and the target type. The backup strategy and host identification strategy for the different types of targets are explained below.
Kubernetes Workloads:
Stash uses the sidecar model to backup Kubernetes workloads. However, not every sidecar takes backup. How many sidecars will take backup depends on the type of the workload. We can divide them into the following categories:
- Deployment, ReplicaSet, and ReplicationController: For these types of stateless workloads, all the replicas mount the same volumes. So, taking a backup from only one replica is enough. In this case, Stash uses leader election to elect a leader pod. Only the sidecar of the leader pod takes the backup. The alias provided in the BackupConfiguration/BackupBatch is used as the host identifier. If the alias was not provided, it defaults to host-0. The total number of hosts for these types of workload is 1.
- StatefulSet: Every replica of a StatefulSet mounts different volumes. So, taking a backup from each replica is necessary. In this case, the sidecar inside each replica takes a backup. Stash identifies pod-0 as <alias>-0, pod-1 as <alias>-1, pod-2 as <alias>-2, and so on. If the alias was not provided in the BackupConfiguration/BackupBatch, then the host identifiers are generated as host-0, host-1, host-2, etc. The total number of hosts for a StatefulSet is the number of replicas.
- DaemonSet: Daemon replicas on every node may contain different data. So, taking a backup of each daemon pod is necessary. In this case, the sidecar inside each daemon pod takes a backup. Stash considers each individual daemon pod a separate host, and the host identifiers are generated as <alias>-<node name>. The total number of hosts for a DaemonSet is the number of daemon pods running in the cluster.
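The host identification rules for workloads can be summarized in a short sketch. The function workload_hosts and its parameters are hypothetical and only restate the rules listed above. The text does not say what a DaemonSet uses when no alias is given, so the sketch raises an error in that case rather than guessing.

```python
from typing import List, Optional, Sequence


def workload_hosts(
    kind: str,
    alias: Optional[str] = None,
    replicas: int = 1,
    node_names: Sequence[str] = (),
) -> List[str]:
    """Return the host identifiers for a workload, per the rules above.

    Hypothetical helper for illustration; not part of Stash.
    """
    if kind in ("Deployment", "ReplicaSet", "ReplicationController"):
        # Only the leader pod takes the backup, so there is exactly one host.
        return [alias or "host-0"]
    if kind == "StatefulSet":
        # One host per replica: <alias>-0, <alias>-1, ... (host-N without alias).
        prefix = alias or "host"
        return [f"{prefix}-{index}" for index in range(replicas)]
    if kind == "DaemonSet":
        if alias is None:
            # The default without an alias is not described in the text.
            raise ValueError("alias is required for DaemonSet in this sketch")
        # One host per daemon pod: <alias>-<node name>.
        return [f"{alias}-{node}" for node in node_names]
    raise ValueError(f"unsupported workload kind: {kind}")


print(workload_hosts("Deployment", alias="app"))
print(workload_hosts("StatefulSet", replicas=3))
print(workload_hosts("DaemonSet", alias="logs", node_names=["node-a", "node-b"]))
```

The length of the returned list corresponds to the totalHosts value described earlier for each workload type.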
Stand-alone PVC:
Stash uses the job model to back up a stand-alone PVC. Stash launches a job to back up the targeted PVC. The alias provided in the BackupConfiguration/BackupBatch is used as the host identifier. If the alias was not provided, it defaults to host-0. The total number of hosts for a stand-alone PVC backup is 1.
Databases:
Stash uses the job model to backup a database. Stash launches a job to backup the targeted database. In this case, the number of hosts depends on the database type.
- Stand-alone database: For a stand-alone database, the backup target is identified by the alias and the total number of hosts is 1.
- Replicated cluster: For replicated clustered databases such as MongoDB ReplicaSet, all the replicas contain the same data. In this case, taking a backup of only one replica is enough. This replica is identified by the alias and the total number of hosts is 1.
- Sharded cluster: For a sharded database cluster, Stash takes a backup of all shards. Hence, the number of hosts for a sharded database is the number of shards and they are identified as <alias>-0, <alias>-1, <alias>-2, etc. However, the number of hosts may increase based on the database type.
VolumeSnapshot:
Stash uses the job model for taking volume snapshots. Each volume is considered a separate host and is identified by its name. Hence, the total number of hosts for VolumeSnapshot is the number of targeted volumes. However, since VolumeSnapshot is handled by the respective CSI driver, the host identifier does not play any role in separating their data.