You are looking at the documentation of a prior release. To read the documentation of the latest release, please
visit here.
Stash Design Overview
We are going to make a design overhaul of Stash to simplify backup and recovery process and support some most requested features. This doc will discuss what features stash is going to support and how these features may work.
We have introduced some new crd such as Function, Task etc. and made whole process more modular. This will make easy to add support for new features and the users will also be able to customize backup process. Furthermore, this will make stash resources inter-operable between different tools and even might allow to use stash resources as function in serverless concept.
We are hoping this design will graduate to GA. So, we are taking security seriously. We are going to make sure that nobody can bypass clusters security using Stash. This might requires to remove some existing features (for example, restore from different namespace). However, we are going to provide an alternate way to cover those use cases.
Goal
Goal of this new design to support following features:
- Schedule Backup and Restore Workload Data
- Schedule Backup and Restore PVC
- Schedule Backup and Restore Database
- Schedule Backup Cluster YAMLs
- Trigger Backup Instantly
- Default Backup
- Auto Restore
- Stash cli/kubectl-plugin
- Function
- Task
Schedule Backup and Restore Workload Data
Backup Workload Data
User will be able to backup data from a running workload.
What user have to do?
- Create a
Repositorycrd. - Create a
BackupConfigurationcrd pointing to targeted workload.
Sample Repository crd:
apiVersion: stash.appscode.com/v1alpha2
kind: Repository
metadata:
name: stash-backup-repo
namespace: demo
spec:
backend:
gcs:
bucket: stash-backup-repo
prefix: default/deployment/stash-demo
storageSecretName: gcs-secret
Sample BackupConfiguration crd:
apiVersion: stash.appscode.com/v1beta1
kind: BackupConfiguration
metadata:
name: workload-data-backup
namespace: demo
spec:
schedule: '@every 1h'
# <no backupProcedure required for sidecar model>
# repository refers to the Repository crd that hold backend information
repository:
name: stash-backup-repo
# target indicate the target workload that we want to backup
target:
ref:
apiVersion: apps/v1
kind: Deployment
name: stash-demo
# directories indicates the directories inside the workload we want to backup
directories:
- /source/data
# retentionPolicies specify the policy to follow to clean old backup snapshots
retentionPolicy:
keepLast: 5
prune: true
How it will work?
- Stash will watch for
BackupCofigurationcrd. When it will find aBackupConfigurationcrd, it will inject asidecarcontainer to the workload and start acronfor scheduled backup. - In each schedule, the
cronwill createBackupSessioncrd. - The
sidecarcontainer watches forBackupSessioncrd. If it finds one, it will take backup instantly and updateBackupSessionstatus accordingly.
Sample BackupSession crd:
apiVersion: stash.appscode.com/v1beta1
kind: BackupSession
metadata:
name: demo-volume-backup-session
namespace: demo
spec:
# backupConfiguration indicates the BackupConfiguration crd of respective target that we want to backup
backupConfiguration:
name: backup-volume-demo
status:
observedGeneration: 239844#2
phase: Succeed
stats:
- direcotry: /source/data
snapshot: 40dc1520
totalSize: 1.720 GiB
uploaded: 1.200 GiB # upload size can be smaller than original file size if there are some duplicate files
fileStats:
new: 5307
changed: 0
unmodified: 0
Restore Workload Data
User will be able to restore backed up data either into a separate volume or into the same workload from where the backup was taken. Here, is an example for recovering into same workload.
What user have to do?
- Create a
RestoreSessioncrd pointingtargetfield to the workload.
Sample RestoreSession crd to restore into same workload:
apiVersion: stash.appscode.com/v1beta1
kind: RestoreSession
metadata:
name: recovery-database-demo
namespace: demo
spec:
repository:
name: stash-backup-repo
target: # target indicates where the recovered data will be stored
ref:
apiVersion: apps/v1
kind: Deployment
name: stash-demo
directories: # indicates which directories will be recovered
- /source/data
How it will work?
- When Stash will find a
RestoreSessioncrd created to restore into a workload, it will inject ainit-containerto the targeted workload. - Then, it will restart the workload.
- The
init-containerwill restore data inside the workload.
Warning: Restore in same workload require to restart the workload. So, there will be downtime of the workload.
Schedule Backup and Restore PVC
Backup PVC
User will be also able to backup stand-alone pvc. This is useful for ReadOnlyMany or ReadWriteMany type pvc.
What user have to do?
Create a
Repositorycrd for respective backend.Create a
BackupConfigurationcrd pointingtargetfield to the volume.
Sample BackupConfiguration crd to backup a PVC:
apiVersion: stash.appscode.com/v1beta1
kind: BackupConfiguration
metadata:
name: volume-backup-demo
namespace: demo
spec:
schedule: '@every 1h'
# task indicates Task crd that specifies the steps to backup a volume.
# stash will create some default Task crd while install to backup/restore various resources.
# user can also crate their own Task to customize backup/recovery
task:
name: volumeBackup
# repository refers to the Repository crd that hold backend information
repository:
name: stash-backup-repo
# target indicate the target workload that we want to backup
target:
ref:
apiVersion: v1
kind: PersistentVolumeClaim
name: demo-pvc
mountPath: /source/data
# retentionPolicies specify the policy to follow to clean old backup snapshots
retentionPolicy:
keepLast: 5
prune: true
How it will work?
- Stash will create a
CronJobusing information of respectiveTaskcrd specified bytaskfield. - The
CronJobwill take periodic backup of the target volume.
Restore PVC
User will be able to restore backed up data into a volume.
What user have to do?
- Create a
RestoreSessioncrd pointingtargetfield to the target volume where the recovered data will be stored.
Sample RestoreSession crd to restore into a volume:
apiVersion: stash.appscode.com/v1beta1
kind: RestoreSession
metadata:
name: recovery-volume-demo
namespace: demo
spec:
repository:
name: stash-backup-repo
# task indicates Task crd that specifies steps to restore a volume
task:
name: volumeRecovery
target: # target indicates where the recovered data will be stored
ref:
apiVersion: v1
kind: PersistentVolumeClaim
name: demo-pvc
mountPath: /source/data
directories: # indicates which directories will be recovered
- /source/data
How it will work?
- When Stash will find a
RestoreSessioncrd created to restore into a volume, it will launch a Job to restore into that volume. - The recovery Job will restore and store recovered data to the specified volume.
Schedule Backup and Restore Database
Backup Database
User will be able to backup database using Stash.
What user have to do?
- Create a
Repositorycrd for respective backend. - Create an
AppBindingcrd which holds connection information for the database. If the database is deployed with KubeDB,AppBindingcrd will be created automatically for each database. - Create a
BackupConfigurationcrd pointing to theAppBindingcrd.
Sample AppBinding crd:
apiVersion: appcatalog.appscode.com/v1alpha1
kind: AppBinding
metadata:
name: quick-postgres
namespace: demo
labels:
app.kubernetes.io/name: postgreses.kubedb.com
app.kubernetes.io/instance: quick-postgres
spec:
clientConfig:
insecureSkipTLSVerify: true
service:
name: quick-postgres
port: 5432
scheme: "http"
secret:
name: quick-postgres-auth
type: kubedb.com/postgres
Sample BackupConfiguration crd for database backup:
apiVersion: stash.appscode.com/v1beta1
kind: BackupConfiguration
metadata:
name: database-backup-demo
namespace: demo
spec:
schedule: '@every 1h'
# task indicates Task crd that specifies the steps to backup postgres database
task:
name: pgBackup
inputs:
database: my-postgres # specify this field if you want to backup a particular database.
# repository refers to the Repository crd that hold backend information
repository:
name: stash-backup-repo
# target indicates the respective AppBinding crd for target database
target:
ref:
apiVersion: appcatalog.appscode.com/v1alpha1
kind: AppBinding
name: quick-postgres
# retentionPolicies specify the policy to follow to clean old backup snapshots
retentionPolicy:
keepLast: 5
prune: true
How it will work?
- When Stash will see a
BackupConfigurationcrd for database backup, it will launch aCronJobto take periodic backup of this database.
Restore Database
User will be able to initialize a database from backed up snapshot.
What user have to do?
- Create a
RestoreSessioncrd withtargetfield pointing to respectiveAppBindingcrd of the target database.
Sample RestoreSession crd to restore database:
apiVersion: stash.appscode.com/v1beta1
kind: RestoreSession
metadata:
name: database-recovery-demo
namespace: demo
spec:
repository:
name: stash-backup-repo
# task indicates Task crd that specifies the steps to restore Postgres database
task:
name: pgRecovery
target: # target indicates where to restore
# indicates the respective AppBinding crd for target database that we want to initialize from backup
ref:
apiVersion: appcatalog.appscode.com/v1alpha1
kind: AppBinding
name: quick-postgres
How it will work?:
- Stash will launch a Job to restore the backed up database and initialize target with this recovered data.
Schedule Backup Cluster YAMLs
User will be able to backup yaml of the cluster resources. However, currently stash will not provide automatic restore cluster from the YAMLs. So, user will have to create them manually.
In future, Stash might be able to backup and restore not only YAMLs but also entire cluster.
What user have to do?
- Create a
Repositorycrd for respective backend. - Create a
BackupConfigurationcrd withtaskfield point to aTaskcrd that backup cluster.
Sample BackupConfiguration crd to backup YAMLs of cluster resources:
apiVersion: stash.appscode.com/v1beta1
kind: BackupConfiguration
metadata:
name: cluster-backup-demo
namespace: demo
spec:
schedule: '@every 1h'
# task indicates Task crd that specifies the steps of backup cluster yamls
task:
name: clusterBackup
# repository refers to the Repository crd that hold backend information
repository:
name: stash-backup-repo
# <no target required for cluster backup>
# retentionPolicies specify the policy to follow to clean old backup snapshots
retentionPolicy:
keepLast: 5
prune: true
How it will work?
- Stash will launch a
CronJobusing informations of theTaskcrd specified throughtaskfiled. - The
CronJobwill take periodic backup of the cluster.
Trigger Backup Instantly
User will be able to trigger a scheduled backup instantly.
What user have to do?
- Create a
BackupSessioncrd pointing to the targetBackupConfigurationcrd.
Sample BackupSession crd for triggering instant backup:
apiVersion: stash.appscode.com/v1beta1
kind: BackupSession
metadata:
name: demo-volume-backup-session
namespace: demo
spec:
# backupConfiguration indicates the BackupConfiguration crd of respective target that we want to backup
backupConfiguration:
name: volume-backup-demo
How it will work?
- For scheduled backup through
sidecarcontainer, thesidecarcontainer will take instant backup as it watches forBackupSessioncrd. - For scheduled backup through
CronJob, Stash will launch another job to take instant backup of the target.
Default Backup
User will also be able to configure a default backup for the cluster. So, user will no longer need to create Repository and BackupConfiguration crd for every workload he want to backup. Instead, she will need to add some annotations to the target workload.
What user have to do?
- Create a
BackupTemplatecrd which will hold backend information and backup information. - Add some annotations to the target. If the target is a database then add the annotations to respective
AppBindingcrd.
Default Backup of Workload Data
Sample BackupTemplate crd to backup workload data:
apiVersion: stash.appscode.com/v1beta1
kind: BackupTemplate
metadata:
name: workload-data-backup-template
spec:
backend:
gcs:
bucket: stash-backup-repo
prefix: ${target.namespace}/${target.name} # this prefix template will be used to initialize repository in different directory in backend.
storageSecretName: gcs-secret # users must ensure this secret is present in respective namespace
schedule: '@every 1h'
# < no task required >
retentionPolicy:
name: 'keep-last-5'
keepLast: 5
prune: true
Sample workload with annotations for default backup:
apiVersion: apps/v1
kind: Deployment
metadata:
name: stash-demo
namespace: demo
labels:
app: stash-demo
# if stash find bellow annotations, it will take backup of it.
annotations:
stash.appscode.com/backuptemplate: "workload-data-backup-template"
stash.appscode.com/targetDirectories: "[/source/data]"
spec:
replicas: 1
selector:
matchLabels:
app: stash-demo
template:
metadata:
labels:
app: stash-demo
name: busybox
spec:
containers:
- args:
- sleep
- "3600"
image: busybox
imagePullPolicy: IfNotPresent
name: busybox
volumeMounts:
- mountPath: /source/data
name: source-data
restartPolicy: Always
volumes:
- name: source-data
configMap:
name: stash-sample-data
Default Backup of a PVC
Sample BackupTemplate crd for stand-alone pvc backup:
apiVersion: stash.appscode.com/v1beta1
kind: BackupTemplate
metadata:
name: volume-backup-template
spec:
backend:
gcs:
bucket: stash-backup-repo
prefix: ${target.namespace}/${target.name} # this prefix template will be used to initialize repository in different directory in backend.
storageSecretName: gcs-secret # users must ensure this secret is present in respective namespace
schedule: '@every 1h'
task:
name: volumeBackup
retentionPolicy:
name: 'keep-last-5'
keepLast: 5
prune: true
Sample PVC with annotation for default backup:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: demo-pvc
namespace: demo
# if stash find bellow annotations, it will take backup of it.
annotations:
stash.appscode.com/backuptemplate: "volume-backup-template"
spec:
accessModes:
- ReadWriteMany
resources:
requests:
storage: 1Gi
Default Backup of Database
Sample BackupTemplate crd for database backup:
apiVersion: stash.appscode.com/v1beta1
kind: BackupTemplate
metadata:
name: pgdb-backup-template
spec:
backend:
gcs:
bucket: stash-backup-repo
prefix: ${target.namespace}/${target.name} # this prefix template will be used to initialize repository in different directory in backend.
storageSecretName: gcs-secret # users must ensure this secret is present in respective namespace
schedule: '@every 1h'
task:
name: pgBackup
retentionPolicy:
name: 'keep-last-5'
keepLast: 5
prune: true
Sample AppBinding crd with annotations for default backup:
apiVersion: appcatalog.appscode.com/v1alpha1
kind: AppBinding
metadata:
name: quick-postgres
namespace: demo
labels:
app.kubernetes.io/name: postgreses.kubedb.com
app.kubernetes.io/instance: quick-postgres
# if stash find bellow annotations, it will take backup of it.
annotations:
stash.appscode.com/backuptemplate: "pgdb-backup-template"
spec:
clientConfig:
insecureSkipTLSVerify: true
service:
name: quick-postgres
port: 5432
scheme: "http"
secret:
name: quick-postgres-auth
type: kubedb.com/postgres
How it will work?
- Stash will watch the workloads, volume and
AppBindingcrds. When Stash will find an workload/volume/AppBinding crd with these annotations, it will create aRepositorycrd and aBackupConfigurationcrd using the information from respectiveTask. - Then, Stash will take normal backup as discussed earlier.
Auto Restore
User will be also able to configure an automatic recovery for a particular workload. Each time the workload restart, at first it will perform restore data from backup then original workload’s container will start.
What user have to do?
- User will have to provide some annotations in the workload.
Sample workload wit annotation to restore on restart:
apiVersion: apps/v1
kind: Deployment
metadata:
name: stash-demo
namespace: demo
labels:
app: stash-demo
# This annotations indicates that data should be recovered on each restart of the workload
annotations:
stash.appscode.com/restorepolicy: "OnRestart"
stash.appscode.com/repository: "demo-backup-repo"
stash.appscode.com/directories: "[/source/data]"
spec:
replicas: 1
selector:
matchLabels:
app: stash-demo
template:
metadata:
labels:
app: stash-demo
name: busybox
spec:
containers:
- args:
- sleep
- "3600"
image: busybox
imagePullPolicy: IfNotPresent
name: busybox
volumeMounts:
- mountPath: /source/data
name: source-data
restartPolicy: Always
volumes:
- name: source-data
configMap:
name: stash-sample-data
How it will work?
- When Stash will see a
RestoreSessioncrd configured for auto recovery, it will inject aninit-containerto the target. - The
init-containerwill perform recovery on each restart.
Stash cli/kubectl-plugin
We are going to provide a Stash plugin for kubectl. This will help to perform following operations:
- Restore into local machine instead of cluster (necessary for testing purpose).
- Restore into a different namespace from a repository: copy repository + secret into the desired namespace and then create
RestoreSessionobject. - Backup PV: creates matching PVC from PV (ensures that user has permission to read PV)
- Trigger instant backup.
Function
Function are independent single-containered workload specification that perform only single task. For example, pgBackup takes backup a PostgreSQL database and clusterBackup takes backup of YAMLs of cluster resources. Function crd has some variable fields with $ prefix which hast be resolved while creating respective workload. You can consider these variable fields as input for an Function.
Some example Function definition is given below:
clusterBackup
# clusterBackup function backup yamls of all resources of the cluster
apiVersion: stash.appscode.com/v1beta1
kind: Function
metadata:
name: clusterBackup
spec:
image: appscodeci/cluster-tool:v1
name: cluster-tool
args:
- backup
- --sanitize=${sanitize}
- --provider=${provider}
- --hostname=${hostname}
- --path=${repoDir}
- --output-dir=${outputDir}
- --retention-policy.policy=${policy}
- --retention-policy.value=${retentionValue}
- --metrics.enabled=${enableMetric}
- --metrics.pushgateway-url=${pushgatewayURL}
- --metrics.labels="workload-kind=${workloadKind},workload-name=${workloadName}"
volumeMounts:
- name: ${tempVolumeName}
mountPath: /tmp/restic
- name: ${storageSecretName}
mountPath: /etc/secrets/storage-secret
pgBackup
# pgBackup function backup a PostgreSQL database
apiVersion: stash.appscode.com/v1beta1
kind: Function
metadata:
name: pgBackup
spec:
image: appscodeci/postgresql-tool:v1
name: postgres-tool
args:
- backup
- --database=${databases}
- --provider=${provider}
- --hostname=${hostname}
- --path=${repoDir}
- --output-dir=${outputDir}
- --retention-policy.policy=${policy}
- --retention-policy.value=${retentionValue}
- --metrics.enabled=${enableMetric}
- --metrics.pushgateway-url=${pushgatewayURL}
- --metrics.labels="workload-kind=${workloadKind},workload-name=${workloadName}"
runtimeSettings:
env:
- name: PGPASSWORD
valueFrom:
secretKeyRef:
name: $(databaseSecret)
key: "POSTGRES_PASSWORD"
- name: DB_USER
valueFrom:
secretKeyRef:
name: $(databaseSecret)
key: "POSTGRES_USER"
- name: DB_HOST
value: $(host)
volumeMounts:
- name: ${tempVolumeName}
mountPath: /tmp/restic
- name: ${storageSecretName}
mountPath: /etc/secrets/storage-secret
pgRecovery
# pgRecovery function restore a PostgreSQL database
apiVersion: stash.appscode.com/v1beta1
kind: Function
metadata:
name: pgRecovery
spec:
image: appscodeci/postgresql-tool:v1
name: postgres-tool
args:
- restore
- --provider=${provider}
- --hostname=${hostname}
- --path=${repoDir}
- --output-dir=${outputDir}
- --metrics.enabled=${enableMetric}
- --metrics.pushgateway-url=${pushgatewayURL}
- --metrics.labels="workload-kind=${workloadKind},workload-name=${workloadName}"
runtimeSettings:
env:
- name: PGPASSWORD
valueFrom:
secretKeyRef:
name: $(databaseSecret)
key: "POSTGRES_PASSWORD"
- name: DB_USER
valueFrom:
secretKeyRef:
name: $(databaseSecret)
key: "POSTGRES_USER"
- name: DB_HOST
value: $(host)
volumeMounts:
- name: ${tempVolumeName}
mountPath: /tmp/restic
- name: ${storageSecretName}
mountPath: /etc/secrets/storage-secret
stashPostBackup
# stashPostBackup update Repository and BackupSession status for respective backup
apiVersion: stash.appscode.com/v1beta1
kind: Function
metadata:
name: stashPostBackup
spec:
image: appscode/stash:0.9.0
name: stash-post-backup
args:
- post-backup-update
- --repository=${repoName}
- --backupsession=${backupSessionName}
- --output-json-dir=${outputJsonDir}
volumeMounts:
- name: ${outputVolumeName}
mountPath: /tmp/restic
stashPostRecovery
# stashPostRecovery update RestoreSession status for respective recovery
apiVersion: stash.appscode.com/v1beta1
kind: Function
metadata:
name: stashPostRecovery
spec:
image: appscode/stash:0.9.0
name: stash-post-recovery
args:
- post-recovery-update
- --recoveryconfiguration=${recoveryConfigurationName}
- --output-json-dir=${outputJsonDir}
volumeMounts:
- name: ${outputVolumeName}
mountPath: /tmp/restic
Task
A complete backup process may need to perform multiple function. For example, if you want to backup a PostgreSQL database, we need to initialize a Repository, then backup the database and finally update Repository and BackupSession status to inform backup is completed or push backup metrics to a pushgateway . Task specifies these functions sequentially along with their inputs.
We have chosen to break complete backup process into several independent steps so that those individual functions can be used with other tool than Stash. It also make easy to add support for new feature. For example, to add support new database backup, we are going to just require to add a Function and Task crd. We are going to no longer need change anything in Stash operator code. This will also helps users to backup databases that are not officially supported by stash.
Some sample Task is given below:
pgBackup
# pgBackup specifies required functions and their inputs to backup PostgreSQL database
apiVersion: stash.appscode.com/v1beta1
kind: Task
metadata:
name: pgBackup
spec:
functions:
- name: pgBackup
inputs:
database: ${databases}
provider: ${provider}
hostname: ${hostname}
repoDir: ${prefix}
outputDir: ${outputDir}
policy: ${retentionPolicyName}
retentionValue: ${retentionPolicyValue}
enableMetric: ${enableMetric}
pushgatewayURL: ${pushgatewayURL}
workloadKind: ${kind}
workloadName: ${name}
tempVolumeName: ${tmpVolumeName}
storageSecretName: ${secretName}
- name: stashPostBackup
inputs:
repoName: ${repoName}
backupSession: ${backupSessionName}
outputJsonDir: ${output-dir}
outputVolumeName: ${output-volume-name}
pgRecovery
# pgRecovery specifies required functions and their inputs to restore PostgreSQL database
apiVersion: stash.appscode.com/v1beta1
kind: Task
metadata:
name: pgRecovery
spec:
functions:
- name: pgRecovery
inputs:
provider: ${provider}
hostname: ${hostname}
repoDir: ${prefix}
outputDir: ${outputDir}
enableMetric: ${enableMetric}
pushgatewayURL: ${pushgatewayURL}
workloadKind: ${kind}
workloadName: ${name}
tempVolumeName: ${tmpVolumeName}
storageSecretName: ${secretName}
- name: stashPostRecovery
inputs:
recoveryConfigurationName: ${recoveryConfigurationName}
outputJsonDir: ${output-dir}
outputVolumeName: ${output-volume-name}
clusterBackup
# clusterBackup specifies required functions and their inputs to backup cluster yaml
apiVersion: stash.appscode.com/v1beta1
kind: Task
metadata:
name: clusterBackup
spec:
functions:
- name: clusterBackup
inputs:
sanitize: ${sanitize}
provider: ${provider}
hostname: ${hostname}
repoDir: ${prefix}
outputDir: ${outputDir}
policy: ${retentionPolicyName}
retentionValue: ${retentionPolicyValue}
enableMetric: ${enableMetric}
pushgatewayURL: ${pushgatewayURL}
workloadKind: ${kind}
workloadName: ${name}
tempVolumeName: ${tmpVolumeName}
storageSecretName: ${secretName}
- name: stashPostBackup
inputs:
repoName: ${repoName}
backupSession: ${backupSessionName}
outputJsonDir: ${output-dir}
outputVolumeName: ${output-volume-name}






