Filtering Files During Backup/Restore

This guide shows how to include or exclude a subset of files during the backup and restore process.

Before You Begin

  • First, you need a Kubernetes cluster, and the kubectl command-line tool must be configured to communicate with it. If you do not already have a cluster, you can create one using kind.
  • Install Stash in your cluster following the steps here.
  • Install Stash kubectl plugin following the steps here.
  • You should be familiar with the following Stash concepts:

To keep everything isolated, we are going to use a separate namespace called demo throughout this tutorial.

❯ kubectl create ns demo
namespace/demo created

Prepare Workload

First, we are going to create a PVC. Then, we are going to create a Deployment that uses this PVC.

Create PVC

Below is the YAML of the sample PVC that we are going to create,

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: source-data
  namespace: demo
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: standard
  resources:
    requests:
      storage: 2Gi

Let’s create the PVC we have shown above,

❯ kubectl apply -f https://github.com/stashed/docs/raw/v2022.07.09/docs/guides/use-cases/exclude-include-files/examples/pvc-source.yaml
persistentvolumeclaim/source-data created

Create Deployment

Now, we are going to deploy a Deployment that uses the above PVC. Below is the YAML of the Deployment that we are going to create,

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: stash-demo
  name: stash-demo
  namespace: demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: stash-demo
  template:
    metadata:
      labels:
        app: stash-demo
      name: busybox
    spec:
      containers:
      - args:
        - sleep
        - "3600"
        image: busybox
        imagePullPolicy: IfNotPresent
        name: busybox
        volumeMounts:
        - mountPath: /source/data
          name: source-data
      restartPolicy: Always
      volumes:
      - name: source-data
        persistentVolumeClaim:
          claimName: source-data

Let’s create the Deployment we have shown above,

❯ kubectl apply -f https://github.com/stashed/docs/raw/v2022.07.09/docs/guides/use-cases/exclude-include-files/examples/deployment-source.yaml
deployment.apps/stash-demo created

Now, wait for the pod of the Deployment to go into Running state.

❯ kubectl get pod -n demo
NAME                          READY   STATUS    RESTARTS   AGE
stash-demo-67576d874-2tj9d    1/1     Running   0          81s

Insert Data

# create sample data
❯ kubectl exec -n demo -it  stash-demo-67576d874-2tj9d -- /bin/sh -c "touch /source/data/data-1.txt"

❯ kubectl exec -n demo -it  stash-demo-67576d874-2tj9d -- /bin/sh -c "touch /source/data/data-2.txt"

❯ kubectl exec -n demo -it  stash-demo-67576d874-2tj9d -- /bin/sh -c "touch /source/data/not-important.txt"

❯ kubectl exec -n demo -it  stash-demo-67576d874-2tj9d -- /bin/sh -c "touch /source/data/index.html"

❯ kubectl exec -n demo -it  stash-demo-67576d874-2tj9d -- /bin/sh -c "touch /source/data/resp.json"

❯ kubectl exec -n demo -it  stash-demo-67576d874-2tj9d -- /bin/sh -c "mkdir /source/data/tmp"

❯ kubectl exec -n demo -it  stash-demo-67576d874-2tj9d -- /bin/sh -c "touch /source/data/tmp/tmp.txt"
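
The individual commands above can also be combined into a single exec call. The following is a minimal sketch and not part of the original example files: SAMPLE_DATA_SCRIPT and populate_pod are hypothetical names, and the pod is looked up through the app=stash-demo label from the Deployment instead of hard-coding the generated pod name.

```shell
#!/bin/sh
# Commands that create the sample files relative to the current directory.
# (SAMPLE_DATA_SCRIPT is a hypothetical name used only in this sketch.)
SAMPLE_DATA_SCRIPT='touch data-1.txt data-2.txt not-important.txt index.html resp.json && mkdir -p tmp && touch tmp/tmp.txt'

# Run the script inside the first pod that carries the app=stash-demo label.
# Assumes the Deployment from this guide is already running in the demo namespace.
populate_pod() {
  pod=$(kubectl get pod -n demo -l app=stash-demo \
    -o jsonpath='{.items[0].metadata.name}') || return 1
  kubectl exec -n demo "$pod" -- /bin/sh -c "cd /source/data && $SAMPLE_DATA_SCRIPT"
}
```

Calling populate_pod once should leave the same six entries under /source/data as the step-by-step commands.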

Prepare Backend

We are going to store our backed-up data in a GCS bucket. We have to create a Secret with the necessary credentials and a Repository crd to use this backend. If you want to use a different backend, please read the respective backend configuration doc from here.

For GCS backend, if the bucket does not exist, Stash needs Storage Object Admin role permissions to create the bucket. For more details, please check the following guide.

Create Secret

Let’s create a secret called gcs-secret with access credentials to our desired GCS bucket,

$ echo -n 'changeit' > RESTIC_PASSWORD
$ echo -n '<your-project-id>' > GOOGLE_PROJECT_ID
$ cat /path/to/downloaded-sa-key.json > GOOGLE_SERVICE_ACCOUNT_JSON_KEY
$ kubectl create secret generic -n demo gcs-secret \
    --from-file=./RESTIC_PASSWORD \
    --from-file=./GOOGLE_PROJECT_ID \
    --from-file=./GOOGLE_SERVICE_ACCOUNT_JSON_KEY
secret/gcs-secret created
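
Before creating the Secret, it can help to confirm that all three credential files exist and are non-empty. The following is a minimal sketch, assuming the files sit in the current directory as in the commands above; check_secret_files is a hypothetical helper name.

```shell
#!/bin/sh
# Succeed only if every credential file exists and is non-empty
# in the current directory. (Hypothetical helper for this guide.)
check_secret_files() {
  for f in RESTIC_PASSWORD GOOGLE_PROJECT_ID GOOGLE_SERVICE_ACCOUNT_JSON_KEY; do
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      return 1
    fi
  done
  return 0
}
```

Running check_secret_files before kubectl create secret reports the first file that is missing or empty.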

Full Backup

By default, Stash backs up all the data in the specified path. We are going to create a BackupConfiguration crd targeting the stash-demo Deployment that we have deployed earlier. Then, Stash will inject a sidecar container into the target. It will also create a CronJob to take periodic backups of the /source/data directory of the target.

We are going to use the following Repository to back up our data,

apiVersion: stash.appscode.com/v1alpha1
kind: Repository
metadata:
  name: gcs-repo-full
  namespace: demo
spec:
  backend:
    gcs:
      bucket: stash-testing
      prefix: /demo/stash-demo/full
    storageSecretName: gcs-secret

Let’s create the Repository,

❯ kubectl apply -f https://github.com/stashed/docs/raw/v2022.07.09/docs/guides/use-cases/exclude-include-files/examples/repository-full.yaml
repository.stash.appscode.com/gcs-repo-full created

Below is the YAML of the BackupConfiguration crd that we are going to create,

apiVersion: stash.appscode.com/v1beta1
kind: BackupConfiguration
metadata:
  name: deployment-backup-full
  namespace: demo
spec:
  repository:
    name: gcs-repo-full
  schedule: "*/5 * * * *"
  target:
    ref:
      apiVersion: apps/v1
      kind: Deployment
      name: stash-demo
    volumeMounts:
    - name: source-data
      mountPath: /source/data
    paths:
    - /source/data
  retentionPolicy:
    name: 'keep-last-5'
    keepLast: 5
    prune: true

The above BackupConfiguration will back up everything inside the /source/data directory.

Let’s create the BackupConfiguration object that we have shown above,

❯ kubectl apply -f https://github.com/stashed/docs/raw/v2022.07.09/docs/guides/use-cases/exclude-include-files/examples/backupconfiguration-full.yaml
backupconfiguration.stash.appscode.com/deployment-backup-full created

If everything goes well, the phase of the BackupConfiguration should be Ready. The Ready phase indicates that the backup setup is successful.

Trigger a Backup

Let’s trigger a backup using the Stash kubectl plugin,

❯ kubectl stash trigger -n demo deployment-backup-full

Wait for BackupSession to Succeed

Run the following command to watch the BackupSession phase,

❯ watch kubectl get backupsession -n demo
NAME                                INVOKER-TYPE          INVOKER-NAME             PHASE       AGE
deployment-backup-full-1647347700   BackupConfiguration   deployment-backup-full   Succeeded   21s
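
Instead of watching the output manually, the phase can be polled from a script. The following is a minimal sketch: wait_for_phase is a hypothetical helper, and it assumes that the PHASE column shown above is backed by the .status.phase field of the object.

```shell
#!/bin/sh
# wait_for_phase EXPECTED TIMEOUT_SECONDS COMMAND...
# Runs COMMAND every 2 seconds until it prints EXPECTED or the timeout expires.
wait_for_phase() {
  expected="$1"
  timeout="$2"
  shift 2
  elapsed=0
  phase=""
  while [ "$elapsed" -lt "$timeout" ]; do
    phase=$("$@" 2>/dev/null)
    if [ "$phase" = "$expected" ]; then
      echo "phase is $expected"
      return 0
    fi
    sleep 2
    elapsed=$((elapsed + 2))
  done
  echo "timed out waiting for phase $expected (last seen: ${phase:-none})" >&2
  return 1
}
```

For example, wait_for_phase Succeeded 300 kubectl get backupsession -n demo deployment-backup-full-1647347700 -o jsonpath='{.status.phase}' would poll the BackupSession shown above for up to five minutes. The same helper could be pointed at a BackupConfiguration with Ready as the expected phase.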

Verify Backup

Let’s download the latest snapshot of the Repository into the /tmp/full-backup directory using the Stash kubectl plugin,

❯ mkdir /tmp/full-backup
❯ kubectl stash download gcs-repo-full -n demo --destination=/tmp/full-backup --snapshots="latest"

List the files in /tmp/full-backup/latest/source/data directory to verify the backup data.

❯ ls -R /tmp/full-backup/latest/source/data

/tmp/full-backup/latest/source/data:
data-1.txt  data-2.txt  index.html  not-important.txt  resp.json  tmp/

/tmp/full-backup/latest/source/data/tmp:
tmp.txt

Filtering During Backup

In this section, we are going to show how to filter files during a backup.

Exclude Subset of Files

Here, we are going to show how to exclude particular files during a backup. We can exclude a subset of files during a backup using the spec.target.exclude section of the BackupConfiguration.

We are going to use the following Repository to back up our data,

apiVersion: stash.appscode.com/v1alpha1
kind: Repository
metadata:
  name: gcs-repo-exclude
  namespace: demo
spec:
  backend:
    gcs:
      bucket: stash-testing
      prefix: /demo/stash-demo/exclude
    storageSecretName: gcs-secret

Let’s create the Repository we have shown above,

❯ kubectl apply -f https://github.com/stashed/docs/raw/v2022.07.09/docs/guides/use-cases/exclude-include-files/examples/repository-exclude.yaml
repository.stash.appscode.com/gcs-repo-exclude created

Below is the YAML of the BackupConfiguration crd that we are going to create,

apiVersion: stash.appscode.com/v1beta1
kind: BackupConfiguration
metadata:
  name: deployment-backup-exclude
  namespace: demo
spec:
  repository:
    name: gcs-repo-exclude
  schedule: "*/5 * * * *"
  target:
    ref:
      apiVersion: apps/v1
      kind: Deployment
      name: stash-demo
    volumeMounts:
    - name: source-data
      mountPath: /source/data
    paths:
    - /source/data
    exclude:
    - /source/data/not-important.txt # exclude only one file
    - /source/data/*.html # exclude the files with .html extension
    - /source/data/tmp/* # exclude everything inside the tmp directory
  retentionPolicy:
    name: 'keep-last-5'
    keepLast: 5
    prune: true

The above BackupConfiguration will back up everything inside the /source/data directory except the file /source/data/not-important.txt, all the .html files, and the contents of the /source/data/tmp directory. The tmp directory itself is still backed up, but it will be empty.
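
To get a feel for which of the sample files these patterns select, they can be tried locally with plain shell globbing. This is only an approximation under stated assumptions: is_excluded is a hypothetical helper, and the shell's case matching lets * cross directory separators, so the matching that Stash performs may differ for nested paths.

```shell
#!/bin/sh
# Return 0 if the path matches one of the exclude patterns
# from the BackupConfiguration above (shell glob semantics).
is_excluded() {
  case "$1" in
    /source/data/not-important.txt) return 0 ;;
    /source/data/*.html) return 0 ;;
    /source/data/tmp/*) return 0 ;;
  esac
  return 1
}

# Classify every entry of the sample data set.
for f in \
  /source/data/data-1.txt \
  /source/data/data-2.txt \
  /source/data/index.html \
  /source/data/not-important.txt \
  /source/data/resp.json \
  /source/data/tmp \
  /source/data/tmp/tmp.txt
do
  if is_excluded "$f"; then
    echo "exclude $f"
  else
    echo "keep    $f"
  fi
done
```

With the sample data, this marks not-important.txt, index.html, and tmp/tmp.txt as excluded and keeps the rest, including the tmp directory entry itself.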

Let’s create the BackupConfiguration object that we have shown above,

❯ kubectl apply -f https://github.com/stashed/docs/raw/v2022.07.09/docs/guides/use-cases/exclude-include-files/examples/backupconfiguration-exclude.yaml
backupconfiguration.stash.appscode.com/deployment-backup-exclude created

If everything goes well, the phase of the BackupConfiguration should be Ready. The Ready phase indicates that the backup setup is successful.

Trigger a Backup

Let’s trigger a backup using the Stash kubectl plugin,

❯ kubectl stash trigger -n demo deployment-backup-exclude

Wait for BackupSession to Succeed

Run the following command to watch the BackupSession phase,

❯ watch kubectl get backupsession -n demo
NAME                                   INVOKER-TYPE          INVOKER-NAME                PHASE       AGE
deployment-backup-exclude-1647347700   BackupConfiguration   deployment-backup-exclude   Succeeded   21s

Verify Backup

Let’s download the latest snapshot of the Repository into the /tmp/partial-backup directory using the Stash kubectl plugin,

❯ mkdir /tmp/partial-backup
❯ kubectl stash download gcs-repo-exclude -n demo --destination=/tmp/partial-backup --snapshots="latest"

List the files in /tmp/partial-backup/latest/source/data directory to verify the backup data.

❯ ls -R /tmp/partial-backup/latest/source/data

/tmp/partial-backup/latest/source/data:
data-1.txt  data-2.txt  resp.json  tmp/

/tmp/partial-backup/latest/source/data/tmp:
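
To confirm the effect of the exclude rules, the two downloaded snapshots can be compared directly. The following is a minimal sketch, assuming both snapshots have been downloaded as shown in this guide; list_files and excluded_files are hypothetical helper names.

```shell
#!/bin/sh
# Print all regular files under DIR as sorted paths relative to DIR.
list_files() {
  (cd "$1" && find . -type f | sed 's|^\./||' | sort)
}

# excluded_files FULL_DIR PARTIAL_DIR
# Print the files that exist in FULL_DIR but not in PARTIAL_DIR.
excluded_files() {
  full_list=$(mktemp)
  partial_list=$(mktemp)
  list_files "$1" > "$full_list"
  list_files "$2" > "$partial_list"
  comm -23 "$full_list" "$partial_list"
  rm -f "$full_list" "$partial_list"
}
```

Running excluded_files /tmp/full-backup/latest/source/data /tmp/partial-backup/latest/source/data should list exactly the files that the exclude patterns were meant to skip.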

Filtering During Restore

In this section, we are going to show how to filter files during a restore. First, we are going to deploy a new Deployment with a PVC. Then, we are going to restore the backed-up data using Stash.

Prepare Workload

Below is the YAML of the PVC that we are going to create,

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: recovered-data
  namespace: demo
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: standard
  resources:
    requests:
      storage: 1Gi

Let’s create the PVC we have shown above,

❯ kubectl apply -f https://github.com/stashed/docs/raw/v2022.07.09/docs/guides/use-cases/exclude-include-files/examples/pvc-recovered.yaml
persistentvolumeclaim/recovered-data created

Below is the YAML of the Deployment that we are going to create,

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: stash-recovered
  name: stash-recovered
  namespace: demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: stash-recovered
  template:
    metadata:
      labels:
        app: stash-recovered
      name: busybox
    spec:
      containers:
      - args:
        - sleep
        - "3600"
        image: busybox
        imagePullPolicy: IfNotPresent
        name: busybox
        volumeMounts:
        - mountPath: /source/data
          name: source-data
      restartPolicy: Always
      volumes:
      - name: source-data
        persistentVolumeClaim:
          claimName: recovered-data

Let’s create the Deployment we have shown above,

❯ kubectl apply -f https://github.com/stashed/docs/raw/v2022.07.09/docs/guides/use-cases/exclude-include-files/examples/deployment-recovered.yaml
deployment.apps/stash-recovered created

Now, wait for the pod of the Deployment to go into Running state.

❯ kubectl get pod -n demo
NAME                               READY   STATUS    RESTARTS   AGE
stash-recovered-67576d874-2tj9d    1/1     Running   0          81s

Exclude Subset of Files

Here, we are going to show how to exclude particular files during a restore. We can exclude a subset of files during a restore using the spec.target.rules section of the RestoreSession.

Below is the YAML of the RestoreSession crd that we are going to create,

apiVersion: stash.appscode.com/v1beta1
kind: RestoreSession
metadata:
  name: deployment-restore-exclude
  namespace: demo
spec:
  repository:
    name: gcs-repo-full
  target:
    ref:
      apiVersion: apps/v1
      kind: Deployment
      name: stash-recovered
    volumeMounts:
    - name:  source-data
      mountPath:  /source/data
    rules:
    - paths:
      - /source/data/
      exclude:
      - /source/data/not-important.txt # don't restore this file
      - /source/data/*.html # don't restore the files with .html extension
      - /source/data/tmp/* # don't restore the contents of this directory

The above RestoreSession will restore everything inside the /source/data directory except /source/data/not-important.txt, all the .html files, and the contents of the /source/data/tmp directory. The tmp directory itself is still restored, but it will be empty.

Let’s create the RestoreSession object that we have shown above,

❯ kubectl apply -f https://github.com/stashed/docs/raw/v2022.07.09/docs/guides/use-cases/exclude-include-files/examples/restoresession-exclude.yaml
restoresession.stash.appscode.com/deployment-restore-exclude created

Verify that the files have been restored in /source/data directory using the following command,

❯ kubectl exec -n demo stash-recovered-67576d874-2tj9d -- ls -R /source/data

/source/data:
data-1.txt
data-2.txt
resp.json
tmp/

/source/data/tmp:

Restore Subset of Files

Previously, we restored the backed-up data while excluding specific files and directories. You can also restore only selected files or directories by using the include field of a rule.

Below is the YAML of the RestoreSession crd that we are going to create,

apiVersion: stash.appscode.com/v1beta1
kind: RestoreSession
metadata:
  name: deployment-restore-include
  namespace: demo
spec:
  repository:
    name: gcs-repo-full
  target:
    ref:
      apiVersion: apps/v1
      kind: Deployment
      name: stash-recovered
    volumeMounts:
    - name:  source-data
      mountPath:  /source/data
    rules:
    - paths:
      - /source/data/
      include:
      - /source/data/data1.txt # restore this file
      - /source/data/*.json # restore the files with .json extension
      - /source/data/tmp/* # restore the contents of this directory

Let’s create the RestoreSession object that we have shown above,

❯ kubectl apply -f https://github.com/stashed/docs/raw/v2022.07.09/docs/guides/use-cases/exclude-include-files/examples/restoresession-include.yaml
restoresession.stash.appscode.com/deployment-restore-include created

Verify that the files have been restored in /source/data directory using the following command,

❯ kubectl exec -n demo stash-recovered-7dd74d9ff7-h9t7x -- ls -R /source/data
/source/data:
resp.json
tmp

/source/data/tmp:
tmp.txt
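
The include patterns can be tried against the sample file names in the same local fashion. The following is a minimal sketch using plain shell globbing, which may differ from the matching Stash performs; matches_any is a hypothetical helper. Note that the pattern /source/data/data1.txt, as written in the RestoreSession, does not match the sample file data-1.txt under shell globbing, which is consistent with that file being absent from the listing above.

```shell
#!/bin/sh
# matches_any PATH PATTERN...
# Succeed if PATH matches at least one of the glob PATTERNs.
matches_any() {
  path="$1"
  shift
  for pattern in "$@"; do
    case "$path" in
      $pattern) return 0 ;;
    esac
  done
  return 1
}

# Apply the include patterns from the RestoreSession to the sample files.
for f in \
  /source/data/data-1.txt \
  /source/data/data-2.txt \
  /source/data/index.html \
  /source/data/not-important.txt \
  /source/data/resp.json \
  /source/data/tmp/tmp.txt
do
  if matches_any "$f" \
    '/source/data/data1.txt' \
    '/source/data/*.json' \
    '/source/data/tmp/*'
  then
    echo "restore $f"
  else
    echo "skip    $f"
  fi
done
```

With the sample data, only resp.json and tmp/tmp.txt are reported as restored.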

Cleaning Up

❯ kubectl delete -n demo deployment stash-demo
❯ kubectl delete -n demo deployment stash-recovered
❯ kubectl delete -n demo backupconfiguration deployment-backup-full
❯ kubectl delete -n demo backupconfiguration deployment-backup-exclude
❯ kubectl delete -n demo restoresession deployment-restore-include
❯ kubectl delete -n demo restoresession deployment-restore-exclude
❯ kubectl delete -n demo repository gcs-repo-full
❯ kubectl delete -n demo repository gcs-repo-exclude
❯ kubectl delete -n demo secret gcs-secret
❯ kubectl delete -n demo pvc --all