
Red Hat OpenShift APIs for Data Protection (OADP) 1.3 includes a built-in Data Mover that you can use to move Container Storage Interface (CSI) volume snapshots to a remote object store. Data Mover provides portability and durability of CSI volume snapshots by relocating snapshots into an object storage location during the backup of a stateful application. These snapshots are then available for restore following disaster scenarios. This article discusses the new changes in OADP 1.3.0, the various Data Mover components, and how they work together to complete this process.

How Data Mover has evolved

The OADP engineering team introduced the Data Mover feature in OADP 1.1.0. The first part of this journey used VolSync to move volumes off-cluster one at a time. Like many first steps in the industry, the feature worked reliably but did not perform well enough for production environments. We knew we had to move multiple volumes simultaneously and that the work would have to be done in the upstream Velero project. After much deliberation and collaboration upstream, the OADP team completed the design for handling asynchronous operations (BackupItemAction and RestoreItemAction V2, or BIA/RIA V2) for backups and restores. This design laid the foundation for Data Mover in OADP 1.2.0 and OADP 1.3.0.

The Data Mover released in OADP 1.2.0 performed well for production workloads and, on average, uploaded and downloaded volumes five times faster than OADP 1.1.0. Around the time OADP 1.2.0 was released, Kopia support was introduced in Velero. Kopia and asynchronous operations opened the door to a built-in Data Mover in Velero itself. A built-in Data Mover simplifies the workflow by avoiding the complexity of integrating an additional component like VolSync. A design for a built-in Data Mover was proposed and accepted in the Velero project. Thus far, Red Hat engineering has found this new design for Data Mover to be reliable, efficient, and easier to maintain for future releases of OADP. OADP 1.3.0 brings this new Data Mover design to our customers as a Technology Preview. We expect full support of the feature in OADP 1.3.2, to be released early in 2024.

What is CSI?

One of the more vital Data Mover components is Container Storage Interface (CSI). CSI provides a layer of abstraction between container orchestration tools and storage systems such that storage vendors can develop a plugin once and have it work across multiple container orchestration systems. CSI defines an API for storage plugins to enable point-in-time snapshotting of volumes.

CSI-compliant storage plugins are now the industry standard and the preferred storage plugin type for most container orchestrators, including Kubernetes. Most of the Kubernetes "in-tree" drivers developed before CSI have a target removal date as most storage vendors continue deprecating non-CSI plugins. However, issues concerning CSI volumes still remain. Some volumes have vendor-specific requirements and can prevent proper portability and durability. The next section discusses how Data Mover works to solve this issue.

You can read more about CSI here.

Why we need Data Mover

During a backup using Velero with CSI, a CSI snapshot is taken of each volume. The snapshot is stored by the same storage provider that backs the volume. For some providers, such as Red Hat OpenShift Data Foundation, this means the snapshot resides on the cluster itself. Because of this poor durability, the snapshot, and therefore the backup, is at risk in a cluster-level disaster scenario. Data Mover addresses this by relocating the snapshot data to a remote object store.

Improvements to Data Mover for block mode volumes and OpenShift Virtualization

Previous implementations of OADP did not support the data movement of volumes defined with volumeMode: Block. We are pleased to report that the OADP 1.3 Data Mover can now back up and restore volumes in either Filesystem or Block mode. By default, OpenShift Virtualization uses block mode volumes as persistent storage for virtual machines (VMs). The lack of support for block mode PVs limited the ability of OADP to provide disaster recovery services for OpenShift Virtualization workloads. With OADP 1.3, OpenShift Virtualization customers can back up VMs, move the VM backup off the cluster, and restore their VMs as needed.

For backups, Kopia's default uploader was extended to use the StreamingFile API. Mapping the block mode volume as a device allows Kopia to access the data and copy it to the Unified Repository correctly. To restore, the block device data is copied back to the cluster via Kopia to a provisioned block device in /var/lib/kubelet/plugins by following a symbolic link to the device in /var/lib/kubelet/pods.
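To make the volumeMode: Block setting concrete, here is a minimal sketch that builds a PersistentVolumeClaim manifest requesting a raw block device. The claim name, namespace, size, and storage class are hypothetical placeholders; only the standard Kubernetes PVC fields are used.

```python
import json


def block_mode_pvc(name: str, namespace: str, size: str, storage_class: str) -> dict:
    """Build a PVC manifest that asks for a raw block device instead of a filesystem."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            # When volumeMode is omitted, Kubernetes defaults it to "Filesystem".
            "volumeMode": "Block",
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


# Hypothetical names, for illustration only.
pvc = block_mode_pvc("vm-disk-0", "my-vms", "10Gi", "my-csi-block-class")
print(json.dumps(pvc, indent=2))
```

The printed JSON can be saved to a file and applied with oc apply -f, since Kubernetes accepts JSON manifests as well as YAML.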

We want to thank our partners CloudCasa and VMware for their collaboration and contributions to enabling this feature in the upstream Velero project. Further improvements to block mode volumes are in progress to improve the feature's utility and performance. Please follow our work in the Velero Project as we improve this critical element.

Components

OADP Operator

OADP is the OpenShift APIs for Data Protection operator. This open source operator sets up and installs Velero on Red Hat OpenShift, allowing users to back up and restore applications. In this workflow, Velero is installed alongside the velero-plugin-for-csi plugin.

CSI plugin

A collection of Velero plugins for snapshotting CSI-backed persistent volume claims (PVCs) using the Kubernetes snapshot API.

Kopia

Kopia is a fast, open source backup/restore tool that creates encrypted snapshots of data and saves them to remote or cloud object storage of your choice, to network-attached storage or a server, or to local storage on your machine, all with a strong security posture.

Unified Repository

A target storage interface that works with both Restic and Kopia.

The DataUpload and DataDownload custom resources

The DataUpload (DUCR) and DataDownload (DDCR) are Kubernetes custom resources (CRs) that act as protocols between data mover plugins and data movers.

The Data Mover (DM)

DM is a collection of modules that carry out the data movement, specifically data upload and data download. The modules may include the data mover controllers, which reconcile DUCRs and DDCRs, and the data path, which transfers the data.

The Velero built-in Data Mover (VBDM)

VBDM is the built-in data mover shipped along with Velero. It includes Velero data mover controllers and Velero generic data path.

The Node-Agent

Node-Agent is an existing Velero module that will be used to host VBDM.

The Exposer

Exposer exposes the snapshot/target volume as a path/device name/endpoint that is recognizable by the Velero generic data path. The Exposer may be different for different snapshot types/snapshot accesses. This isolation means only the Exposer component must be replaced to support other snapshot types/snapshot access methods.

Velero Generic Data Path (VGDP)

VGDP is the collection of modules introduced in the Unified Repository design. Velero uses these modules to finish data transmission for various purposes. It includes uploaders and the backup repository.
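As a rough illustration of how these components are switched on together, the sketch below builds a DataProtectionApplication manifest that enables the CSI plugin and a Kopia-based Node-Agent. The field names follow our understanding of the OADP 1.3 DataProtectionApplication API and should be treated as assumptions to check against the product documentation. The resource name, namespace, bucket, and prefix are hypothetical, and a real manifest also needs credentials and provider configuration that are left out here.

```python
import json


def data_protection_application(bucket: str, prefix: str) -> dict:
    """Build an intentionally incomplete DataProtectionApplication manifest."""
    return {
        "apiVersion": "oadp.openshift.io/v1alpha1",
        "kind": "DataProtectionApplication",
        # Hypothetical name; openshift-adp is assumed to be the operator's namespace.
        "metadata": {"name": "dpa-sample", "namespace": "openshift-adp"},
        "spec": {
            "configuration": {
                # Assumed field names: "csi" enables velero-plugin-for-csi.
                "velero": {"defaultPlugins": ["openshift", "aws", "csi"]},
                # Assumed field names: the Node-Agent hosts the built-in Data Mover.
                "nodeAgent": {"enable": True, "uploaderType": "kopia"},
            },
            "backupLocations": [
                {
                    "velero": {
                        "provider": "aws",
                        "default": True,
                        "objectStorage": {"bucket": bucket, "prefix": prefix},
                    }
                }
            ],
        },
    }


dpa = data_protection_application("my-backup-bucket", "oadp")
print(json.dumps(dpa, indent=2))
```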

DataUpload (DUCR) specification

A Kubernetes CR that is the protocol between data mover plugins and data movers.

| Field | Description |
| --- | --- |
| backupStorageLocation | BackupStorageLocation is the name of the backup storage location for the backup repository. |
| cancel | Cancel indicates a request to cancel the ongoing DataUpload. It can be set when the DataUpload is in the InProgress phase. |
| csiSnapshot | If SnapshotType is CSI, csiSnapshot provides its information. |
| dataMoverConfig | DataMoverConfig is for data-mover-specific configuration fields. |
| datamover | DataMover specifies the data mover used by the backup. If DataMover is "" or "velero", the built-in data mover will be used. |
| operationTimeout | operationTimeout specifies the time to wait for internal operations before returning a timeout error. |
| snapshotType | snapshotType is the type of the snapshot to be backed up. Currently the only valid value is CSI. |
| sourceNamespace | sourceNamespace is the original namespace where the volume is backed up. It is the same namespace for SourcePVC and CSI namespaced objects. |
| sourcePVC | sourcePVC is the name of the snapshotted PVC. |

Note: For additional specification information, please see the DataUpload API reference documentation.
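To show how these fields fit together, here is a sketch of a DataUpload object as a Python dictionary, along with a helper that applies the datamover rule from the table. DataUpload CRs are normally created by the CSI plugin rather than by hand, so this is only meant as a reading aid. The apiVersion and the sub-fields of csiSnapshot reflect our understanding of the upstream Velero API and are assumptions, and every name and value is hypothetical.

```python
import json

# Hypothetical DataUpload object; values are placeholders for illustration.
data_upload = {
    "apiVersion": "velero.io/v2alpha1",  # assumed API group and version
    "kind": "DataUpload",
    "metadata": {"name": "my-backup-abcde", "namespace": "openshift-adp"},
    "spec": {
        "backupStorageLocation": "default",
        "csiSnapshot": {  # sub-field names are assumptions
            "volumeSnapshot": "velero-my-app-data-xyz12",
            "storageClass": "my-csi-storage-class",
            "snapshotClass": "my-csi-snapshot-class",
        },
        "datamover": "velero",
        "operationTimeout": "10m0s",
        "snapshotType": "CSI",
        "sourceNamespace": "my-app",
        "sourcePVC": "my-app-data",
    },
}


def uses_builtin_data_mover(spec: dict) -> bool:
    """Per the table above: an empty or 'velero' datamover selects the built-in mover."""
    return spec.get("datamover", "") in ("", "velero")


print(json.dumps(data_upload["spec"], indent=2))
print("built-in data mover:", uses_builtin_data_mover(data_upload["spec"]))
```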

DataDownload (DDCR) specification

A Kubernetes CR that is the protocol between data mover plugins and data movers.

| Field | Description |
| --- | --- |
| backupStorageLocation | BackupStorageLocation is the name of the backup storage location where the backup repository is stored. |
| cancel | Cancel indicates a request to cancel the ongoing DataDownload. It can be set when the DataDownload is in the InProgress phase. |
| dataMoverConfig | DataMoverConfig is for data-mover-specific configuration fields. |
| datamover | DataMover specifies the data mover to be used by the backup. If DataMover is "" or "velero", the built-in data mover will be used. |
| operationTimeout | OperationTimeout specifies the time to wait for internal operations before returning a timeout error. |
| snapshotID | SnapshotID is the ID of the Velero backup snapshot to be restored from. |
| sourceNamespace | SourceNamespace is the original namespace from which the volume is backed up. It is the same namespace for SourcePVC and CSI namespaced objects. |
| targetVolume | TargetVolume is the information of the target PVC and PV. |

Note: For additional specification information, please see the DataDownload API reference documentation.
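As a small reading aid, the sketch below writes out a DataDownload spec as a Python dictionary and checks its keys against the field names listed in the table, which is a cheap way to catch typos when inspecting or templating these objects. The sub-fields of targetVolume are assumptions based on our understanding of the upstream API, and all values are hypothetical.

```python
# Field names taken from the DataDownload table above.
DATADOWNLOAD_FIELDS = frozenset({
    "backupStorageLocation", "cancel", "dataMoverConfig", "datamover",
    "operationTimeout", "snapshotID", "sourceNamespace", "targetVolume",
})


def unknown_fields(spec: dict) -> list:
    """Return the spec keys that are not listed in the table, sorted."""
    return sorted(set(spec) - DATADOWNLOAD_FIELDS)


# Hypothetical spec; the targetVolume sub-fields are assumptions.
data_download_spec = {
    "backupStorageLocation": "default",
    "datamover": "velero",
    "operationTimeout": "10m0s",
    "snapshotID": "0123456789abcdef",
    "sourceNamespace": "my-app",
    "targetVolume": {"namespace": "my-app", "pvc": "my-app-data", "pv": ""},
}

print(unknown_fields(data_download_spec))              # nothing unexpected
print(unknown_fields({"snapshotId": "typo-in-case"}))  # flags the misspelled key
```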

DataUpload (DUCR) and DataDownload (DDCR) status descriptions

| Phase | Description |
| --- | --- |
| New | The DUCR has been created but not yet processed by a controller. |
| Accepted | The object lock has been acquired for this DUCR and the elected controller is trying to expose the snapshot. |
| Prepared | The snapshot has been exposed and the related controller is starting to process the upload. |
| InProgress | The data upload is in progress. |
| Canceling | The data upload is being canceled. |
| Canceled | The data upload has been canceled. |
| Completed | The data upload has completed. |
| Failed | The data upload has failed. |

Note: For additional specification information, please see the DataUploadStatus and DataDownloadStatus API reference documentation.
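The phases above can be read as a small state machine. The sketch below models them in Python. The phase names come from the table, but the exact set of allowed transitions is a simplifying assumption on our part (a happy path, failure from any non-terminal phase, and cancellation from InProgress), not a statement of what the Velero controllers enforce.

```python
from enum import Enum


class Phase(str, Enum):
    NEW = "New"
    ACCEPTED = "Accepted"
    PREPARED = "Prepared"
    IN_PROGRESS = "InProgress"
    CANCELING = "Canceling"
    CANCELED = "Canceled"
    COMPLETED = "Completed"
    FAILED = "Failed"


TERMINAL = frozenset({Phase.COMPLETED, Phase.FAILED, Phase.CANCELED})

# Assumed, simplified transition map; terminal phases have no outgoing edges.
ALLOWED = {
    Phase.NEW: {Phase.ACCEPTED, Phase.FAILED},
    Phase.ACCEPTED: {Phase.PREPARED, Phase.FAILED},
    Phase.PREPARED: {Phase.IN_PROGRESS, Phase.FAILED},
    Phase.IN_PROGRESS: {Phase.COMPLETED, Phase.CANCELING, Phase.FAILED},
    Phase.CANCELING: {Phase.CANCELED, Phase.FAILED},
}


def is_terminal(phase: Phase) -> bool:
    return phase in TERMINAL


def can_transition(current: Phase, nxt: Phase) -> bool:
    return nxt in ALLOWED.get(current, set())


def valid_history(history: list) -> bool:
    """True if every consecutive pair of phases is an allowed transition."""
    return all(can_transition(a, b) for a, b in zip(history, history[1:]))


happy_path = [Phase.NEW, Phase.ACCEPTED, Phase.PREPARED,
              Phase.IN_PROGRESS, Phase.COMPLETED]
print(valid_history(happy_path), is_terminal(happy_path[-1]))
```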

The backup process

A user creates a backup CR with the snapshotMoveData option set to true. The velero-plugin-for-csi (based on the Asynchronous BackupItemAction/BIA V2 plugin API) creates a CSI VolumeSnapshot of the PVC included in the backup. The backup CR status will move from New to InProgress.
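As a minimal sketch of the first step, the following builds a Velero Backup manifest with snapshotMoveData set to true. The backup name and application namespace are hypothetical, and openshift-adp is assumed to be the namespace where the OADP operator is installed.

```python
import json


def backup_with_data_mover(name: str, app_namespace: str,
                           oadp_namespace: str = "openshift-adp") -> dict:
    """Build a Backup CR that asks Velero to move CSI snapshot data off-cluster."""
    return {
        "apiVersion": "velero.io/v1",
        "kind": "Backup",
        "metadata": {"name": name, "namespace": oadp_namespace},
        "spec": {
            "includedNamespaces": [app_namespace],
            # This option is what triggers the Data Mover workflow described here.
            "snapshotMoveData": True,
        },
    }


# Hypothetical names, for illustration only.
backup = backup_with_data_mover("my-app-backup", "my-app")
print(json.dumps(backup, indent=2))
```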

After the VolumeSnapshots are created, you will see one or more DataUpload CRs. You may also see some temporary objects (for example, pods, PVCs, and persistent volumes (PVs)) created in the protected namespace (the OADP operator's namespace). These temporary objects assist in the data movement. The status of each DataUpload object progresses from New to Accepted to Prepared to InProgress.

The CSI plugin now mounts the CSI snapshot from the Node-Agent. The DataUpload controller then works with Kopia (the uploader) to move the data off-cluster to the Unified Repository backup repository. The status is reconciled again and the backup CR moves to Completed.

Users can see the DataUpload objects move to a terminal status of Completed, Failed, or Canceled. Once the data is uploaded, intermediate objects such as the VolumeSnapshot and VolumeSnapshotContent are removed. Finally, the backup object is updated with its terminal status.
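When a backup covers many volumes, it can help to summarize where the DataUpload objects stand. The sketch below tallies phases from a list of objects shaped like the items in a JSON listing of DataUpload CRs. The status.phase path and the choice to treat a missing phase as New are assumptions of this sketch, and the sample items are made up.

```python
from collections import Counter

TERMINAL_PHASES = {"Completed", "Failed", "Canceled"}


def phase_of(obj: dict) -> str:
    """Read status.phase; an object without a phase is treated as New (assumption)."""
    return (obj.get("status") or {}).get("phase") or "New"


def summarize(items: list) -> Counter:
    """Count DataUpload objects per phase."""
    return Counter(phase_of(item) for item in items)


def all_terminal(items: list) -> bool:
    """True once every DataUpload has reached Completed, Failed, or Canceled."""
    return all(phase_of(item) in TERMINAL_PHASES for item in items)


# Made-up sample data standing in for real DataUpload objects.
sample_items = [
    {"metadata": {"name": "du-1"}, "status": {"phase": "Completed"}},
    {"metadata": {"name": "du-2"}, "status": {"phase": "InProgress"}},
    {"metadata": {"name": "du-3"}},
]

print(dict(summarize(sample_items)))
print("all terminal:", all_terminal(sample_items))
```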

Figure: OADP 1.3 Data Mover backup workflow

Below is a more in-depth visualization of the backup workflow with Data Mover.

Figure: OADP 1.3 Data Mover backup sequence, in depth

The restore process

No additional data mover options or parameters are required when a user creates a restore CR. Velero's CSI plugin (based on the RIA V2 plugin API) creates a PV and PVC in the protected namespace (the OADP operator's namespace). The restore CR status will move from New to InProgress.
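For comparison with the backup side, here is a minimal sketch of a Restore manifest. It only names the backup to restore from and carries no Data Mover settings, matching the description above. The restore and backup names are hypothetical, and openshift-adp is assumed to be the OADP operator's namespace.

```python
import json


def restore_from_backup(name: str, backup_name: str,
                        oadp_namespace: str = "openshift-adp") -> dict:
    """Build a Restore CR; no data mover options are set on the restore side."""
    return {
        "apiVersion": "velero.io/v1",
        "kind": "Restore",
        "metadata": {"name": name, "namespace": oadp_namespace},
        "spec": {"backupName": backup_name},
    }


# Hypothetical names, for illustration only.
restore = restore_from_backup("my-app-restore", "my-app-backup")
print(json.dumps(restore, indent=2))
```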

The backup's data is read from the remotely stored DataUpload CR and written to in-cluster ConfigMaps. One ConfigMap is created for each PV to be restored. These ConfigMaps are temporary objects that are deleted when the restore workflow completes. Each ConfigMap stores vital information, such as the repository snapshot ID or the VolumeSnapshotContent name. The data stored in the ConfigMap is used to build the DataDownload CR specification.

The CSI plugin creates the DataDownload CR, and the DataDownload Controller reconciles on the CR. The Node-Agent begins the download of the backed-up PV data from Amazon S3.

As the data from the backup is downloaded by the DataDownload controller through Kopia, the target volume is marked as not ready. The spec.VolumeName is set to empty ("") to prevent the volume from binding. You can view the download status on the DataDownload CR object as Accepted, Prepared, or InProgress. Similar to the Data Mover backup process, a user may find temporary objects (for example, pods, PVCs, and PVs) created in the protected namespace (the OADP operator's namespace) during this step.

Once the DataDownload is in terminal status Completed, the target PVC should have been created in the target user namespace and is awaiting binding. The PV's claim reference is written to the target PVC in the target user namespace and the PVC will be immediately bound to the target PV.

Figure: OADP 1.3 Data Mover restore workflow

Below is a more in-depth visualization of the restore workflow with Data Mover.

Figure: OADP 1.3 Data Mover restore sequence, in depth

Wrap up

You can find the source of this blog post in the oadp-operator repository. The original upstream Velero design for the VBDM can be found here. Information and diagrams have been sourced directly from the design.


About the author

Scott has been at Red Hat since 2002 and a member of the OpenShift Migration Engineering team since 2019. In addition, Scott is an official maintainer of the Velero open source project which is a major component of Red Hat's OpenShift APIs for Data Protection (OADP) product.
