As the world of technology continues to evolve at breakneck speed, the reliance on Red Hat OpenShift for container orchestration has become nothing short of ubiquitous. Organizations worldwide entrust their applications and services to the power of Kubernetes, lured by its scalability, resilience and agility. Yet, no one is immune to unexpected tragedy. In this era of digital transformation, one crucial question looms large: How well-prepared are you to weather the storm when disaster strikes your environment?

Imagine this scenario: Your container environment, the backbone of your mission-critical applications, suddenly grinds to a halt, posing significant threats to business continuity. An unforeseen event, whether a malicious attack or hardware failure, sends shockwaves through your system. In the face of chaos, every second counts. Your data, reputation and bottom line are all at stake.

This article takes a journey through the tumultuous waters of disaster recovery, exploring the key strategies and tools that Red Hat provides to help you navigate and recover from events that bring your Kubernetes infrastructure to its knees. Whether you're a seasoned Kubernetes pro or just dipping your toes into container orchestration, understanding how to prepare for and recover from disasters is not optional; it's a necessity.

Master the disaster

Depending on the criticality of the situation and how downtime impacts your business, your disaster recovery strategy may have different needs and requirements. Having standalone backups of each of your environments is helpful, but what really matters is having an end-to-end framework that covers a whole multi-site relocation process.

Red Hat Advanced Cluster Management for Kubernetes extends the value of OpenShift by serving as a management console for your Kubernetes fleet. Besides managing lifecycle, monitoring usage and handling Day 2 configurations, Red Hat Advanced Cluster Management makes it easier to move applications across the entire fleet while providing the flexibility needed for different use cases. I'll briefly explore the options:

  • VolSync: VolSync is an open source project offered as an add-on. It makes it easy to capture time-based copies of application state across your whole fleet. Data can be replicated across different locations, storage types and vendors.
  • Metro-DR: Offered by the integrated Red Hat Advanced Cluster Management and Red Hat OpenShift Data Foundation stack, Metro-DR maintains continuity during an outage with no data loss (recovery point objective, or RPO, of 0).
  • Regional-DR: Offered by the same integrated Red Hat Advanced Cluster Management and OpenShift Data Foundation stack, Regional-DR maintains continuity during an outage while accepting a small, predictable amount of data loss (minimal RPO).
  • Third-party: Various third-party solutions can also take advantage of Red Hat Advanced Cluster Management's inventory and built-in health mechanisms of OpenShift clusters for a self-managed solution.

The solution proposed in this article uses Regional-DR, targeting a minimal RPO and a minimal recovery time objective (RTO).
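
To make the RPO idea concrete, here is a minimal Python sketch of the arithmetic behind it. The function names and timestamps are hypothetical, chosen only for illustration. The assumption is that the data at risk is whatever was written between the last completed replication and the moment of failure.

```python
from datetime import datetime, timedelta


def data_loss_window(last_sync: datetime, failure: datetime) -> timedelta:
    """Return how much recent data would be missing on the secondary site."""
    if failure < last_sync:
        raise ValueError("failure time precedes the last completed sync")
    return failure - last_sync


def meets_rpo(last_sync: datetime, failure: datetime, rpo: timedelta) -> bool:
    """True when the unreplicated window fits within the RPO target."""
    return data_loss_window(last_sync, failure) <= rpo


# Hypothetical numbers: replication finished at 12:00, the site failed at 12:04.
last_sync = datetime(2024, 1, 1, 12, 0)
failure = datetime(2024, 1, 1, 12, 4)

print(data_loss_window(last_sync, failure))                 # 0:04:00
print(meets_rpo(last_sync, failure, timedelta(minutes=5)))  # True
print(meets_rpo(last_sync, failure, timedelta(0)))          # False
```

With these example values, a four-minute gap satisfies a five-minute RPO but not an RPO of 0, which is the difference between the Regional-DR and Metro-DR targets described above.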

The anatomy of the solution

Consider an active-passive setup in this scenario, with Red Hat Advanced Cluster Management serving as the central hub for coordination. To begin, Red Hat Advanced Cluster Management oversees the primary cluster housing crucial applications, designated as "Primary." The secondary cluster, referred to as "Secondary," remains in a standby state with ample capacity to handle workloads in case of unavailability. These clusters are securely linked, benefiting from the Submariner add-ons for enhanced connectivity. 

Screenshot of a Red Hat OpenShift cluster set

The failover process within Red Hat Advanced Cluster Management operates on a per-application basis, so you can configure distinct settings for each application. Consider a database application that must remain on-premises due to regulatory requirements, even during an outage, and that requires constant data synchronization and replication. Less critical front-end applications may instead use on-demand public cloud instances with a more lenient recovery point objective (RPO). Because settings are applied per application, both sets of requirements can coexist in the same fleet.
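
One way to picture per-application settings is as a small table of records. The Python sketch below is not how Red Hat Advanced Cluster Management stores its configuration. Every application name, cluster name and RPO value in it is hypothetical, and it only illustrates how the database and front-end examples above differ.

```python
from dataclasses import dataclass
from datetime import timedelta

# Hypothetical set of clusters that run on-premises.
ON_PREM_CLUSTERS = {"secondary-onprem"}


@dataclass(frozen=True)
class AppDRSetting:
    name: str
    failover_target: str    # hypothetical cluster name
    on_premises_only: bool  # regulatory constraint on where the app may run
    rpo: timedelta          # how much data loss the app can tolerate


SETTINGS = [
    AppDRSetting("orders-db", "secondary-onprem", True, timedelta(minutes=1)),
    AppDRSetting("web-frontend", "cloud-standby", False, timedelta(minutes=30)),
]


def violations(settings: list[AppDRSetting]) -> list[str]:
    """Return names of apps whose failover target breaks their placement rule."""
    return [
        s.name
        for s in settings
        if s.on_premises_only and s.failover_target not in ON_PREM_CLUSTERS
    ]


print(violations(SETTINGS))  # []
```

The check at the end shows why per-application settings matter: a regulated workload can be validated against its own constraint without affecting how the front end fails over.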

Screenshot of a Red Hat OpenShift Data policies screen

It's worth noting that Red Hat Advanced Cluster Management offers a user-friendly dashboard. You can conveniently view all components of your applications, such as Deployments, PersistentVolumes and more, along with their health status indicated by a green checkmark. You can modify these components, remove them, or even initiate a failover seamlessly in this interface.

Screenshot of a Red Hat OpenShift topology map

Undoubtedly, a failover operation is a highly sensitive procedure, necessitating the option for manual initiation. However, what if you intend to automate it in response to specific events or alerts? 

That’s where Event-Driven Ansible comes into play.

Event-Driven Ansible

As part of Red Hat Ansible Automation Platform, Event-Driven Ansible can watch for and take action on events, even within a Kubernetes environment.

In the framework this article discusses, Event-Driven Ansible captures a Prometheus event. In this case, it verifies whether a specific CPU threshold is met. If it is met, Ansible Automation Platform turns to its automation controller, which is responsible for running an Ansible job template that triggers an Ansible Playbook. 
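
The decision step can be sketched in Python. In Event-Driven Ansible itself, this logic would live in a rulebook condition rather than in Python, so treat the following only as an illustration. It assumes the event arrives in the Alertmanager webhook format (a top-level `alerts` list whose entries carry `status` and `labels`) and that the CPU threshold is already encoded in the Prometheus alerting rule. The alert name `HighCPUUsage` is hypothetical.

```python
from typing import Any

WATCHED_ALERT = "HighCPUUsage"  # hypothetical alert name


def should_trigger_failover(
    payload: dict[str, Any], alert_name: str = WATCHED_ALERT
) -> bool:
    """Return True when the payload contains a firing alert with the given name."""
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        if alert.get("status") == "firing" and labels.get("alertname") == alert_name:
            return True
    return False


# Example payload, trimmed to the fields this sketch reads.
event = {
    "alerts": [
        {"status": "resolved", "labels": {"alertname": "HighCPUUsage"}},
        {"status": "firing", "labels": {"alertname": "HighCPUUsage"}},
    ]
}

print(should_trigger_failover(event))  # True
```

Resolved alerts and alerts with other names are ignored, so only a currently firing CPU alert would lead to the job template being launched.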

This Ansible Playbook calls the Red Hat Advanced Cluster Management API and triggers the application failover from one OpenShift cluster to another. 
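
The article does not say which endpoint the playbook calls. As a sketch, assume the failover is requested by patching a DRPlacementControl resource on the hub cluster, which is how the OpenShift Data Foundation disaster recovery stack expresses it. The Python below only builds the request path and the merge-patch body and sends nothing. The namespace, resource name and cluster name are hypothetical.

```python
import json

# Assumption: API group and version of the DRPlacementControl resource.
API_GROUP = "ramendr.openshift.io"
API_VERSION = "v1alpha1"


def drpc_path(namespace: str, name: str) -> str:
    """Build the Kubernetes API path of a namespaced DRPlacementControl."""
    return (
        f"/apis/{API_GROUP}/{API_VERSION}"
        f"/namespaces/{namespace}/drplacementcontrols/{name}"
    )


def failover_patch(target_cluster: str) -> dict:
    """Build a merge-patch body that requests failover to target_cluster."""
    return {"spec": {"action": "Failover", "failoverCluster": target_cluster}}


# Hypothetical names, for illustration only.
print(drpc_path("orders", "orders-drpc"))
print(json.dumps(failover_patch("secondary")))
```

A real playbook would send this body as a PATCH with the `application/merge-patch+json` content type and a token for the hub cluster, or use an Ansible Kubernetes module to apply the same change.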

The following diagram illustrates the process:

Illustration of how Event-Driven Ansible watches for and takes action on events

In this setup, there's no need to open the Red Hat Advanced Cluster Management or Ansible Automation Platform user interfaces to run anything manually. Together, the two respond on their own to critical situations signaled by events of your choice. You can also use any in-house source, be it a Prometheus alert, a Kafka message or a webhook, as the event that Event-Driven Ansible monitors. Check out the list of Ansible source plugins to explore the full range of possibilities.

To learn more, check out our two-part video series on business continuity with Red Hat Advanced Cluster Management and Ansible Automation Platform.


About the author

Luiz Bernardo joined Red Hat in 2019, where he has supported and advocated for technologies like Linux containers and Kubernetes by providing meaningful engagements with the open source community and Red Hat customers. Born in Brazil and currently living in the Netherlands, Luiz is a sports lover and has a passion for dogs.
