As the world of technology continues to evolve at breakneck speed, the reliance on Red Hat OpenShift for container orchestration has become nothing short of ubiquitous. Organizations worldwide entrust their applications and services to the power of Kubernetes, lured by its scalability, resilience and agility. Yet, no one is immune to unexpected tragedy. In this era of digital transformation, one crucial question looms large: How well-prepared are you to weather the storm when disaster strikes your environment?
Imagine this scenario: Your container environment, the backbone of your mission-critical applications, suddenly grinds to a halt, posing significant threats to business continuity. An unforeseen event, whether a malicious attack or hardware failure, sends shockwaves through your system. In the face of chaos, every second counts. Your data, reputation and bottom line are all at stake.
This article explores the key disaster recovery strategies and tools that Red Hat provides to help you recover from events that bring your Kubernetes infrastructure to its knees. Whether you're a seasoned Kubernetes pro or just dipping your toes into container orchestration, understanding how to prepare for and recover from disasters is not an option. It's a necessity.
Master the disaster
Depending on the criticality of the situation and how downtime impacts your business, your disaster recovery strategy may have different needs and requirements. Having standalone backups of each of your environments is helpful, but what really matters is having an end-to-end framework that covers a whole multi-site relocation process.
Red Hat Advanced Cluster Management for Kubernetes extends the value of OpenShift by serving as a management console for your Kubernetes fleet. Besides managing cluster lifecycle, monitoring usage and handling Day 2 configurations, Red Hat Advanced Cluster Management makes it easier to move applications across the entire fleet, with enough flexibility to fit different use cases. I'll briefly explore all options:
- VolSync: An open source project available as an add-on, VolSync makes it easy to capture time-based copies of application state across your whole fleet. Data can be replicated across different locations, storage types and vendors.
- Metro-DR: Offered by the integrated stack of Red Hat Advanced Cluster Management and Red Hat OpenShift Data Foundation, Metro-DR maintains continuity when a site becomes unavailable, with no data loss (recovery point objective, or RPO, of 0).
- Regional-DR: Offered by the integrated stack of Red Hat Advanced Cluster Management and OpenShift Data Foundation, Regional-DR maintains continuity when a site becomes unavailable, accepting a small, predictable amount of data loss (minimal RPO).
- Third-party: Various third-party solutions can also take advantage of Red Hat Advanced Cluster Management's inventory and built-in health mechanisms of OpenShift clusters for a self-managed solution.
The solution proposed in this article uses Regional-DR, targeting a minimal RPO and a minimal recovery time objective (RTO).
The anatomy of the solution
Consider an active-passive setup in this scenario, with Red Hat Advanced Cluster Management serving as the central hub for coordination. Red Hat Advanced Cluster Management oversees the primary cluster, which hosts the crucial applications and is designated as "Primary." The secondary cluster, referred to as "Secondary," remains on standby with enough capacity to take over the workloads if the primary becomes unavailable. The two clusters are securely linked using the Submariner add-on.
The failover process within Red Hat Advanced Cluster Management operates on a per-application basis, so you can configure distinct settings for each application. Consider a database application that must remain on-premises due to regulatory requirements, even during an outage, and that requires constant data synchronization and replication. Less critical front-end applications may instead fail over to on-demand public cloud instances with a more lenient recovery point objective (RPO). Each application gets the configuration that matches its own demands.
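To make the per-application idea concrete, here is a minimal Python sketch of how such settings could be modeled. It is an illustration only, not how Red Hat Advanced Cluster Management stores its configuration: the class, the application names, the cluster names and the intervals are all hypothetical, and the RPO estimate assumes that asynchronous replication can lose roughly one replication interval of writes.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class AppDRPolicy:
    """Hypothetical per-application disaster recovery settings."""

    app: str
    failover_target: str       # hypothetical name of the standby cluster
    sync_interval: timedelta   # zero means synchronous replication

    @property
    def worst_case_rpo(self) -> timedelta:
        # Rough estimate: asynchronous replication can lose up to about one
        # interval of writes (transfer time is ignored here).
        return self.sync_interval


# Hypothetical fleet: a regulated database that stays on-premises and a
# front end that can tolerate a more lenient RPO in a public cloud.
POLICIES = {
    policy.app: policy
    for policy in (
        AppDRPolicy("orders-db", "onprem-secondary", timedelta(0)),
        AppDRPolicy("web-frontend", "cloud-secondary", timedelta(minutes=15)),
    )
}


def policy_for(app: str) -> AppDRPolicy:
    """Look up the DR policy for one application."""
    return POLICIES[app]


if __name__ == "__main__":
    for name in POLICIES:
        p = policy_for(name)
        print(f"{p.app}: fail over to {p.failover_target}, "
              f"worst-case RPO about {p.worst_case_rpo}")
```

Keeping the policy per application rather than per cluster mirrors the point above: the database and the front end can share a primary cluster and still fail over to different places with different tolerances for data loss.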
It's worth noting that Red Hat Advanced Cluster Management offers a user-friendly dashboard. You can conveniently view all components of your applications, such as Deployments, PersistentVolumes and more, along with their health status indicated by a green checkmark. You can modify these components, remove them, or even initiate a failover seamlessly in this interface.
Undoubtedly, a failover operation is a highly sensitive procedure, necessitating the option for manual initiation. However, what if you intend to automate it in response to specific events or alerts?
That’s where Event-Driven Ansible comes into play.
Event-Driven Ansible
As part of Red Hat Ansible Automation Platform, Event-Driven Ansible can watch for and take action on events, even within a Kubernetes environment.
In the framework this article discusses, Event-Driven Ansible captures a Prometheus event. In this case, it verifies whether a specific CPU threshold is met. If it is met, Ansible Automation Platform turns to its automation controller, which is responsible for running an Ansible job template that triggers an Ansible Playbook.
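The decision made at this step can be sketched in a few lines of Python. In a real deployment the condition would live in an Event-Driven Ansible rulebook rather than in Python. The sketch assumes the event arrives in the shape of a Prometheus Alertmanager webhook payload, and the alert name `HighCPUUsage` is a hypothetical placeholder for whatever alert encodes your CPU threshold.

```python
from typing import Any, Mapping


def should_trigger_failover(
    payload: Mapping[str, Any],
    alert_name: str = "HighCPUUsage",  # hypothetical alert name
) -> bool:
    """Return True if the payload contains a firing alert with the given name."""
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        if alert.get("status") == "firing" and labels.get("alertname") == alert_name:
            return True
    return False


# Sample payload in the Alertmanager webhook shape, with hypothetical labels.
SAMPLE_FIRING = {
    "status": "firing",
    "alerts": [
        {
            "status": "firing",
            "labels": {"alertname": "HighCPUUsage", "cluster": "primary"},
        }
    ],
}

# The same alert after it has cleared: no failover should be requested.
SAMPLE_RESOLVED = {
    "status": "resolved",
    "alerts": [
        {
            "status": "resolved",
            "labels": {"alertname": "HighCPUUsage", "cluster": "primary"},
        }
    ],
}

if __name__ == "__main__":
    print(should_trigger_failover(SAMPLE_FIRING))
    print(should_trigger_failover(SAMPLE_RESOLVED))
```

Checking the status of each individual alert, rather than only the top-level status, avoids acting on a grouped notification in which the alert of interest has already resolved.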
This Ansible Playbook calls the Red Hat Advanced Cluster Management API and triggers the application failover from one OpenShift cluster to another.
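As a rough illustration of what such a call might carry, the Python sketch below builds a request path and a JSON merge patch for a failover. It assumes the failover is requested by patching a DRPlacementControl resource on the hub cluster, with the API group and field names taken from the upstream Ramen project; verify them against your installed version. The namespace, resource name and cluster name are hypothetical, and the sketch only prints the request instead of sending it.

```python
import json

# Assumed API coordinates of the DRPlacementControl resource (upstream Ramen).
DRPC_GROUP = "ramendr.openshift.io"
DRPC_VERSION = "v1alpha1"
DRPC_PLURAL = "drplacementcontrols"

# Content type Kubernetes expects for a JSON merge patch.
MERGE_PATCH_CONTENT_TYPE = "application/merge-patch+json"


def drpc_path(namespace: str, name: str) -> str:
    """Build the API path of one namespaced DRPlacementControl."""
    return (
        f"/apis/{DRPC_GROUP}/{DRPC_VERSION}"
        f"/namespaces/{namespace}/{DRPC_PLURAL}/{name}"
    )


def build_failover_patch(failover_cluster: str) -> dict:
    """Build a merge patch that requests failover to the given cluster."""
    return {
        "spec": {
            "action": "Failover",
            "failoverCluster": failover_cluster,
        }
    }


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    path = drpc_path("my-app-namespace", "my-app-drpc")
    body = build_failover_patch("secondary")
    print("PATCH", path)
    print("Content-Type:", MERGE_PATCH_CONTENT_TYPE)
    print(json.dumps(body, indent=2))
```

In a playbook, the same patch would more naturally be applied with a Kubernetes-aware Ansible module than with hand-built HTTP requests. The sketch only shows how small the change is: one resource per application, two fields.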
The following diagram illustrates the process:
In this setup, there's no requirement to access the Red Hat Advanced Cluster Management or Ansible Automation Platform user interfaces for manual execution. This duo will autonomously address critical situations triggered by events of your choice, whether they result from or signify issues in your environments. The remarkable aspect is that you can integrate any in-house resource, be it a Prometheus alert, a Kafka message, or a webhook, to serve as the event Event-Driven Ansible monitors. Check out the list of the Ansible source plugins to explore the full range of possibilities.
To learn more, check out our two-part video series on business continuity with Red Hat Advanced Cluster Management and Ansible Automation Platform.
About the author
Luiz Bernardo joined Red Hat in 2019, where he has supported and advocated for technologies like Linux containers and Kubernetes by providing meaningful engagements with the open source community and Red Hat customers. Born in Brazil and currently living in the Netherlands, Luiz is a sports lover and has a passion for dogs.