
In Kubernetes (K8s), the recreation of a pod is handled by the ReplicaSet or Deployment controller, depending on how the pod was created. The controller continuously compares the pod's desired state with its actual state. If the actual state deviates from the desired state, for example when a pod is deleted, the controller detects the change and creates a new pod based on the configuration specified in the Deployment.
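As a quick illustration (the Deployment name and image below are placeholders and not part of the demo), deleting a pod that belongs to a Deployment immediately triggers the creation of a replacement:

oc create deployment web --image=registry.access.redhat.com/ubi9/httpd-24 --replicas=2
oc get pods -l app=web
oc delete pod <one-of-the-pod-names>
oc get pods -l app=web

The ReplicaSet notices that the actual replica count has dropped below the desired count and creates a new pod within seconds.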

But what happens when our pod hosts a stateful application?

In general, stateful systems tend to be more complex and harder to scale, since they need to keep track of and manage state across multiple interactions.

K8s cannot know for sure whether the original pod is still running or not, so if a new pod is started we can end up with a "split-brain" condition that can lead to data corruption, which is the worst possible outcome for anyone responsible for the data. Take, for example, a situation where the node hosting our application stops responding for any reason: K8s does not allow the split-brain case to happen and waits until the original node returns. As long as it does not return, the pod will not be recreated and our application will not be available! Alternatively, we can delete the pod manually and deal with the consequences ourselves.

This is not acceptable: our application may be critical and must tolerate the failure of a single node or a single network connection. Let's see how to enable a way to get it running elsewhere.

Here is an interesting case that I experienced during a Telco project I led. In this blog, we will see a demo that simulates a 3-node bare-metal cluster architecture suitable for an edge cluster solution. This cluster is unique in that each node has both the control plane and worker roles, and is also a member of OpenShift Data Foundation (ODF), which serves as the storage solution. The Telco workloads can be VMs running inside pods (stateful applications) via OpenShift Virtualization, providing network functions such as routers, firewalls, load balancers, and so on.
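In a compact 3-node cluster like this, every node should report both the control-plane and worker roles. Assuming the cluster is already up, a quick sanity check is:

oc get nodes

Each node is expected to list control-plane (master) and worker under the ROLES column.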

In this demo, the VM will be a Linux server.

The following diagram shows the environment setup.

[Diagram: environment setup - 3-node bare-metal cluster, each node with control plane and worker roles, ODF as storage]

NOTE: An OpenShift cluster with the ODF and OpenShift Virtualization operators installed is a prerequisite.

Install NodeHealthCheck Operator

Log in to your cluster’s web console as a cluster administrator → OperatorHub → search Node Health Check → install
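If you prefer the CLI, the operator can also be installed with a Subscription. The channel and package names below are assumptions; verify them first with oc get packagemanifests -n openshift-marketplace | grep -i health:

apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: node-healthcheck-operator
  namespace: openshift-operators
spec:
  channel: stable                      # assumption: confirm the available channel
  name: node-healthcheck-operator      # assumption: confirm the package name in the catalog
  source: redhat-operators
  sourceNamespace: openshift-marketplace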

The Node Health Check (NHC) operator identifies unhealthy nodes and uses the Self Node Remediation (SNR) operator to remediate them.

The SNR operator takes an action that causes the workload to be rescheduled: it deletes the pods from the API, which triggers the scheduler. The fencing part (i.e., making sure the node's stateful workload is no longer running) is handled by automatically rebooting the unhealthy node. This remediation strategy minimizes downtime for stateful applications and ReadWriteOnce (RWO) volumes.

The SNR operator is installed automatically once the NHC operator is installed.

When the NHC operator detects an unhealthy node, it creates a remediation CR that triggers the remediation provider. For example, the node health check triggers the SNR operator to remediate the unhealthy node. For more information, see the Node Health Check operator documentation.
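Once both operators are installed, the CRs that drive this flow can be inspected directly from the CLI:

oc get nodehealthcheck
oc get selfnoderemediationtemplate -n openshift-operators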


Check the Installed operators


oc get pod -n openshift-operators


NOTE: The latest version of the SNR operator, adapted to OpenShift 4.12, also supports nodes with the control-plane role.

NodeHealthCheck Configuration

Navigate to Operators → Installed Operators → Node Health Check Operator → NodeHealthChecks, and configure the NHC operator.

An example of NHC configuration:

apiVersion: remediation.medik8s.io/v1alpha1
kind: NodeHealthCheck
metadata:
  name: nhc-master-test
spec:
  minHealthy: 51%
  remediationTemplate:
    apiVersion: self-node-remediation.medik8s.io/v1alpha1
    kind: SelfNodeRemediationTemplate
    name: self-node-remediation-resource-deletion-template
    namespace: openshift-operators
  selector:
    matchExpressions:
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
  unhealthyConditions:
    - duration: 150s
      status: 'False'
      type: Ready
    - duration: 150s
      status: Unknown
      type: Ready
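Assuming the manifest above is saved as nhc-master-test.yaml, it can be applied and verified from the CLI:

oc apply -f nhc-master-test.yaml
oc get nodehealthcheck nhc-master-test -o yaml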

NOTE: In NHC version 2.5.23, a new UI option was added that enables creating this configuration from the console.

Create VM with OpenShift Virtualization

Once the OpenShift Virtualization operator is installed, the Virtualization option appears in the console and virtual machines can be created.
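For context, a VirtualMachine in OpenShift Virtualization is itself just a custom resource. A minimal sketch looks roughly like the following (the name and the containerDisk image are placeholders for illustration; the demo itself uses the RHEL 8 template from the catalog):

apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: rhel8-demo                     # placeholder name
spec:
  running: true
  template:
    spec:
      domain:
        devices:
          disks:
            - name: rootdisk
              disk:
                bus: virtio
        resources:
          requests:
            memory: 2Gi
      volumes:
        - name: rootdisk
          containerDisk:
            image: quay.io/kubevirt/fedora-cloud-container-disk-demo   # placeholder demo image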


In this case, I chose the RHEL 8 operating system from the template catalog.


After a few minutes, the machine will be ready with the status “Running”.


VM Access via the Console

Create a test file in our VM for the next steps
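The console can be opened from the web UI, or with the virtctl client if it is installed. The VM name, namespace, and file path below are placeholders:

virtctl console rhel8-demo -n default

# inside the guest, for example:
echo "important stateful data" > /root/test-file.txt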


Check which node hosts the VM

oc get vmi -A


NOTE: We can see that the node master-0-0 hosts the VM.

Self Node Remediation (SNR) Operator

We can see that there is currently no SelfNodeRemediation resource managed by the operator.
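This can also be confirmed from the CLI; while all nodes are healthy, the following should return no resources:

oc get selfnoderemediation -n openshift-operators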


Disaster in master-0-0, the Node Hosting the VM!

Let's shut down the node (master-0-0) that hosts the VM to simulate a disaster. NOTE: Shutting down the node can be done according to the infrastructure, through a virtual interface or the bare-metal (BM) management interface.
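For example (both commands are only illustrative and depend entirely on your infrastructure), on bare metal this is typically done through the BMC, and in a virtualized lab through the hypervisor:

ipmitool -I lanplus -H <bmc-address> -U <user> -P <password> chassis power off
virsh destroy master-0-0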

oc get node


Node Health Check (NHC) detects an unhealthy node

NHC automatically detects the unhealthy node (NotReady status) and changes its status to ‘NotReady,SchedulingDisabled’.


NHC creates a remediation CR that triggers the remediation provider.

Triggering the SNR Operator

The SNR operator remediates the unhealthy node. A SelfNodeRemediation CR is created automatically with the unhealthy node's name (master-0-0).
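The details of the remediation CR, including its current phase, can be inspected with:

oc describe selfnoderemediation master-0-0 -n openshift-operators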


Check the Virtual Machine Status

The virt-launcher pod runs the libvirt and QEMU processes that implement the virtual machine. SNR is now deleting it so that it can be recreated by the scheduler elsewhere. This is safe because, by now, SNR has made sure that the workload is no longer running on the non-responsive node.
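The rescheduling can be followed from the CLI (using whichever project holds the VM; default is shown here as a placeholder):

oc get pods -n default -o wide -w
oc get vmi -n default -o wide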


When the virt-launcher pod status is ‘ContainerCreating’, the VM status will be ‘Starting’.


Once the virt-launcher pod status is ‘Running’, the VM is ‘Running’ on another node.


NOTE: We can see that the node master-0-1 now hosts the VM.

Verify the Data on the New Host

Let's verify that the file we created in the virtual machine still exists.
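Reconnecting to the console and reading the file back (same placeholder names as before):

virtctl console rhel8-demo -n default
cat /root/test-file.txt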


Conclusion

Ensuring minimal downtime for our applications is essential, regardless of whether they are stateful and regardless of whether they run on a worker node, a schedulable master, or a 3-node edge cluster.

We were able to guarantee to OpenShift that the disconnected node was certainly not active, even though the node had a control-plane role. Thanks to this guarantee, our stateful application (the VM) was automatically recreated on another node in the cluster.


About the author

Almog Elfassy is a highly skilled Cloud Architect at Red Hat, leading strategic projects across diverse sectors. He brings a wealth of experience in designing and implementing cutting-edge distributed cloud services. His expertise spans a cloud-native approach, hands-on experience with various cloud services, network configurations, AI, data management, Kubernetes and more. Almog is particularly interested in understanding how businesses leverage Red Hat OpenShift and technologies to solve complex problems.
