This post was written by: Swati Sehgal, Alexey Perevalov, Killian Muldoon & Francesco Romani
How do you get the most out of your bare-metal hardware? Believe it or not, the physical layout in a computer of the resources a workload uses, from memory and CPU to storage and I/O, can have a dramatic impact on performance. Until recently Kubernetes users had no direct way to influence this key interaction between hardware and software, commonly called Resource Topology.
This blog post series describes Topology Aware Scheduling, a feature being rolled out in Kubernetes in 2021. Topology Aware Scheduling enables the Kubernetes control plane to keep to Resource Topology constraints when placing Pods on Nodes.This approach complements Topology Manager, which was initially introduced in Kubernetes 1.17, the node-level Resource Topology enforcer in kubelet, but more on that later.
Why does resource topology matter?
Non-Uniform Memory Access (NUMA) is a compute platform architecture that allows different CPUs to access different regions of memories at different speeds. The relative locations of CPUs, memory, and PCI devices are what we’re talking about when we say Resource Topology.
This architecture has major advantages. Any CPU core can potentially access all memory on a system, but there are some potential pitfalls with performance. For example, in the diagram below, memory closer to CPU core 1 will be quicker to access by CPU core 1 than memory close to CPU core 7.
FIGURE 1: A Non-uniform Memory Access (NUMA) system
It’s straightforward so far, and the underlying operating system will manage most of this, even in a Kubernetes cluster. When you’re trying to squeeze low-latency performance from bare metal, though, you need to dedicate isolated resources to specific applications. As we add new kinds of resources, things get increasingly complicated.
For I/O-constrained workloads, the network interface on a distant NUMA zone slows down how quickly information can reach the application. High-performance workloads, like those running the 5G network, can’t operate to spec under these conditions.
Taking an example of a pod requesting 2 CPUs and a PCI device, FIGURE 2 shows a scenario where resources are not NUMA aligned whereas FIGURE 3 shows a scenario where resources are NUMA aligned:
FIGURE 2: A NUMA System with no Resource Alignment
FIGURE 3: A NUMA System with Resource Alignment
Without handling Resource Topology, Kubernetes as it exists in 1.20 can’t meet the needs of these sorts of applications. End users can (and have!) found ways around this by adding constraints to their clusters. One option is to replace bare-metal deployments with VMs, while another is to limit the pod configs available to developers.
Does Kubernetes default-scheduler consider Resource Topology when assigning pods to nodes?
Kubernetes Topology Manager allows workloads to run in an environment optimized for low latency. Performance-critical workloads require topology information to use co-located CPU cores and devices for industries like telecommunications, High Powered Computing (HPC), and Internet of Things (IoT), but the current native scheduler does not select a node based on its topology. This happens due to the scheduler’s lack of knowledge of Resource Topology, which can lead to unpredictable application performance. In general, this means under performance, and in the worst case, complete mismatch of resource requests and kubelet policies such as scheduling a pod destined to fail, potentially entering a failure loop.
Exposing cluster level topology to the scheduler empowers it to make intelligent NUMA aware placement decisions optimizing cluster wide performance of workloads.
What is the business case for enabling Topology aware scheduling in Kubernetes?
A company could make a business by providing a public cloud or by selling a cloud solution to third parties (for example, telecom operators for NFV use cases and to others). In case of public cloud, the cloud provider in its end user agreement or in public offer can provide only tariffs with a fixed number of resources. In this case, the problem of resource alignment is solved by IAAS level and by the number of resources (NIC, GPU) we can find in tariffs, and these numbers are aligned to numbers per NUMA.
Another case is when a company sells cloud solutions and clients demand more flexibility. Flexibility to them is the ability to work on bare metal and the ability to request any number and kind of resources. So the solution that makes kube scheduler topology aware is interesting for those companies who sell cloud solutions to third parties.
In the next part of the blog post, we talk about Topology Manager and explain the design of Topology aware Scheduling in more detail.
À propos de l'auteur
Parcourir par canal
Automatisation
Les dernières nouveautés en matière d'automatisation informatique pour les technologies, les équipes et les environnements
Intelligence artificielle
Actualité sur les plateformes qui permettent aux clients d'exécuter des charges de travail d'IA sur tout type d'environnement
Cloud hybride ouvert
Découvrez comment créer un avenir flexible grâce au cloud hybride
Sécurité
Les dernières actualités sur la façon dont nous réduisons les risques dans tous les environnements et technologies
Edge computing
Actualité sur les plateformes qui simplifient les opérations en périphérie
Infrastructure
Les dernières nouveautés sur la plateforme Linux d'entreprise leader au monde
Applications
À l’intérieur de nos solutions aux défis d’application les plus difficiles
Programmes originaux
Histoires passionnantes de créateurs et de leaders de technologies d'entreprise
Produits
- Red Hat Enterprise Linux
- Red Hat OpenShift
- Red Hat Ansible Automation Platform
- Services cloud
- Voir tous les produits
Outils
- Formation et certification
- Mon compte
- Assistance client
- Ressources développeurs
- Rechercher un partenaire
- Red Hat Ecosystem Catalog
- Calculateur de valeur Red Hat
- Documentation
Essayer, acheter et vendre
Communication
- Contacter le service commercial
- Contactez notre service clientèle
- Contacter le service de formation
- Réseaux sociaux
À propos de Red Hat
Premier éditeur mondial de solutions Open Source pour les entreprises, nous fournissons des technologies Linux, cloud, de conteneurs et Kubernetes. Nous proposons des solutions stables qui aident les entreprises à jongler avec les divers environnements et plateformes, du cœur du datacenter à la périphérie du réseau.
Sélectionner une langue
Red Hat legal and privacy links
- À propos de Red Hat
- Carrières
- Événements
- Bureaux
- Contacter Red Hat
- Lire le blog Red Hat
- Diversité, équité et inclusion
- Cool Stuff Store
- Red Hat Summit