Today we're unveiling the Cluster Observability Operator (COO), a new Red Hat OpenShift Operator that is designed to manage observability stacks on your clusters. Its upstream variant can be used on vanilla Kubernetes. This is more than an Operator; it’s also a testament to our commitment to delivering tightly integrated observability solutions that evolve with our customers' and users' needs.
COO is now available as a technology preview for all OpenShift users, introducing the new MonitoringStack custom resource definition (CRD) as an initial feature set, which lets you run highly available monitoring stacks consisting of Prometheus, AlertManager and Thanos Querier. Additional observability components may be added in a future release (see the “Looking forward” section below for more details).
COO complements the built-in monitoring capabilities of OpenShift and can be run in parallel with the default platform monitoring and user workload monitoring stacks managed by the Cluster Monitoring Operator (CMO).
From a product perspective, COO is a strategic enhancement reflecting our deep understanding of the evolving Kubernetes/OpenShift landscape. By incorporating the latest technological advancements, COO is designed to closely integrate with existing systems, so our customers can better stay ahead in the rapidly advancing world of cloud-native technologies.
Background
OpenShift ships with built-in monitoring capabilities by default. On an OpenShift cluster, the CMO manages two monitoring stacks:
- The platform monitoring stack, which monitors the cluster infrastructure and all OpenShift components and acts as a data source for the OpenShift Console.
- The optional user workload monitoring stack, which can be used to monitor custom workloads.
With its opinionated configuration tuned for reliability and easy operation, curated alerting rules, accessible dashboards, and simple but reliable tenancy model, the default OpenShift monitoring stack has set an industry standard and played a key role in the success of OpenShift in enterprises worldwide.
OpenShift monitoring’s design decisions and tradeoffs between supportability and flexibility in configuration all fit neatly with the most common enterprise use cases, in which small- to mid-sized clusters are deployed with ownership shared between two roles: administrators and developers.
Typically, a single OpenShift cluster in this environment is managed and used by one site reliability engineering (SRE) team that is responsible for operating the cluster infrastructure, and by multiple development teams that use the cluster and own one or more namespaces in which they run their workloads.
The SRE team can base its service level objectives (SLOs) on the built-in platform metrics, alerts and dashboards. The development teams can use the user workload monitoring stack to monitor their custom workloads, with the ability to restrict access to metrics and control their visibility at the namespace level.
Recently, however, we've been seeing an ever-increasing number of customer needs that don’t fall into the standard use case described above. A few examples include:
- Very small clusters and resource-constrained environments (for example, edge use cases)
- Very large clusters with hundreds or thousands of nodes
- More complex ownership models with multiple levels of responsibility for different parts of a cluster
- More complex requirements regarding tenancy
- A large number of clusters with the requirement to observe them in a more centralized way
Additionally, we’ve begun thinking of monitoring as only one part of a complete observability story. In OpenShift, metrics, logs and traces have traditionally been set up and dealt with separately, with logs and traces being optional components in a default OpenShift installation.
If we take a more holistic approach to observability that includes all of these different signals and then correlate and present them in a unified way, we can work toward a solution for customers that will make observing large platform operations easier and help reduce complexity in both setup and use.
We have created the Cluster Observability Operator as part of this holistic approach toward addressing these customer needs and use cases.
In creating COO, our product vision was to develop a tool that not only addresses current user requirements but also anticipates future trends in cluster management and observability. This forward-thinking approach means that COO is a solution for today and a strategic asset that will continue to deliver value as customer needs and industry standards evolve.
Cluster Observability Operator explained
Cluster Observability Operator can be installed and managed on OpenShift using the Operator Lifecycle Manager from the official Red Hat channels. For other Kubernetes distributions, please refer to the upstream documentation.
After installing COO, it’s straightforward to create a MonitoringStack custom resource in your namespace that will spin up a monitoring stack with the default configuration:
apiVersion: monitoring.rhobs/v1alpha1
kind: MonitoringStack
metadata:
  labels:
    coo: example
  name: sample-monitoring-stack
  namespace: coo-demo
spec:
  logLevel: debug
  retention: 1d
  resourceSelector:
    matchLabels:
      app: demo
Under the hood, COO runs Prometheus Operator, creating a highly available Prometheus instance paired with Thanos Querier and AlertManager instances.
Using COO, you can run any number of monitoring stacks on your cluster with this approach, enabling many use cases that haven't previously been possible using default OpenShift monitoring.
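As a rough illustration of running more than one stack side by side, the sketch below defines two MonitoringStack resources in the same namespace, each selecting a different set of monitoring resources by label. It reuses only the fields shown in the example above; the stack names and the `team` label values are hypothetical.

```yaml
# Two independent stacks, each picking up only the monitoring
# resources that carry its own label (names and labels are hypothetical).
apiVersion: monitoring.rhobs/v1alpha1
kind: MonitoringStack
metadata:
  name: stack-team-a
  namespace: coo-demo
spec:
  retention: 1d
  resourceSelector:
    matchLabels:
      team: a
---
apiVersion: monitoring.rhobs/v1alpha1
kind: MonitoringStack
metadata:
  name: stack-team-b
  namespace: coo-demo
spec:
  retention: 1d
  resourceSelector:
    matchLabels:
      team: b
```

Splitting scrape targets across stacks by label in this way is one possible form of the manual sharding mentioned later in this post.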
Additionally, COO leverages Server-Side Apply to enable fine-grained control of the underlying configuration (for example, of the Prometheus object) without moving full ownership of the resource to the user.
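A minimal sketch of what such a Server-Side Apply change could look like follows. It assumes that the Prometheus object generated by COO has the same name and namespace as the MonitoringStack and is served under the `monitoring.rhobs/v1` API group; verify both on your cluster before relying on it. The `enforcedSampleLimit` value is an arbitrary example.

```yaml
# Partial manifest: only the fields listed here are claimed by the
# applying user; everything else stays managed by COO.
# Assumption: the generated Prometheus object shares the stack's name.
apiVersion: monitoring.rhobs/v1
kind: Prometheus
metadata:
  name: sample-monitoring-stack
  namespace: coo-demo
spec:
  enforcedSampleLimit: 1000   # arbitrary example value
```

A manifest like this would be applied with `oc apply --server-side -f <file>` (or `kubectl apply --server-side -f <file>`), so that the API server tracks field ownership rather than replacing the whole object.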
With these two basic concepts, COO enables:
- Scalability: The stack can be configured to fit both the smallest environments (for example, scrape-and-forward only, using remote write) and the largest environments (for example, through manual sharding by running multiple stacks on one cluster).
- Multitenancy: COO-managed stacks can fit into any ownership model. For example, additional SRE teams can operate shared services on the cluster for other teams.
- Flexibility: Any number of scrape targets and alerting rules can be added to a COO-managed stack by leveraging the Prometheus Operator CRDs.
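To make the flexibility point more concrete, here is a hedged sketch of a scrape target and an alerting rule that the sample stack above might pick up. It assumes that COO serves its Prometheus Operator CRDs under the `monitoring.rhobs/v1` API group and that the stack's `resourceSelector` is applied to both ServiceMonitor and PrometheusRule objects. The resource names, the `metrics` port name and the alert itself are hypothetical.

```yaml
# Scrape target: selected by the stack because of the app: demo label.
apiVersion: monitoring.rhobs/v1
kind: ServiceMonitor
metadata:
  name: demo-app              # hypothetical name
  namespace: coo-demo
  labels:
    app: demo                 # matches spec.resourceSelector of the stack
spec:
  selector:
    matchLabels:
      app: demo               # Services to scrape (hypothetical label)
  endpoints:
    - port: metrics           # hypothetical named Service port
      interval: 30s
---
# Alerting rule: fires when a scraped target in the namespace is down.
apiVersion: monitoring.rhobs/v1
kind: PrometheusRule
metadata:
  name: demo-app-rules        # hypothetical name
  namespace: coo-demo
  labels:
    app: demo
spec:
  groups:
    - name: demo.rules
      rules:
        - alert: DemoTargetDown
          expr: up{namespace="coo-demo"} == 0
          for: 5m
          labels:
            severity: warning
```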
Looking forward
As mentioned, deploying and managing monitoring stacks using COO is expected to be only an initial feature set. In future releases, we plan to add capabilities for managing logging and distributed tracing stacks, all with the benefits described above.
As we look to the future, our product roadmap for COO is ambitious and aligns with our goal of continuous innovation. By expanding its capabilities to encompass logging and distributed tracing, we are not just enhancing a product; we are evolving an ecosystem. This holistic approach to observability underlines our commitment to delivering comprehensive, industry-leading solutions that are in tune with the needs of our users and the direction of the market.
Additionally, creating an ObservabilityStack CRD and managing other observability signals under COO will add another abstraction layer. This layer will further simplify the configuration of observability components and will let us add functionality that works across all observability signals.
Introducing the Cluster Observability Operator is a new milestone in the OpenShift ecosystem. It reflects our commitment to innovation, adaptability and customer-centric development. COO enhances our current offerings and sets the stage for future developments in observability. We would highly value your feedback, additional ideas and any community contributions to the upstream project as we evolve and refine this tool.
About the authors
Daniel Mohr joined Red Hat in 2021 with a background in embedded Linux software development, site reliability engineering for large scale web applications and leading SRE teams. In his role as an engineering manager he works with topics like the Red Hat OpenShift monitoring stack, Multicluster Observability and Power Monitoring for OpenShift as part of the Red Hat Observability group.
Roger Florén, a dynamic and forward-thinking leader, currently serves as the Principal Product Manager at Red Hat, specializing in Observability. His journey in the tech industry is marked by high performance and ambition, transitioning from a senior developer role to a principal product manager. With a strong foundation in technical skills, Roger is constantly driven by curiosity and innovation. At Red Hat, Roger leads the Observability platform team, working closely with in-cluster monitoring teams and contributing to the development of products like Prometheus, AlertManager, Thanos and Observatorium. His expertise extends to coaching, product strategy, interpersonal skills, technical design, IT strategy and agile project management.