In a hybrid IT environment, you'll often have a combination of Red Hat OpenShift deployments on public, private, hybrid and multi cloud environments as well as on Red Hat Enterprise Linux (RHEL) systems at the edge. As a site reliability engineer (SRE), it is essential to monitor all of these systems to meet service level agreements (SLAs) and service level objectives (SLOs). This post guides you through setting up Performance Co-Pilot, our monitoring solution for RHEL, and configuring OpenShift Monitoring to scrape metrics from your RHEL systems at the edge.
Creating a RHEL edge image
Open the Red Hat Console and navigate to Edge Management > Manage Images. Click the “Create new image” button and follow the dialog to create a customized image. Make sure to include the pcp
package in the list of additional packages to install. Download the .iso image, flash it to a storage medium and boot an edge device from it.
For more information on how to use the Edge Management application, please refer to the Edge Management documentation.
Deciding which metrics to monitor
PCP comes with a wide range of metrics out-of-the-box, and supports installing additional agents to gather metrics from different subsystems and services.
Currently, you need to install an additional SELinux policy. We are working on removing this extra step in a future RHEL release (RHEL 9.3 or later):
$ test -d /var/lib/pcp/selinux && sudo /usr/libexec/pcp/bin/selinux-setup /var/lib/pcp/selinux install pcpupstream
Let’s start and enable the metrics collector, and list all installed metrics:
$ sudo systemctl enable --now pmcd $ pminfo -t
You can search for additional agents with the following command:
$ dnf search pcp-pmda
Once you have identified one or more additional agents, you install and enable them with the following steps. In this example, we’ll install the SMART (Self-Monitoring, Analysis and Reporting Technology) PMDA (Performance Metric Domain Agent) to monitor the health of the hard drives in our system:
$ sudo rpm-ostree install pcp-pmda-smart $ sudo systemctl reboot $ cd /var/lib/pcp/pmdas/smart && sudo ./Install
We can list all new SMART metrics with pminfo -t smart
and run pminfo -df smart.nvme_attributes.data_units_written
to show the current value of a metric.
Tip: Another interesting PMDA for edge devices is the netcheck PMDA, which performs network checks on the edge device.
Exporting metrics in the OpenMetrics format
The pmproxy daemon (included with PCP) can export metrics in the OpenMetrics format. First, let’s start and enable the daemon:
$ sudo systemctl enable --now pmproxy
pmproxy exports the metrics on http://<hostname>:44322/metrics. By default, all available metrics are exported. This provides us with great insights, but it also consumes more CPU cycles while scraping and requires more storage space. Therefore, it is recommended to limit the set of exported metrics with the names
parameter, for example:
$ curl "http://localhost:44322/metrics?names=disk.dev.read_bytes,disk.dev.write_bytes"
Note: Metric values must be floating point numbers or integers. Strings are not supported in the OpenMetrics format and are not exported by pmproxy.
Allow outside access to pmproxy by enabling the pmproxy service in the firewall:
$ sudo firewall-cmd --permanent --add-service pmproxy $ sudo firewall-cmd --reload
Note: The above command allows access to pmproxy from the default zone. For production environments, it is recommended that access be restricted to an internal network.
Ingesting metrics with OpenShift Monitoring
Once we’ve decided on a list of metrics to ingest and started the pmproxy daemon as described above, we can configure OpenShift Monitoring to ingest metrics from our RHEL systems.
As a prerequisite, monitoring for user-defined projects needs to be enabled in the cluster. Please refer to the OpenShift Monitoring manual for instructions.
In the next step, we create a new project:
$ oc new-project edge-monitoring
To monitor hosts outside the OpenShift cluster, the following manifests need to be created for each monitored host. In this example, the host to monitor is called node1
, with the IP address 192.168.31.129
and the metrics disk.dev.read_bytes
and disk.dev.write_bytes
are scraped every 30 seconds. Save the following manifests to manifests.yaml
, adjust the values accordingly and apply them to your cluster by running oc apply -f manifests.yaml
:
kind: Service apiVersion: v1 metadata: labels: app: node1-pmproxy name: node1-pmproxy namespace: edge-monitoring spec: type: ClusterIP ports: - name: metrics port: 44322 --- kind: Endpoints apiVersion: v1 metadata: name: node1-pmproxy namespace: edge-monitoring subsets: - addresses: - ip: 192.168.31.129 ports: - name: metrics port: 44322 --- apiVersion: monitoring.coreos.com/v1 kind: ServiceMonitor metadata: labels: k8s-app: node1-pmproxy name: node1-pmproxy namespace: edge-monitoring spec: endpoints: - port: metrics interval: 30s params: names: ["disk.dev.read_bytes,disk.dev.write_bytes"] selector: matchLabels: app: node1-pmproxy
Visualizing metrics with the OpenShift Console
Navigate to your OpenShift Console and visit Observe > Targets. You will see your configured hosts in the list of targets (you can use the "Source: User" filter to list only targets of user-defined projects):
Figure 1: List of configured metric targets
Now click the Metrics button in the navigation bar. Type rate(disk_dev_write_bytes[5m]) * 1024
and press the “Run queries” button to see new metric values.
Figure 2: Visualizing metrics in the OpenShift Console
Note: The disk.dev.write_bytes
PCP metric is stored in kilobytes (visible with pminfo -d disk.dev.write_bytes
), therefore we need to multiply by 1024 to get the metric values in bytes. Additionally, metrics in PCP use a dot as a separator, whereas OpenMetrics metrics use an underscore as a separator.
Conclusion
In this article, we learned how to use OpenShift Monitoring to gather metrics from RHEL systems on the edge. If you want to learn more about Performance Co-Pilot, please refer to Automating Performance Analysis and the Performance Optimization Series. Refer to the hybrid cloud blog for more articles about OpenShift and hybrid cloud.
저자 소개
Andreas Gerstmayr is an engineer in Red Hat's Platform Tools group, working on Performance Co-Pilot, Grafana plugins and related performance tools, integrations and visualizations.
채널별 검색
오토메이션
기술, 팀, 인프라를 위한 IT 자동화 최신 동향
인공지능
고객이 어디서나 AI 워크로드를 실행할 수 있도록 지원하는 플랫폼 업데이트
오픈 하이브리드 클라우드
하이브리드 클라우드로 더욱 유연한 미래를 구축하는 방법을 알아보세요
보안
환경과 기술 전반에 걸쳐 리스크를 감소하는 방법에 대한 최신 정보
엣지 컴퓨팅
엣지에서의 운영을 단순화하는 플랫폼 업데이트
인프라
세계적으로 인정받은 기업용 Linux 플랫폼에 대한 최신 정보
애플리케이션
복잡한 애플리케이션에 대한 솔루션 더 보기
오리지널 쇼
엔터프라이즈 기술 분야의 제작자와 리더가 전하는 흥미로운 스토리
제품
- Red Hat Enterprise Linux
- Red Hat OpenShift Enterprise
- Red Hat Ansible Automation Platform
- 클라우드 서비스
- 모든 제품 보기
툴
체험, 구매 & 영업
커뮤니케이션
Red Hat 소개
Red Hat은 Linux, 클라우드, 컨테이너, 쿠버네티스 등을 포함한 글로벌 엔터프라이즈 오픈소스 솔루션 공급업체입니다. Red Hat은 코어 데이터센터에서 네트워크 엣지에 이르기까지 다양한 플랫폼과 환경에서 기업의 업무 편의성을 높여 주는 강화된 기능의 솔루션을 제공합니다.