
In a hybrid IT environment, you'll often run a combination of Red Hat OpenShift deployments across public, private, hybrid and multicloud environments, alongside Red Hat Enterprise Linux (RHEL) systems at the edge. As a site reliability engineer (SRE), you need to monitor all of these systems to meet service level agreements (SLAs) and service level objectives (SLOs). This post guides you through setting up Performance Co-Pilot (PCP), our monitoring solution for RHEL, and configuring OpenShift Monitoring to scrape metrics from your RHEL systems at the edge.

Creating a RHEL edge image

Open the Red Hat Console and navigate to Edge Management > Manage Images. Click the “Create new image” button and follow the dialog to create a customized image. Make sure to include the pcp package in the list of additional packages to install. Download the .iso image, flash it to a storage medium and boot an edge device from it.

For more information on how to use the Edge Management application, please refer to the Edge Management documentation.

Deciding which metrics to monitor

PCP comes with a wide range of metrics out of the box and supports installing additional agents to gather metrics from other subsystems and services.

Currently, you need to install an additional SELinux policy. We are working on removing this extra step in a future RHEL release (RHEL 9.3 or later):

$ test -d /var/lib/pcp/selinux && sudo /usr/libexec/pcp/bin/selinux-setup /var/lib/pcp/selinux install pcpupstream

Let’s start and enable the metrics collector, and list all installed metrics:

$ sudo systemctl enable --now pmcd
$ pminfo -t

You can search for additional agents with the following command:

$ dnf search pcp-pmda

Once you have identified one or more additional agents, install and enable them with the following steps. In this example, we’ll install the SMART (Self-Monitoring, Analysis and Reporting Technology) PMDA (Performance Metric Domain Agent) to monitor the health of the hard drives in our system. Because the edge image is managed with rpm-ostree, the package is layered onto the image and becomes available after a reboot:

$ sudo rpm-ostree install pcp-pmda-smart
$ sudo systemctl reboot
$ cd /var/lib/pcp/pmdas/smart && sudo ./Install

We can list all new SMART metrics with pminfo -t smart and run pminfo -df smart.nvme_attributes.data_units_written to show the current value of a metric.

Tip: Another interesting PMDA for edge devices is the netcheck PMDA, which performs network checks on the edge device.

Exporting metrics in the OpenMetrics format

The pmproxy daemon (included with PCP) can export metrics in the OpenMetrics format. First, let’s start and enable the daemon:

$ sudo systemctl enable --now pmproxy

pmproxy exports the metrics on http://<hostname>:44322/metrics. By default, all available metrics are exported. This provides us with great insights, but it also consumes more CPU cycles while scraping and requires more storage space. Therefore, it is recommended to limit the set of exported metrics with the names parameter, for example:

$ curl "http://localhost:44322/metrics?names=disk.dev.read_bytes,disk.dev.write_bytes"

Note: Metric values must be floating point numbers or integers. Strings are not supported in the OpenMetrics format and are not exported by pmproxy.
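When the list of metrics grows, assembling the names parameter by hand gets error-prone. The following is a minimal sketch of a shell helper that joins metric names with commas and prints the resulting scrape URL. The function name pmproxy_metrics_url is our own invention for illustration and is not part of PCP; the port and path are the ones described above.

```shell
#!/bin/sh
# Hypothetical helper (not part of PCP): build a pmproxy scrape URL
# from a host name and a list of PCP metric names.
# Usage: pmproxy_metrics_url <host> <metric> [<metric> ...]
pmproxy_metrics_url() {
    host="$1"
    shift
    # Join the remaining arguments with commas, the separator
    # used by the names parameter.
    names=$(IFS=,; printf '%s' "$*")
    printf 'http://%s:44322/metrics?names=%s\n' "$host" "$names"
}

pmproxy_metrics_url localhost disk.dev.read_bytes disk.dev.write_bytes
```

The printed URL can be passed straight to curl, for example curl "$(pmproxy_metrics_url localhost disk.dev.read_bytes disk.dev.write_bytes)".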

Allow outside access to pmproxy by enabling the pmproxy service in the firewall:

$ sudo firewall-cmd --permanent --add-service pmproxy
$ sudo firewall-cmd --reload

Note: The above commands allow access to pmproxy from the default zone. For production environments, we recommend restricting access to an internal network.

Ingesting metrics with OpenShift Monitoring

Once we’ve decided on a list of metrics to ingest and started the pmproxy daemon as described above, we can configure OpenShift Monitoring to ingest metrics from our RHEL systems.

As a prerequisite, monitoring for user-defined projects needs to be enabled in the cluster. Please refer to the OpenShift Monitoring manual for instructions.

In the next step, we create a new project:

$ oc new-project edge-monitoring

To monitor hosts outside the OpenShift cluster, the following manifests need to be created for each monitored host. In this example, the host to monitor is called node1 and has the IP address 192.168.31.129, and the metrics disk.dev.read_bytes and disk.dev.write_bytes are scraped every 30 seconds. Save the following manifests to manifests.yaml, adjust the values accordingly, and apply them to your cluster by running oc apply -f manifests.yaml:

kind: Service
apiVersion: v1
metadata:
  labels:
    app: node1-pmproxy
  name: node1-pmproxy
  namespace: edge-monitoring
spec:
  type: ClusterIP
  ports:
  - name: metrics
    port: 44322
---
kind: Endpoints
apiVersion: v1
metadata:
  name: node1-pmproxy
  namespace: edge-monitoring
subsets:
- addresses:
  - ip: 192.168.31.129
  ports:
  - name: metrics
    port: 44322
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: node1-pmproxy
  name: node1-pmproxy
  namespace: edge-monitoring
spec:
  endpoints:
  - port: metrics
    interval: 30s
    params:
      names: ["disk.dev.read_bytes,disk.dev.write_bytes"]
  selector:
    matchLabels:
      app: node1-pmproxy

Visualizing metrics with the OpenShift Console

Navigate to your OpenShift Console and visit Observe > Targets. You will see your configured hosts in the list of targets (you can use the "Source: User" filter to list only targets of user-defined projects):

 

Figure 1: List of configured metric targets

Now click the Metrics button in the navigation bar. Type rate(disk_dev_write_bytes[5m]) * 1024 and press the “Run queries” button to see new metric values.

 

Figure 2: Visualizing metrics in the OpenShift Console

Note: PCP reports the disk.dev.write_bytes metric in kilobytes (visible with pminfo -d disk.dev.write_bytes), so we need to multiply by 1024 to get the values in bytes. Additionally, PCP metric names use a dot as a separator, whereas the exported OpenMetrics names use an underscore, which is why disk.dev.write_bytes is queried as disk_dev_write_bytes.
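The two adjustments from the note above (dots become underscores, kilobytes become bytes) can be sketched as a pair of small shell helpers. Both function names are hypothetical and exist only for this example. The sketch covers just the dot-to-underscore substitution described here and assumes the metric is reported in kilobytes, which you should confirm with pminfo -d before applying the factor.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; they are not part of PCP or OpenShift.

# Convert a PCP metric name to the name used in the OpenMetrics export
# by replacing each dot with an underscore.
openmetrics_name() {
    printf '%s\n' "$1" | tr '.' '_'
}

# Build a PromQL rate query for a PCP metric that is reported in
# kilobytes, multiplying by 1024 to get bytes per second.
# Usage: kbyte_rate_query <pcp-metric> <range>
kbyte_rate_query() {
    printf 'rate(%s[%s]) * 1024\n' "$(openmetrics_name "$1")" "$2"
}

kbyte_rate_query disk.dev.write_bytes 5m
```

Running the script prints the same query that we typed into the Metrics page earlier.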

Conclusion

In this article, we learned how to use OpenShift Monitoring to gather metrics from RHEL systems at the edge. If you want to learn more about Performance Co-Pilot, please refer to Automating Performance Analysis and the Performance Optimization Series. Refer to the hybrid cloud blog for more articles about OpenShift and hybrid cloud.

 


About the author

Andreas Gerstmayr is an engineer in Red Hat's Platform Tools group, working on Performance Co-Pilot, Grafana plugins and related performance tools, integrations and visualizations.
