Red Hat is continually innovating, and part of that innovation includes researching and striving to solve the problems our customers face. Much of that work is driven through the Office of the CTO and spans Red Hat OpenShift, Red Hat OpenShift Container Storage, and use cases such as the open hybrid cloud, artificial intelligence, and machine learning. We recently interviewed Michael Clifford, Data Scientist in the Office of the CTO here at Red Hat, about these very topics.
Your title is Data Scientist, right?
That's correct.
What's that mean in terms of working with OpenShift 4, and with the hybrid cloud?
Working in this domain is really twofold.
On one side, if we want to provide infrastructure for other companies that want to run machine learning workloads, we're working as the beta testers.
Then, on the other side of it is a question: how do we actually implement some kind of intelligence into the applications that are running on the OpenShift Container Platform?
For example, one of the main, cool features of OpenShift 4 is the automatic updates that happen. But, how do you actually know when an update is happening automatically on hundreds of thousands of servers at a time? You need some kind of intelligent automation to manage that process.
That's one of the things we worked on early, both testing out how other users would use our infrastructure — from the data scientist perspective — as well as implementing the intelligent applications that run behind some of that infrastructure.
So your role is to analyze what's happening, then come up with ways to make it less disruptive. Is that accurate?
Exactly. During an update, we say, “Oh, something strange is happening during this update, let's roll back before anything breaks.”
What are some of the data science tools you use to detect that?
Basically, you're ingesting all the data from all the updates that have occurred in the past. Then the machine learning model essentially learns what it looks like when a thing is updating normally. As a new update happens we continually compare it to our model, and if something starts to really deviate in any significant way, we say, “okay, let's flag this, roll it back.”
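The approach Michael describes can be sketched very simply: learn what “normal” update telemetry looks like from historical runs, then flag a live update whose metrics deviate significantly. This is a minimal illustrative sketch, not Red Hat's actual pipeline; the metric names and the z-score threshold are assumptions.

```python
# Minimal sketch: baseline statistics from past healthy updates,
# then flag a live update whose metrics deviate too far.
# Metric names and the threshold are illustrative assumptions.
from statistics import mean, stdev

def fit_baseline(historical_runs):
    """historical_runs: list of dicts mapping metric -> value from past healthy updates."""
    metrics = historical_runs[0].keys()
    return {
        m: (mean(r[m] for r in historical_runs),
            stdev(r[m] for r in historical_runs))
        for m in metrics
    }

def should_roll_back(live_metrics, baseline, threshold=3.0):
    """Flag the update if any metric sits more than `threshold`
    standard deviations from its historical mean."""
    for m, value in live_metrics.items():
        mu, sigma = baseline[m]
        if sigma > 0 and abs(value - mu) / sigma > threshold:
            return True
    return False

history = [
    {"error_rate": 0.010, "restart_count": 2},
    {"error_rate": 0.020, "restart_count": 3},
    {"error_rate": 0.015, "restart_count": 2},
]
baseline = fit_baseline(history)
print(should_roll_back({"error_rate": 0.4, "restart_count": 20}, baseline))  # True
```

In production this per-metric z-score would be replaced by a proper model trained on the full update history, but the shape of the decision (compare live telemetry against learned normal behavior, flag significant deviation) is the same.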
And you're talking about hundreds of thousands of updates to monitor.
One of the things about working in the AIOps area is that even though there's a lot of data, it's sometimes not very clean data. With a lot of data science projects, people have this idea that you get a file that's very cleanly defined, and you can do your exploratory analysis and all kinds of other stuff on it. With these live, machine-generated, real-time data sets, things can be all over the place.
So the bigger challenge with this particular project is less the machine learning algorithm itself than the infrastructure required to parse the data: to get enough meaningful data and convert it into a format that machine learning tools can actually ingest.
What's generating this hard-to-manage data?
The data wasn't generated with machine learning in mind. There's a lot of post-processing and pre-processing that has to happen between capturing this massive amount of data and turning it into a format that can actually be used for intelligence.
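That pre-processing step might look something like the following sketch: raw, machine-generated log lines are parsed and aggregated into one flat numeric record per update run, the kind of input a model can consume. The log format and field names here are invented for illustration.

```python
# Hypothetical sketch: collapse messy, machine-generated log lines
# into a single numeric feature record per update run.
# The log format and feature names are invented for illustration.
import re

LOG_PATTERN = re.compile(r"(?P<level>INFO|WARN|ERROR)\s+component=(?P<component>\S+)")

def featurize(log_lines):
    """Turn a stream of raw log lines into one flat feature dict."""
    counts = {"INFO": 0, "WARN": 0, "ERROR": 0}
    components = set()
    for line in log_lines:
        m = LOG_PATTERN.search(line)
        if not m:
            continue  # real data is messy: skip lines that don't parse
        counts[m.group("level")] += 1
        components.add(m.group("component"))
    total = sum(counts.values()) or 1
    return {
        "error_ratio": counts["ERROR"] / total,
        "warn_ratio": counts["WARN"] / total,
        "distinct_components": len(components),
    }

logs = [
    "INFO component=etcd rollout started",
    "WARN component=kube-apiserver slow response",
    "ERROR component=etcd leader lost",
    "garbage line that does not parse",
]
print(featurize(logs))
```

The point of the sketch is the shape of the work: most of the effort goes into parsing, filtering, and aggregating, not into the model that eventually consumes the result.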
With that kind of data, is it harder to decide what is useful data for machine learning, and what is just something that has to be managed?
A lot of times, especially with this type of stuff, you will have to go back and talk to a subject matter expert. Like somebody who's actually working on the OpenShift 4 updates, and you can say, “This variable seems like it would be very informative. Is this something that we should use?” And they'll say, “No, this is generated by something that you're trying to predict anyways, so it'll be a circular prediction.”
I think that's just a big part of the practice of data science — a lot of looking at the data, but also talking with subject matter experts to determine the right thing to do.
Thanks Michael.
Thank you.