We at Red Hat are proud to work with so many interesting and innovative organizations. One such group is the Mass Open Cloud (MOC), a non-profit initiative that includes universities, government organizations and businesses, and provides reliable and cost-effective storage to support both its public and private clouds built on Red Hat OpenStack Platform. In addition to OpenStack, the MOC has deployed Red Hat Ceph Storage as the storage foundation for its innovative research and big data analytics. This post showcases the importance of Ceph storage in the work Red Hat is doing with the MOC.
Collaboration Between MOC and Ceph
The MOC was formed in large part to offer organizations, including businesses, universities, governments and nonprofits, a way to store large amounts of data and extract meaningful insights from it. The goal of the MOC is to provide these groups with a common, cloud-based infrastructure on which researchers can store, share and analyze data. However, most public clouds are built in a closed environment and operated by a single provider, which limits flexibility and is not ideal for the organizations that build on the MOC. The MOC needed to create a public cloud that is inexpensive, efficient and highly scalable, so it made sense to turn to open source solutions to do so.
The MOC chose the Red Hat OpenStack Platform as the underlying infrastructure foundation because it is cost-effective and can support a large number of contributors. The team quickly realized that it also needed a storage solution, and the MOC worked with Red Hat Consulting to deploy Red Hat Ceph Storage. The MOC runs three storage clusters - a production environment, a research and experimentation cluster and an internal testing cluster - and Ceph allows it to expand its storage capacity to meet researchers’ ever-growing needs for developing innovative, data-intensive applications while also performing detailed analysis. Ceph Storage also provides rapid recovery from issues and high reliability for critical research project data.
Northeast Storage Exchange
One of the advantages that open source technologies and Ceph storage give the MOC is the ability to build innovative data solutions without having to rely on new technology platforms. One such project is the Northeast Storage Exchange (NESE), which is funded out of the recently awarded National Science Foundation (NSF) grant that is helping to fund a national cloud testbed for the research and development of new cloud computing platforms. Specifically, NESE allows advanced researchers, including physicists, biochemists and others, to generate large amounts of data and actually have the room to store it. This is very important, because computers and sensors have become so much faster and better that we are now able to collect larger quantities of data than ever before. Within this data could live the answers to big, potentially humanity-changing questions, like potential cures for cancer. The issue with such enormous amounts of data is storage - where to store it and how to store it in a way that is both cost-effective and accessible. Until now, researchers were finding that the data was either scattered in a way that made it difficult to run computations on, or that some of it was being thrown away with no way to determine what was being disposed of.
NESE works to solve the issue of data storage for science by offering a giant central data repository accessible to many universities. It allows multiple researchers from multiple universities to both store and access data for the advancement of scientific research, which is critical for scientists of any discipline doing research on data. With NESE, a researcher can gather the data they need, and then layer other applications on top of it, like analysis through artificial intelligence (AI) and machine learning (ML), to glean insights. With NESE running Ceph storage, the data stored is replicated across multiple drives, which also takes care of the issue of backing up the data. NESE is significant because it is one of the first times that open source software has been used on a data store of this scale. Ceph storage gave the researchers the opportunity to store massive amounts of data in a cost-effective way and in a manner that can be layered for easier data abstraction, to advance what is often mission-critical scientific research. With the NSF grant, this research will be able to continue and expand.
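To make the replication point concrete, here is a minimal sketch of why keeping several copies on different drives protects the data, and what it costs in capacity. It is a toy model only: the hash-based placement is a stand-in and not Ceph's actual placement algorithm, and the replica count of three, the drive names and the object name are assumptions for illustration, not figures from NESE.

```python
import hashlib


def place_replicas(object_name, drives, replicas=3):
    """Choose `replicas` distinct drives for one object.

    Drives are ranked by a hash of (drive, object name). This is a toy
    stand-in for illustration and NOT Ceph's actual placement algorithm.
    """
    if replicas > len(drives):
        raise ValueError("need at least as many drives as replicas")
    ranked = sorted(
        drives,
        key=lambda d: hashlib.sha256(f"{d}:{object_name}".encode()).hexdigest(),
    )
    return ranked[:replicas]


def is_readable(placement, failed_drives):
    """An object stays readable while at least one replica's drive is healthy."""
    return any(drive not in failed_drives for drive in placement)


def usable_capacity(raw_capacity, replicas=3):
    """With N full copies of everything, usable space is raw space / N."""
    return raw_capacity / replicas


drives = [f"drive-{i}" for i in range(6)]  # hypothetical drive names
placement = place_replicas("genome-run-42.dat", drives)  # hypothetical object
print(placement)
print(is_readable(placement, failed_drives={placement[0]}))  # one copy lost
print(usable_capacity(300))  # usable space for 300 units of raw capacity
```

The trade-off the sketch shows is the usual one for replication: losing a drive does not lose the object, but every extra copy divides the usable capacity.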
Datacenter-Data-Delivery Network (D3N)
In addition to NESE, the MOC research team is creating a datacenter-data-delivery network, D3N, which is a novel multi-layer cooperative caching architecture for object stores that is currently in production. It is designed to accelerate big data analytic workloads that have strong locality traits and limited network connectivity between compute clusters and data storage. One of the biggest advantages for an organization is the speed at which it can glean insights from the data it has, in addition to how useful these insights will be. However, the more data you collect, the harder it can be to use that data - becoming somewhat of a Catch-22. To help with large-scale data analysis, it is fairly common to use data lakes, which are large repositories that store and share terabyte and petabyte data sets. D3N - based on Ceph - improves the performance of big-data jobs running in analytics clusters by increasing the speed at which reads and writes take place in the data lake. There are three components to the D3N architecture:
- Cache servers, to which client requests are directed; they act as proxies for the back-end object store and store data locally for reuse
- Lookup service, so researchers can look up what they need from local servers
- Heartbeat service, which tracks the set of active caches
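The three components listed above might fit together roughly as follows. This is a minimal in-memory sketch under stated assumptions, not D3N's implementation: all class and method names are hypothetical, the heartbeat timeout is arbitrary, and the lookup service picks a cache with rendezvous hashing, which is a technique chosen here for illustration rather than one the text attributes to D3N.

```python
import hashlib
import time


class BackendStore:
    """Stand-in for the back-end object store (the data lake)."""

    def __init__(self, objects):
        self.objects = dict(objects)
        self.reads = 0  # counts how often the back end is actually hit

    def get(self, key):
        self.reads += 1
        return self.objects[key]


class CacheServer:
    """Proxy in front of the back end; keeps a local copy for reuse."""

    def __init__(self, name, backend):
        self.name = name
        self.backend = backend
        self.local = {}

    def get(self, key):
        if key not in self.local:  # miss: fetch once from the back end
            self.local[key] = self.backend.get(key)
        return self.local[key]


class HeartbeatService:
    """Tracks which caches have reported in within `timeout` seconds."""

    def __init__(self, timeout=5.0):
        self.timeout = timeout
        self.last_seen = {}

    def beat(self, name, now=None):
        self.last_seen[name] = time.monotonic() if now is None else now

    def active(self, now=None):
        now = time.monotonic() if now is None else now
        return sorted(
            n for n, t in self.last_seen.items() if now - t <= self.timeout
        )


class LookupService:
    """Maps an object key to one of the currently active caches."""

    def __init__(self, caches, heartbeat):
        self.caches = {c.name: c for c in caches}
        self.heartbeat = heartbeat

    def locate(self, key, now=None):
        active = self.heartbeat.active(now)
        if not active:
            raise RuntimeError("no active cache servers")

        # Rendezvous hashing (an assumption): the highest score wins, so the
        # same key maps to the same cache while the active set is unchanged.
        def score(name):
            return hashlib.sha256(f"{name}:{key}".encode()).hexdigest()

        return self.caches[max(active, key=score)]


backend = BackendStore({"dataset/part-0": b"abc"})
caches = [CacheServer(f"cache-{i}", backend) for i in range(3)]
heartbeat = HeartbeatService(timeout=5.0)
for cache in caches:
    heartbeat.beat(cache.name, now=0.0)
lookup = LookupService(caches, heartbeat)

lookup.locate("dataset/part-0", now=1.0).get("dataset/part-0")  # miss
lookup.locate("dataset/part-0", now=2.0).get("dataset/part-0")  # local hit
print(backend.reads)  # back-end reads after two client requests
```

Passing `now` explicitly keeps the example deterministic; a real service would rely on the clock and on periodic heartbeats from each cache.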
Access to the data stored in the data lake can be thought of as a funnel: the narrower the opening, the harder it is to retrieve the data. If you cache over a wider network in different cloud data centers - as Ceph allows - the data can be accessed across multiple data centers, so anybody anywhere in the world can reach it much faster. More on the D3N can be found in the latest issue of Red Hat’s Research Quarterly.
While storage is an immensely important foundation for gaining insights from data, one of the biggest advantages of Ceph is that it is open source. The advantages of running data lakes on open source technology extend beyond the technology itself to establishing a holistic research culture. Historically, there has been a massive barrier for grad students entering the field. With universities collaborating with the MOC and working with Red Hat and open source tools, researchers can work together faster than when working in closed silos, allowing for greater accessibility and collaboration.
To learn more about the MOC, check out their website. To see more about what NESE is working on and keep up on their latest projects, see here.
About the author
Hugh Brock is the Research Director for Red Hat, coordinating Red Hat research and collaboration with universities, governments, and industry worldwide. A Red Hatter since 2002, Hugh brings intimate knowledge of the complex relationship between upstream projects and shippable products to the task of finding research to bring into the open source world.