We at Red Hat are proud to work with so many interesting and innovative organizations. One such group is the Mass Open Cloud (MOC), a non-profit initiative that includes universities, government organizations and businesses, and that provides reliable, cost-effective storage to support both its public and private clouds built on Red Hat OpenStack Platform. In addition to OpenStack, the MOC has deployed Red Hat Ceph Storage as the storage foundation for its research and big data analytics. This post showcases the importance of Ceph storage in the work Red Hat is doing with the MOC.
Collaboration Between MOC and Ceph
The MOC was formed in large part to offer organizations, including businesses, universities, governments and nonprofits, a way to store large amounts of data and extract meaningful insights from it. The goal of the MOC is to provide these groups with a common, cloud-based infrastructure on which researchers can store, share and analyze data. However, most public clouds are built in a closed environment and operated by a single provider, which limits flexibility and is not ideal for the organizations that build on the MOC. The MOC needed to create a public cloud that is inexpensive, efficient and highly scalable, so it made sense to turn to open source solutions.
The MOC chose Red Hat OpenStack Platform as the underlying infrastructure foundation because it is cost-effective and can support a large number of contributors. It quickly became clear that a storage solution was also needed, and the MOC worked with Red Hat Consulting to deploy Red Hat Ceph Storage. The MOC runs three storage clusters: a production environment, a research and experimentation cluster and an internal testing cluster. Ceph allows the MOC to expand its storage capacity to meet researchers’ ever-growing needs for developing data-intensive applications and performing detailed analysis. Ceph Storage also provides rapid recovery from issues and high reliability for critical research project data.
Northeast Storage Exchange
One advantage that open source technologies and Ceph storage give the MOC is the ability to build innovative data solutions without having to rely on new technology platforms. One such project is the Northeast Storage Exchange (NESE), which is funded out of the recently awarded National Science Foundation (NSF) grant that is helping to fund a national cloud testbed for the research and development of new cloud computing platforms. Specifically, NESE allows advanced researchers, including physicists, biochemists and others, to generate large amounts of data and actually have the room to store it. This matters because computers and sensors have become so much faster and better that we can now collect larger quantities of data than ever before. Within this data could live the answers to big, potentially humanity-changing questions, like potential cures for cancer. The issue with such enormous amounts of data is storage: where to store it, and how to store it in a way that is both cost-effective and accessible. Researchers were finding that their data was either scattered around in such a way that it was difficult to run computations on, or that some of it was being thrown away with no way to determine which data was being disposed of.
NESE works to solve the issue of data storage for science by offering a giant central data repository accessible to many universities. It allows multiple researchers from multiple universities to both store and access data for the advancement of scientific research, which is critical for scientists of any discipline doing research on data. With NESE, a researcher can gather the data they need and then layer other applications on top of it, such as analysis through artificial intelligence (AI) and machine learning (ML), to glean insights. Because NESE runs on Ceph storage, the stored data is replicated across multiple drives, which also takes care of the issue of backing up the data. NESE is significant because it is one of the first times that open source software has been used on a data store of this scale. Ceph storage gave the researchers the opportunity to store massive amounts of data cost-effectively and in a manner that can be layered for easier data abstraction, to advance what is often mission-critical scientific research. With the NSF grant, this research will be able to continue and expand.
Datacenter-Data-Delivery Network (D3N)
In addition to NESE, the MOC research team is creating a datacenter-data-delivery network, D3N, a novel multi-layer cooperative caching architecture for object stores that is currently in production. It is designed to accelerate big data analytics workloads that have strong locality traits and limited network connectivity between compute clusters and data storage. One of the biggest advantages for an organization is the speed at which it can glean insights from the data it has, in addition to how useful these insights will be. However, the more data you collect, the harder it can be to actually use that data, which becomes something of a Catch-22. To help with large-scale data analysis, it is fairly common to use data lakes, which are large repositories that store and share terabyte and petabyte data sets. D3N, which is based on Ceph, improves the performance of big-data jobs running in analytics clusters by increasing the speed at which reads and writes take place in the data lake. There are three components to the D3N architecture:
- Cache servers, which receive client requests, act as proxies for the back-end object store, and store data locally for re-use
- Lookup service, so researchers can look up what they need from local servers
- Heartbeat service, which tracks the set of active caches
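To make the roles of the three components concrete, here is a minimal toy sketch in Python. It is not the D3N implementation or any Ceph API: every class and function name is hypothetical, the back-end store is an in-memory dictionary, and the lookup rule (a CRC32 hash of the object key, modulo the number of active caches) is an assumption chosen only for illustration.

```python
import time
import zlib
from typing import Dict, List, Optional


class BackendObjectStore:
    """Stand-in for the back-end object store (the data lake)."""

    def __init__(self, objects: Dict[str, bytes]) -> None:
        self._objects = dict(objects)
        self.reads = 0  # counts how often the back end is actually hit

    def get(self, key: str) -> bytes:
        self.reads += 1
        return self._objects[key]


class HeartbeatService:
    """Tracks the set of active caches by the time of their last heartbeat."""

    def __init__(self, timeout_s: float = 5.0) -> None:
        self.timeout_s = timeout_s
        self._last_seen: Dict[str, float] = {}

    def beat(self, cache_id: str, now: Optional[float] = None) -> None:
        self._last_seen[cache_id] = time.monotonic() if now is None else now

    def active(self, now: Optional[float] = None) -> List[str]:
        now = time.monotonic() if now is None else now
        return sorted(c for c, t in self._last_seen.items()
                      if now - t <= self.timeout_s)


class LookupService:
    """Maps an object key to one active cache (assumed rule: hash modulo)."""

    def __init__(self, heartbeat: HeartbeatService) -> None:
        self.heartbeat = heartbeat

    def locate(self, key: str, now: Optional[float] = None) -> Optional[str]:
        active = self.heartbeat.active(now)
        if not active:
            return None
        return active[zlib.crc32(key.encode("utf-8")) % len(active)]


class CacheServer:
    """Read-through proxy: serves local copies, else fetches and keeps them."""

    def __init__(self, cache_id: str, backend: BackendObjectStore) -> None:
        self.cache_id = cache_id
        self.backend = backend
        self._local: Dict[str, bytes] = {}

    def get(self, key: str) -> bytes:
        if key not in self._local:
            self._local[key] = self.backend.get(key)  # miss: go to back end
        return self._local[key]


def read(key: str, lookup: LookupService, caches: Dict[str, CacheServer],
         backend: BackendObjectStore, now: Optional[float] = None) -> bytes:
    """Client read path: use an active cache if one exists, else the back end."""
    cache_id = lookup.locate(key, now)
    if cache_id is None:
        return backend.get(key)
    return caches[cache_id].get(key)


if __name__ == "__main__":
    backend = BackendObjectStore({"dataset/part-0": b"example bytes"})
    heartbeat = HeartbeatService(timeout_s=5.0)
    caches = {cid: CacheServer(cid, backend) for cid in ("cache-a", "cache-b")}
    for cid in caches:
        heartbeat.beat(cid)
    lookup = LookupService(heartbeat)
    for _ in range(3):
        read("dataset/part-0", lookup, caches, backend)
    print("back-end reads:", backend.reads)
```

In this sketch, repeated reads of the same object reach the back-end store only once as long as the responsible cache keeps sending heartbeats. When no cache is active, reads fall back to the back-end store directly.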
Access to the data stored in the data lake can be thought of as a funnel: the narrower the access, the harder it will be to retrieve the data. If you cache over a wider network in different cloud data centers, as Ceph allows, the data can be accessed across multiple data centers, so anybody anywhere in the world can reach it much faster. More on D3N can be found in the latest issue of Red Hat’s Research Quarterly.
While storage is an immensely important foundation for gaining insights from data, one of the biggest advantages of Ceph is that it is open source. The advantages of running data lakes on open source technology extend beyond the technology itself to establishing a holistic research culture. There has historically been a massive barrier for grad students entering the field. With universities collaborating with the MOC and working with Red Hat and open source tools, researchers can work together faster than they could in closed silos, allowing for greater accessibility and collaboration.
To learn more about the MOC, check out its website. To see more about what NESE is working on and keep up with its latest projects, visit the NESE website.
About the author
Hugh Brock is the Research Director for Red Hat, coordinating Red Hat research and collaboration with universities, governments, and industry worldwide. A Red Hatter since 2002, Hugh brings intimate knowledge of the complex relationship between upstream projects and shippable products to the task of finding research to bring into the open source world.