What are Granite models?

Granite is a series of large language models (LLMs) created by IBM for enterprise applications. Granite foundation models can support generative artificial intelligence (gen AI) use cases that involve language and code.

Granite family models are open source under the Apache 2.0 license, which means developers can experiment with, modify, and distribute Granite models for free. This makes Granite models a good choice for organizations that deal with sensitive data and want to run their own LLM rather than relying on an outside service.

See how Granite works with Red Hat

Foundation models are trained to develop a general understanding of patterns, structures, and representations of language. This “foundational” training teaches the model how to communicate and recognize those patterns. Applying that learning to new inputs is called AI inference: the operational phase of AI, where the model takes what it learned during training and applies it to real-world situations.

Why you should care about AI inference 

The IBM Granite AI models have this baseline of knowledge that can be further fine-tuned to perform specific tasks for almost any industry. Granite family models are trained on curated data and provide transparency into the data that’s used for training.

LLMs use gen AI to produce new content based on the prompt a user enters. Today, people often use gen AI to generate text, pictures, video, and code. Businesses can use LLM foundation models to automate various aspects of operations, such as customer-support chatbots or testing software code.

Other LLM foundation models that use gen AI include Meta’s Llama (which includes Llama 2 and Llama 3), Google’s Gemini, Anthropic’s Claude, OpenAI’s GPT (known for the ChatGPT chatbot), and Mistral. What sets the Granite AI models apart is the disclosure of their training data, which builds trust with users and makes them more suitable for enterprise environments.

Some models in the Granite AI series are available under an open source license, which means developers can easily access a model, run it locally, and fine-tune it for their particular goals. Users even have access to a majority of the data used to train the models (PDF), so they can understand how each model was built and how it functions.

When it comes to Granite models, open source means developers can customize a model with their own data to generate organization-specific outputs. It doesn’t mean everyone’s private data is available to the whole open source community. Unlike public AI web services, Granite models don’t train continuously on user input, so any data entered into a Granite family model is never shared with Red Hat, IBM, or other Granite users.

Enterprises in many industries―from healthcare to construction―can use Granite in a variety of ways to help automate their operations on a large scale. Granite models can be trained in business-domain tasks like summarization, question answering, and classification. Here are a few examples:

  • Code generation: Granite code models can help build upon or improve work done by developers to make processes more efficient. For example, developers can take advantage of autocomplete: Much like autocomplete on a smartphone, the model can complete a line of code before the developer finishes typing.
  • Insight extraction: When you need to simplify, summarize, or explain large data sets, Granite can identify accurate patterns and insights quickly. This saves you the hassle of combing through a lot of data. 
  • Flexible architecture: Granite can integrate with existing systems and can be deployed on premise or in the cloud. Its interfaces are made to simplify deployment. The Granite family includes models of various sizes, so you can choose one that best matches your needs while managing your computing costs.
  • Custom solutions: Though Granite is sold as a foundation model, it’s built to be trained for business-specific knowledge. Users have the flexibility to scale and fine-tune the model to tailor it to their business needs. For example, if your business is focused on medical devices, you can teach the model lingo used in the healthcare industry.
  • Low latency: Running a Granite model on your own infrastructure means you can optimize for quick response times. The model can deliver real-time data, making it handy for critical operations. If we stick with the healthcare example, access to real-time data is important for remote doctor-patient collaboration and time-sensitive care. Compressing the Granite model can provide powerful performance with even fewer resources.
  • High accuracy: Developers can fine-tune the Granite series for industry-specific tasks to make the model an expert in any subject. It can also be trained in multiple languages to maintain accuracy and accessibility on a global scale.
  • Transparent models: Because Granite is available under an open source license, developers can see how the AI model was built and trained, as well as collaborate with an open source community.
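The compression point above rests on a simple idea: storing each model weight in fewer bytes. The sketch below illustrates post-training int8 quantization, one common compression technique. It is a toy example for intuition, not the actual pipeline any Granite model uses; the function names and the tiny weight list are invented for illustration.

```python
# Toy sketch of int8 quantization: map float weights onto the integer
# range [-127, 127] with one shared scale factor. Storing 1 byte per
# weight instead of 4 (fp32) cuts weight memory roughly 4x, at the cost
# of a small rounding error per weight.

def quantize_int8(weights):
    """Quantize float weights to int8 values plus a scale factor."""
    scale = max(abs(w) for w in weights) / 127
    return [round(w / scale) for w in weights], scale

def dequantize(q_weights, scale):
    """Recover approximate float weights from the int8 values."""
    return [q * scale for q in q_weights]

weights = [0.12, -0.5, 0.33, 1.0, -0.07]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The round trip is lossy, but the error per weight stays below one scale step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q)
print(max_error)
```

Real quantization schemes (per-channel scales, 4-bit formats, calibration data) are more involved, but the trade-off is the same: less memory and faster inference in exchange for a bounded loss of precision.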

Granite models support distributed inference.

Distributed inference lets AI models process workloads more efficiently by dividing the labor of inference across a group of interconnected devices. The system splits requests across a fleet of hardware, which can include physical and cloud servers.

From there, each inference server processes its assigned portion in parallel to create an output. The result is a resilient and observable system for delivering consistent and scalable AI-powered services. Frameworks like llm-d support distributed inference at scale to speed up gen AI applications across the enterprise.
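The splitting idea can be sketched in a few lines: requests are assigned round-robin to a pool of workers and processed in parallel. Production frameworks like llm-d add scheduling, batching, and cache-aware routing on top of this; the `inference_worker` below is just a stand-in for a real model server.

```python
# Minimal sketch of distributed inference: prompts are split round-robin
# across a pool of workers and processed in parallel. The worker function
# is a placeholder for a call to a real inference server.
from concurrent.futures import ThreadPoolExecutor

def inference_worker(server_id, prompt):
    # Stand-in for a model call on one server in the fleet.
    return f"server-{server_id}: completed '{prompt}'"

def distribute(prompts, n_servers=3):
    # Round-robin assignment: request i goes to server i mod n_servers.
    assignments = [(i % n_servers, p) for i, p in enumerate(prompts)]
    with ThreadPoolExecutor(max_workers=n_servers) as pool:
        return list(pool.map(lambda a: inference_worker(*a), assignments))

results = distribute(["summarize report", "classify ticket",
                      "answer FAQ", "draft email"])
for r in results:
    print(r)
```

Each worker handles its share independently, so adding servers increases throughput, and a slow or failed worker affects only the requests routed to it.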

Find out what else distributed inference can help you with 

Simply put, there is no AI without inference.

But inference comes under increasing pressure as models keep growing. As models get larger and more complex, inference becomes slower and more expensive.

For inference to be successful, AI models need to do a lot of math in a short period of time. Factors like model size, high user volume, and strict latency requirements all constrain performance. When models require more data and more memory, hardware and accelerators struggle to keep up.
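The memory side of that pressure can be estimated with back-of-envelope arithmetic: a model's weights alone need roughly parameter count times bytes per parameter, before counting the KV cache and activations. The numbers below are illustrative.

```python
# Back-of-envelope memory math for serving an LLM: weights alone need
# roughly n_params * bytes_per_param, before KV cache and activations.

def weight_memory_gb(n_params, bytes_per_param):
    """Approximate weight memory in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9

params_7b = 7e9                        # a 7-billion-parameter model
print(weight_memory_gb(params_7b, 2))  # fp16: 2 bytes/param -> ~14 GB
print(weight_memory_gb(params_7b, 1))  # int8: 1 byte/param  -> ~7 GB
```

A 7B-parameter model at fp16 already needs about 14 GB just for weights, which is why model size, precision, and accelerator memory are planned together.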

That's why the hardware and software that support inference can make or break your AI strategy, and why Granite is built for next-level inference capabilities.

IBM has released multiple Granite model series to meet the needs of increasingly complex enterprise applications. The series within the Granite family fall into different categories, each with its own naming convention.

Each series serves a different purpose:

  • Granite for Language: These models deliver accurate natural language processing (NLP) in multiple languages while maintaining low latency. 

    Explore generative AI use cases

  • Granite for Code: These models are trained on more than 100 different programming languages to support enterprise-level software tasks.
  • Granite for Time Series: These models are fine-tuned for time series forecasting, a method of predicting future data using data from the past.
  • Granite for GeoSpatial: IBM and NASA created this foundation model, which observes Earth through large-scale satellite data collection to help track and address environmental changes.

    Explore predictive AI use cases

Within each of these series, Granite offers models of different sizes and specialties. For example, Granite for Language includes:

  • Granite-7b-base, a general-purpose language model for conversations and chat purposes.
  • Granite-7b-instruct, which specializes in following task instructions.
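The base/instruct split above reflects a practical difference in how the two kinds of model are prompted: a base model simply continues raw text, while an instruct model expects the request wrapped in a chat-style structure. The sketch below illustrates the idea; the role tags shown are invented for illustration, and the real format is defined by each model's own chat template.

```python
# Sketch of prompting a base model vs. an instruct model. A base model
# continues whatever text it is given; an instruct model expects prompts
# wrapped in role-tagged turns. The tag format below is illustrative only;
# real models define their own chat templates.

def base_prompt(text):
    # Base model: hand it text to continue.
    return text

def instruct_prompt(instruction):
    # Instruct model: wrap the request in chat-style turns
    # (hypothetical tags; actual templates vary by model).
    return f"<|user|>\n{instruction}\n<|assistant|>\n"

print(base_prompt("The three pillars of observability are"))
print(instruct_prompt("List the three pillars of observability."))
```

In practice, libraries apply the correct template for you from the model's tokenizer configuration, so you rarely build these strings by hand.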

Explore Granite models on Hugging Face

Red Hat® AI is a platform of products and services that can help your enterprise at any stage of the AI journey, whether you’re at the very beginning or ready to scale. It can support both generative and predictive AI efforts for your unique enterprise use cases.

With Red Hat AI, you have access to Red Hat® AI Inference Server to optimize model inference across the hybrid cloud for faster, cost-effective deployments. Powered by vLLM, the inference server maximizes GPU utilization and enables faster response times.
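Because vLLM exposes an OpenAI-compatible HTTP API, a deployment like this can typically be queried with a standard chat-completions request. The sketch below builds such a request; the endpoint URL and the model name `granite-example` are assumptions for illustration, so substitute whatever your deployment actually serves. The `send` function is shown but not executed, since it needs a running server.

```python
# Sketch of querying an OpenAI-compatible inference endpoint (the API
# style vLLM serves). URL and model name are placeholder assumptions.
import json
import urllib.request

def build_request(model, prompt, max_tokens=128):
    # Payload in the chat-completions format the endpoint accepts.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def send(payload, url="http://localhost:8000/v1/chat/completions"):
    # Not executed here: requires a running inference server.
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_request("granite-example", "Summarize this quarter's incidents.")
print(json.dumps(payload, indent=2))
```

Using the standard API shape means client code does not need to change when you swap in a different served model.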

Learn more about Red Hat AI Inference Server

Red Hat AI Inference Server includes the Red Hat AI repository, a collection of third-party validated and optimized models that allows model flexibility and encourages cross-team consistency. With access to the third-party model repository, enterprises can accelerate time to market and decrease financial barriers to AI success. 

Learn more about validated models by Red Hat AI

Keep reading

What is generative AI?

Generative AI is a kind of artificial intelligence technology that relies on deep learning models trained on large data sets to create new content.

AIOps explained

AIOps (AI for IT operations) is an approach to automating IT operations with machine learning and other advanced AI techniques.

AI infrastructure explained

AI infrastructure combines artificial intelligence and machine learning (AI/ML) technology to develop and deploy reliable and scalable data solutions.

Artificial intelligence resources