What are Granite models?
Granite is a series of large language models (LLMs) created by IBM for enterprise applications. Granite foundation models can support generative artificial intelligence (gen AI) use cases that involve language and code.
Granite family models are open source under the Apache 2.0 license, which means developers can experiment with, modify, and distribute Granite models for free. This makes Granite models a good choice for organizations that deal with sensitive data and want to run their own LLM rather than relying on an outside service.
How do Granite AI models work?
Foundation models are trained to develop a general understanding of the patterns, structures, and representations of language. This “foundational” training teaches the model how to communicate and recognize those patterns. Applying that learning to new, real-world inputs is called AI inference: the operational phase of AI, where the model puts what it learned during training to work on real situations.
The IBM Granite AI models have this baseline of knowledge that can be further fine-tuned to perform specific tasks for almost any industry. Granite family models are trained on curated data and provide transparency into the data that’s used for training.
LLMs use gen AI to produce new content based on the prompt a user enters. Today, people often use gen AI to generate text, pictures, video, and code. Businesses can use LLM foundation models to automate various aspects of operations, such as customer-support chatbots or testing software code.
Other LLM foundation models that use gen AI include Meta’s Llama (including Llama 2 and Llama 3), Google’s Gemini, Anthropic’s Claude, OpenAI’s GPT series (known for the ChatGPT chatbot), and Mistral’s models. What sets the Granite AI models apart is the disclosure of their training data, which builds trust with users and makes the models more suitable for enterprise environments.
Are Granite models open source?
Yes, some of the Granite AI model series are available under an open source license, which means developers can easily access the models, build on them locally, and then fine-tune them for their particular goals. Users even have access to a majority of the data used to train the models, so they can understand how a model was built and how it functions.
When it comes to Granite models, open source means developers can customize the model with their own data to generate user-specific outputs. It doesn’t mean everyone’s private data is available to the whole open source community. Unlike public web-service AI, Granite models don’t train continuously, so data input into a Granite family model is never shared with Red Hat, IBM, or other Granite users.
How can you use Granite models?
Enterprises in many industries―from healthcare to construction―can use Granite in a variety of ways to help automate operations at scale. Granite models can be fine-tuned for business-domain tasks like summarization, question answering, and classification. Here are a few examples:
- Code generation: Granite code models can help build upon or improve developers’ work to make processes more efficient. For example, developers can take advantage of autocomplete: Similar to autocomplete on a smartphone, the model can finish a line of code before the developer finishes typing.
- Insight extraction: When you need to simplify, summarize, or explain large data sets, Granite can identify accurate patterns and insights quickly. This saves you the hassle of combing through a lot of data.
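As a rough illustration of the code-generation use case, here is how a completion request to a Granite code model served behind an OpenAI-compatible API (for example, a model you host yourself with an inference server like vLLM) might be built. The model name, endpoint shape, and parameter choices below are illustrative assumptions, not a documented Granite API.

```python
import json

# Hypothetical model identifier; substitute whichever Granite code model you deploy.
DEFAULT_MODEL = "ibm-granite/granite-8b-code-base"

def build_completion_request(prefix: str, model: str = DEFAULT_MODEL) -> dict:
    """Build a /v1/completions-style payload asking the model to finish `prefix`."""
    return {
        "model": model,
        "prompt": prefix,        # the code the developer has typed so far
        "max_tokens": 64,        # cap the length of the suggested completion
        "temperature": 0.2,      # low temperature keeps code suggestions predictable
        "stop": ["\n\n"],        # stop at a blank line, a common completion boundary
    }

payload = build_completion_request("def fibonacci(n):")
print(json.dumps(payload, indent=2))
```

In practice, an editor plugin would POST this payload to the server’s completions endpoint and insert the returned text at the cursor.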
What are the benefits of Granite?
- Flexible architecture: Granite can integrate with existing systems and can be deployed on premises or in the cloud. Its interfaces are made to simplify deployment. The Granite family includes models of various sizes, so you can choose one that best matches your needs while managing your computing costs.
- Custom solutions: Though Granite is delivered as a foundation model, it’s built to be trained on business-specific knowledge. Users have the flexibility to scale and fine-tune the model to tailor it to their business needs. For example, if your business is focused on medical devices, you can teach the model the lingo used in the healthcare industry.
- Low latency: Running a Granite model on your own infrastructure means you can optimize for quick response times. The model can deliver real-time data, making it handy for critical operations. If we stick with the healthcare example, access to real-time data is important for remote doctor-patient collaboration and time-sensitive care. Compressing the Granite model can provide powerful performance with even fewer resources.
- High accuracy: Developers can fine-tune the Granite series for industry-specific tasks to make the model an expert in any subject. It can also be trained in multiple languages to maintain accuracy and accessibility on a global scale.
- Transparent models: Because Granite is available under an open source license, developers can see how the AI model was built and trained, as well as collaborate with an open source community.
Do Granite models support distributed inference?
Yes, Granite models support distributed inference.
Distributed inference lets AI models process workloads more efficiently by dividing the labor of inference across a group of interconnected devices. Distributed inference supports a system that splits requests across a fleet of hardware, which can include physical and cloud servers.
From there, each inference server processes its assigned portion in parallel to create an output. The result is a resilient, observable system for delivering consistent and scalable AI-powered services. Frameworks like llm-d support distributed inference at scale, helping speed up gen AI applications across the enterprise.
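The request-splitting idea described above can be sketched in a few lines. The toy “servers” below are plain Python functions standing in for real inference servers, and the round-robin assignment is a deliberately simple scheduling policy; a production framework such as llm-d handles scheduling, batching, and failover, none of which is modeled here.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-ins for real inference servers: each "server" just tags the prompt
# with its own name so we can see which server handled which request.
def make_server(name):
    def serve(prompt):
        return f"{name} answered: {prompt}"
    return serve

servers = [make_server(f"server-{i}") for i in range(3)]

def distribute(prompts):
    """Round-robin prompts across the fleet and process them in parallel."""
    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        futures = [
            pool.submit(servers[i % len(servers)], prompt)
            for i, prompt in enumerate(prompts)
        ]
        # Collect results in the original request order.
        return [f.result() for f in futures]

results = distribute(["prompt-a", "prompt-b", "prompt-c", "prompt-d"])
for line in results:
    print(line)
```

Because the fourth prompt wraps back around to the first server, this sketch also shows why adding servers to the fleet directly increases how many requests can be processed at once.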
Why you should care about inference
Simply put, there is no AI without inference.
But inference comes under pressure as models keep growing bigger. As models get more complex, inference becomes slower.
For inference to be successful, AI models need to do a lot of math in a short period of time. So, factors like model size, high user volume, and latency can all limit performance. When models require more data and more memory, hardware and accelerators struggle to keep up.
That's why the hardware and software that support your inference capabilities can make or break your AI strategy, and why Granite is built for next-level inference.
Types of IBM Granite models
IBM has released multiple Granite model series to fulfill the needs of enterprise applications that are becoming more complex. There are different categories and naming conventions of the model series within the Granite family.
Each series serves a different purpose:
- Granite for Language: These models deliver accurate natural language processing (NLP) in multiple languages while maintaining low latency.
- Granite for Code: These models are trained on more than 100 different programming languages to support enterprise-level software tasks.
- Granite for Time Series: These models are fine-tuned for time series forecasting, a method of predicting future data using data from the past.
- Granite for GeoSpatial: Created by IBM and NASA, this foundation model uses large-scale satellite data to observe Earth and help track and address environmental changes.
Within each of these series, Granite offers models of different sizes and specialties. For example, Granite for Language includes:
- Granite-7b-base, a general-purpose language model for conversations and chat purposes.
- Granite-7b-instruct, which specializes in following task instructions.
How can Red Hat help?
Red Hat® AI is a platform of products and services that can help your enterprise at any stage of the AI journey, whether you’re at the very beginning or ready to scale. It can support both generative and predictive AI efforts for your unique enterprise use cases.
With Red Hat AI, you have access to Red Hat® AI Inference Server to optimize model inference across the hybrid cloud for faster, cost-effective deployments. Powered by vLLM, the inference server maximizes GPU utilization and enables faster response times.
Red Hat AI Inference Server includes the Red Hat AI repository, a collection of third-party validated and optimized models that allows model flexibility and encourages cross-team consistency. With access to the third-party model repository, enterprises can accelerate time to market and decrease financial barriers to AI success.