What are Granite models?
Granite is a series of large language models (LLMs) created by IBM for enterprise applications. Granite foundation models can support generative artificial intelligence (gen AI) use cases that involve language and code.
Granite family models are open source under the Apache 2.0 license, which means developers can experiment with, modify, and distribute Granite models for free. This makes Granite models a good choice for organizations that deal with sensitive data and want to run their own LLM rather than relying on an outside service.
How do Granite AI models work?
Foundation models are trained to develop a general understanding of the patterns, structures, and representations of language. This “foundational” training teaches the model how to communicate and recognize those patterns. Applying that learning to new, real-world inputs is called AI inference: the operational phase of AI, where the model puts what it learned during training to work on real situations.
The IBM Granite AI models have this baseline of knowledge that can be further fine-tuned to perform specific tasks for almost any industry. Granite family models are trained on curated data and provide transparency into the data that’s used for training.
LLMs use gen AI to produce new content based on the prompt a user enters. Today, people often use gen AI to generate text, pictures, video, and code. Businesses can use LLM foundation models to automate various aspects of operations, such as customer-support chatbots or testing software code.
Other LLM foundation models that use gen AI include Meta’s Llama (including Llama 2 and Llama 3), Google’s Gemini, Anthropic’s Claude, OpenAI’s GPT series (known for the ChatGPT chatbot), and Mistral’s models. What sets the Granite AI models apart is the disclosure of their training data, which builds trust with users and makes the models more suitable for enterprise environments.
Are Granite models open source?
Yes, some of the Granite AI model series are available under an open source license, which means developers can easily access the models, build on them locally, and then fine-tune them for their particular goals. Users even have access to a majority of the data used to train the models, so they can understand how a model was built and how it functions.
When it comes to Granite models, open source means developers can customize the model with their own data to generate user-specific outputs. It doesn’t mean everyone’s private data is available to the whole open source community. Unlike public web-service AI, Granite models don’t train continuously, so data input into a Granite family model is never shared with Red Hat, IBM, or other Granite users.
How can you use Granite models?
Enterprises in many industries―from healthcare to construction―can use Granite in a variety of ways to help automate operations at scale. Granite models can be fine-tuned for business-domain tasks like summarization, question answering, and classification. Here are a few examples:
- Code generation: Granite code models can help build upon or improve developers’ work to make processes more efficient. For example, developers can take advantage of autocomplete: Similar to autocomplete on a smartphone, the model can finish a line of code before the developer finishes typing.
- Insight extraction: When you need to simplify, summarize, or explain large data sets, Granite can identify accurate patterns and insights quickly. This saves you the hassle of combing through a lot of data.
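As a rough illustration of the code-generation use case, here is how a completion request to a Granite code model served behind an OpenAI-compatible API (for example, a model you host yourself with an inference server like vLLM) might be built. The model name, endpoint shape, and parameter choices below are illustrative assumptions, not a documented Granite API.

```python
import json

# Hypothetical model identifier; substitute whichever Granite code model you deploy.
DEFAULT_MODEL = "ibm-granite/granite-8b-code-base"

def build_completion_request(prefix: str, model: str = DEFAULT_MODEL) -> dict:
    """Build a /v1/completions-style payload asking the model to finish `prefix`."""
    return {
        "model": model,
        "prompt": prefix,        # the code the developer has typed so far
        "max_tokens": 64,        # cap the length of the suggested completion
        "temperature": 0.2,      # low temperature keeps code suggestions predictable
        "stop": ["\n\n"],        # stop at a blank line, a common completion boundary
    }

payload = build_completion_request("def fibonacci(n):")
print(json.dumps(payload, indent=2))
```

In practice, an editor plugin would POST this payload to the server’s completions endpoint and insert the returned text at the cursor.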
What are the benefits of Granite?
- Flexible architecture: Granite can integrate with existing systems and can be deployed on premises or in the cloud. Its interfaces are made to simplify deployment. The Granite family includes models of various sizes, so you can choose one that best matches your needs while managing your computing costs.
- Custom solutions: Though Granite is delivered as a foundation model, it’s built to be trained on business-specific knowledge. Users have the flexibility to scale and fine-tune the model to tailor it to their business needs. For example, if your business is focused on medical devices, you can teach the model the lingo used in the healthcare industry.
- Low latency: Running a Granite model on your own infrastructure means you can optimize for quick response times. The model can deliver real-time data, making it handy for critical operations. If we stick with the healthcare example, access to real-time data is important for remote doctor-patient collaboration and time-sensitive care. Compressing the Granite model can provide powerful performance with even fewer resources.
- High accuracy: Developers can fine-tune the Granite series for industry-specific tasks to make the model an expert in any subject. It can also be trained in multiple languages to maintain accuracy and accessibility on a global scale.
- Transparent models: Because Granite is available under an open source license, developers can see how the AI model was built and trained, as well as collaborate with an open source community.
Do Granite models support distributed inference?
Yes, Granite models support distributed inference.
Distributed inference lets AI models process workloads more efficiently by dividing the labor of inference across a group of interconnected devices. Distributed inference supports a system that splits requests across a fleet of hardware, which can include physical and cloud servers.
From there, each inference server processes its assigned portion in parallel to create an output. The result is a resilient, observable system for delivering consistent and scalable AI-powered services. Frameworks like llm-d support distributed inference at scale, helping speed up gen AI applications across the enterprise.
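The request-splitting idea described above can be sketched in a few lines. The toy “servers” below are plain Python functions standing in for real inference servers, and the round-robin assignment is a deliberately simple scheduling policy; a production framework such as llm-d handles scheduling, batching, and failover, none of which is modeled here.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-ins for real inference servers: each "server" just tags the prompt
# with its own name so we can see which server handled which request.
def make_server(name):
    def serve(prompt):
        return f"{name} answered: {prompt}"
    return serve

servers = [make_server(f"server-{i}") for i in range(3)]

def distribute(prompts):
    """Round-robin prompts across the fleet and process them in parallel."""
    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        futures = [
            pool.submit(servers[i % len(servers)], prompt)
            for i, prompt in enumerate(prompts)
        ]
        # Collect results in the original request order.
        return [f.result() for f in futures]

results = distribute(["prompt-a", "prompt-b", "prompt-c", "prompt-d"])
for line in results:
    print(line)
```

Because the fourth prompt wraps back around to the first server, this sketch also shows why adding servers to the fleet directly increases how many requests can be processed at once.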
Why you should care about inference
Simply put, there is no AI without inference.
But inference comes under pressure as models keep growing bigger. As models get more complex, inference becomes slower.
For inference to be successful, AI models need to do a lot of math in a short period of time. So, factors like model size, high user volume, and latency can all limit performance. When models require more data and more memory, hardware and accelerators struggle to keep up.
That's why the hardware and software that support your inference capabilities can make or break your AI strategy, and why Granite is built for next-level inference.
Types of IBM Granite models
IBM has released multiple Granite model series to fulfill the needs of enterprise applications that are becoming more complex. There are different categories and naming conventions of the model series within the Granite family.
Each series serves a different purpose:
- Granite for Language: These models deliver accurate natural language processing (NLP) in multiple languages while maintaining low latency.
- Granite for Code: These models are trained on more than 100 different programming languages to support enterprise-level software tasks.
- Granite for Time Series: These models are fine-tuned for time series forecasting, a method of predicting future data using data from the past.
- Granite for GeoSpatial: Created by IBM and NASA, this foundation model uses large-scale satellite data to observe Earth and help track and address environmental changes.
Within each of these series, Granite offers models of different sizes and specialties. For example, Granite for Language includes:
- Granite-7b-base, a general-purpose language model for conversations and chat purposes.
- Granite-7b-instruct, which specializes in following task instructions.
How can Red Hat help?
Red Hat® AI is a platform of products and services that can help your enterprise at any stage of the AI journey, whether you’re at the very beginning or ready to scale. It can support both generative and predictive AI efforts for your unique enterprise use cases.
With Red Hat AI, you have access to Red Hat® AI Inference Server to optimize model inference across the hybrid cloud for faster, cost-effective deployments. Powered by vLLM, the inference server maximizes GPU utilization and enables faster response times.
Red Hat AI Inference Server includes the Red Hat AI repository, a collection of third-party validated and optimized models that allows model flexibility and encourages cross-team consistency. With access to the third-party model repository, enterprises can accelerate time to market and decrease financial barriers to AI success.