RAG vs. fine-tuning

Both RAG and fine-tuning aim to improve large language models (LLMs). RAG does this without modifying the underlying LLM, while fine-tuning requires adjusting the weights and parameters of an LLM. Often, you can customize a model by using both fine-tuning and a RAG architecture.

An LLM is a type of artificial intelligence (AI) that uses machine learning (ML) techniques to understand and produce human language. These ML models can generate, summarize, translate, rewrite, classify, categorize, and analyze text—and more. The most popular use for these models at an enterprise level is to create a question-answering system, like a chatbot.

LLM foundation models are trained with general knowledge to support a broad range of use cases. However, they likely aren’t equipped with domain-specific knowledge that’s unique to your organization. RAG and fine-tuning are 2 ways to adjust and inform the LLM with the data you want so it produces the output you want.

For example, let’s say you’re building a chatbot to interact with customers. In this scenario, the chatbot is a representative of your company, so you’ll want it to act like a high-performing employee. You’ll want the chatbot to understand nuances about your company, like the products you sell and the policies you uphold. Just as you’d train an employee by giving them documents to study and scripts to follow, you train a chatbot by using RAG and fine-tuning to build upon the foundation of knowledge it arrives with. 

RAG supplements the data within an LLM by retrieving information from sources of your choosing, such as data repositories, collections of text, and pre-existing documentation. After retrieving the data, a RAG architecture incorporates it into the LLM's context so the model generates an answer based on the blended sources.

RAG is most useful for supplementing your model with information that's regularly updated. Giving an LLM a line of communication to your chosen external sources makes its output more accurate. And because you can engineer RAG to cite its sources, it's easy to trace how an output is formulated, which creates more transparency and builds trust.

Back to our example: If you were to build a chatbot that answers questions like, “What is your return policy?”, you could use a RAG architecture. You could connect your LLM to a document that details your company’s return policy and direct the chatbot to pull information from it. You could even instruct the chatbot to cite its source and provide a link for further reading. And if your return-policy document were to change, the RAG model would pull the most recent information and serve it to the user.
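
To make this concrete, here is a minimal sketch of the retrieve-then-generate flow in Python. It is illustrative only, not a specific product API: search_policy_docs and llm_generate are hypothetical stand-ins for a vector database query and a call to whatever LLM you have deployed.

```python
# Minimal RAG flow: retrieve, add to the model's context, generate.
# The two helper functions are stand-ins you would replace with your own
# retriever (e.g., a vector database query) and your own LLM call.

def search_policy_docs(question: str, k: int = 3) -> list[dict]:
    # Stand-in for a similarity search against your document store.
    return [{"source": "returns-policy.md",
             "text": "Items can be returned within 30 days of purchase."}][:k]

def llm_generate(prompt: str) -> str:
    # Stand-in for a call to whichever LLM you have deployed.
    return "You can return items within 30 days. (Source: returns-policy.md)"

def answer_with_rag(question: str) -> str:
    passages = search_policy_docs(question)
    # Blend the retrieved text into the LLM's context and ask it to cite sources.
    context = "\n\n".join(f"[{p['source']}] {p['text']}" for p in passages)
    prompt = ("Answer using only the context below and cite the bracketed source.\n\n"
              f"Context:\n{context}\n\nQuestion: {question}")
    return llm_generate(prompt)

print(answer_with_rag("What is your return policy?"))
```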

Learn more about RAG

 

Use cases for RAG

RAG can source and organize information in a way that makes it simple for people to interact with data. With a RAG architecture, models can fetch insights and provide an LLM with context from both on-premises and cloud-based data sources. This means external data, internal documents, and even social media feeds can be used to answer questions, provide context, and inform decision making.

For example, you can create a RAG architecture that, when queried, provides specific answers regarding company policies, procedures, and documentation. This saves time that would otherwise be spent searching for and interpreting a document manually.

Learn how RAG is used in software engineering

What is fine-tuning?

Think of fine-tuning as a way to communicate intent to an LLM so the model can tailor its output to fit your goals. Fine-tuning is the process of training a pretrained model further with a smaller, more targeted data set so it can perform domain-specific tasks more effectively. Unlike retrieved context, this additional training data is embedded into the model's weights.

Let’s return to our chatbot example. Say you want your chatbot to interact with patients in a medical context. It’s important that the model understands medical terminology related to your work. Using fine-tuning techniques, you can ensure that when a patient asks the chatbot about “PT services,” it will understand that as “physical therapy services” and direct them to the right resources.
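
As a rough illustration of what that training step looks like, the sketch below continues training a small pretrained model on a couple of domain-specific examples using PyTorch and the Hugging Face Transformers library. The model name and toy examples are assumptions made only for the sketch; real fine-tuning needs a much larger, curated data set and more compute.

```python
# Minimal fine-tuning sketch: continue training a pretrained causal LM on a
# small, domain-specific dataset so it learns your terminology (e.g., that
# "PT services" means "physical therapy services").
import torch
from torch.optim import AdamW
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "distilgpt2"  # small model chosen only to keep the sketch lightweight
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

examples = [
    "Patient: Do you offer PT services? Assistant: Yes, we offer physical therapy services on weekdays.",
    "Patient: How do I book PT? Assistant: You can schedule physical therapy through the patient portal.",
]

optimizer = AdamW(model.parameters(), lr=5e-5)
model.train()
for epoch in range(3):
    for text in examples:
        batch = tokenizer(text, return_tensors="pt")
        # For causal language models, the labels are the input tokens themselves.
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

# The adjusted weights now embed the domain-specific training data.
model.save_pretrained("chatbot-medical-ft")
tokenizer.save_pretrained("chatbot-medical-ft")
```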

Use cases for fine-tuning

Fine-tuning is most useful for training your model to interpret the information it has access to. For instance, you can train a model to understand the nuances and terminologies of your specific industry, such as acronyms and organizational values.

Fine-tuning is also useful for image-classification tasks. For example, if you’re working with magnetic resonance imaging (MRI), you can use fine-tuning to train your model to identify abnormalities.

Fine-tuning can also help your organization apply the right tone when communicating with others, especially in a customer-support context. It lets you train a chatbot to analyze the sentiment or emotion of the person it’s interacting with, and to respond in a way that serves the user while upholding your organization’s values.
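
As a small illustration of the sentiment side, the sketch below scores an incoming message with an off-the-shelf classifier so the chatbot can pick a tone. It uses the Hugging Face Transformers pipeline with its default model; in practice you would fine-tune a classifier on your own support conversations, and the tone rules shown here are hypothetical.

```python
# Sketch: gauge the sentiment of an incoming message so the chatbot can adjust
# its tone. Uses a generic off-the-shelf classifier as a placeholder for a
# model fine-tuned on your own support conversations.
from transformers import pipeline

sentiment = pipeline("sentiment-analysis")  # downloads a default English model

def choose_tone(user_message: str) -> str:
    result = sentiment(user_message)[0]  # e.g. {"label": "NEGATIVE", "score": 0.98}
    if result["label"] == "NEGATIVE" and result["score"] > 0.8:
        return "empathetic"  # acknowledge frustration before offering help
    return "friendly"

print(choose_tone("My order arrived broken and no one is answering my emails."))
```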

Understanding the differences between RAG and fine-tuning can help you make strategic decisions about which AI resource to deploy to suit your needs. Here are some basic questions to consider:

What’s your team’s skill set?

Customizing a model with RAG requires coding and architectural skills, but compared to traditional fine-tuning it offers a more accessible and straightforward way to get feedback, troubleshoot, and fix applications. Fine-tuning a model requires experience with natural language processing (NLP), deep learning, model configuration, data preprocessing, and evaluation. Overall, it can be more technical and time-consuming.

Is your data static or dynamic?

Fine-tuning teaches a model to learn common patterns that don’t change over time. Because it’s based on static snapshots of training data sets, the model’s information can become outdated and require retraining. Conversely, RAG directs the LLM to retrieve specific, real-time information from your chosen sources. This means your model pulls the most up-to-date data to inform your application, promoting accurate and relevant output.

What’s your budget?

Fine-tuning is a deep learning technique that requires a lot of data and computational resources. To inform a model with fine-tuning, you typically need to label data and run training on costly, high-end hardware. Additionally, the performance of the fine-tuned model depends on the quality of your data, and obtaining high-quality data can be expensive.

Comparatively, RAG tends to be more cost efficient than fine-tuning. To set up RAG, you build pipeline systems to connect your data to your LLM. This direct connection cuts down on resource costs by using existing data to inform your LLM, rather than spending time, energy, and resources to generate new data. 

Red Hat’s open source solutions and AI partner ecosystem can help you incorporate RAG and fine-tuning into your large language model operations (LLMOps) process.

Experiment with fine-tuning using InstructLab

Created by Red Hat and IBM, InstructLab is an open source community project for contributing to LLMs used in generative AI (gen AI) applications. It provides a framework that uses synthetic data to make LLM fine-tuning more accessible.

How InstructLab's synthetic data enhances LLMs

Create your own foundation model with Red Hat Enterprise Linux AI

When your enterprise is ready to build applications with gen AI, Red Hat® Enterprise Linux® AI provides the foundation model platform needed to address your use cases with your data, faster.

Red Hat Enterprise Linux AI unites the Granite family of open source-licensed LLMs and InstructLab model alignment tools, all in a single server environment. This makes it more accessible for domain experts without a data science background to fine-tune and contribute to an AI model that’s scalable across the hybrid cloud.

Red Hat Enterprise Linux AI is also backed by the benefits of a Red Hat subscription, which includes trusted enterprise product distribution, 24x7 production support, extended model lifecycle support, and Open Source Assurance legal protections.

Scale your applications with Red Hat OpenShift AI

Once you train your model with Red Hat Enterprise Linux AI, you can scale it for production through Red Hat OpenShift® AI.

Red Hat OpenShift AI is a flexible, scalable machine learning operations (MLOps) platform with tools to help you build, deploy, and manage AI-enabled applications. It provides the underlying workload infrastructure, such as access to an LLM for creating embeddings, the retrieval mechanisms required to produce outputs, and a vector database.

Solution Pattern

AI applications with Red Hat and NVIDIA AI Enterprise

Create a RAG application

Red Hat OpenShift AI is a platform for building data science projects and serving AI-enabled applications. You can integrate all the tools you need to support retrieval-augmented generation (RAG), a method for getting AI answers from your own reference documents. When you connect OpenShift AI with NVIDIA AI Enterprise, you can experiment with large language models (LLMs) to find the optimal model for your application.

Build a pipeline for documents

To make use of RAG, you first need to ingest your documents into a vector database. In our example app, we embed a set of product documents in a Redis database. Since these documents change frequently, we can create a pipeline for this process that we’ll run periodically, so we always have the latest versions of the documents.
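
A minimal sketch of that ingestion step is shown below. It assumes a Redis Stack instance reachable at localhost and uses the sentence-transformers library for embeddings; the folder name, key layout, and model choice are illustrative rather than part of the solution pattern.

```python
# Sketch of the ingestion step: embed product documents and store them in Redis
# so a RAG application can retrieve them later. Assumes Redis at localhost:6379;
# file paths, key layout, and model choice are illustrative.
from pathlib import Path

import numpy as np
import redis
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dimensional embeddings
r = redis.Redis(host="localhost", port=6379)

docs = list(Path("product-docs").glob("*.md"))  # hypothetical document folder
for i, path in enumerate(docs):
    text = path.read_text()
    vector = embedder.encode(text).astype(np.float32)
    # Store the text, its source, and the embedding as a Redis hash; a vector
    # index over the "embedding" field makes it searchable for retrieval.
    r.hset(f"doc:{i}", mapping={
        "source": str(path),
        "text": text,
        "embedding": vector.tobytes(),
    })

# Re-running this script on a schedule keeps the stored documents in sync with
# their latest versions.
```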

Browse the LLM catalog

NVIDIA AI Enterprise gives you access to a catalog of different LLMs, so you can try different choices and select the model that delivers the best results. The models are hosted in the NVIDIA API catalog. Once you’ve set up an API token, you can deploy a model using the NVIDIA NIM model serving platform directly from OpenShift AI.
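
Models hosted in the NVIDIA API catalog can be called through an OpenAI-compatible endpoint, so once you have an API token you can experiment with a model in a few lines of Python. The model name below is only one example from the catalog; swap in whichever model you want to evaluate.

```python
# Sketch: query a model hosted in the NVIDIA API catalog through its
# OpenAI-compatible endpoint. Requires an API key from the catalog.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key=os.environ["NVIDIA_API_KEY"],
)

response = client.chat.completions.create(
    model="meta/llama-3.1-8b-instruct",  # example catalog model
    messages=[{"role": "user", "content": "What is your return policy?"}],
    max_tokens=256,
)
print(response.choices[0].message.content)
```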

Choose the right model

As you test different LLMs, your users can rate each generated response. You can set up a Grafana monitoring dashboard to compare the ratings, as well as latency and response time for each model. Then you can use that data to choose the best LLM to use in production.
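
One way to feed that Grafana dashboard is to export per-model metrics from the application backend in Prometheus format. The sketch below does this with the prometheus_client library; the metric names and the call_llm stand-in are assumptions for illustration.

```python
# Sketch: expose per-model rating and latency metrics that Prometheus can scrape
# and Grafana can chart. Metric names and the call_llm() stand-in are illustrative.
import time
from prometheus_client import Counter, Histogram, start_http_server

RESPONSE_LATENCY = Histogram("llm_response_seconds", "LLM response latency", ["model"])
USER_RATINGS = Counter("llm_user_ratings_total", "User ratings by value", ["model", "rating"])

def call_llm(model: str, prompt: str) -> str:
    return "placeholder answer"  # stand-in for the real model call

def answer_and_time(model: str, prompt: str) -> str:
    start = time.perf_counter()
    answer = call_llm(model, prompt)
    RESPONSE_LATENCY.labels(model=model).observe(time.perf_counter() - start)
    return answer

def record_rating(model: str, rating: str) -> None:
    USER_RATINGS.labels(model=model, rating=rating).inc()

if __name__ == "__main__":
    start_http_server(8000)  # Prometheus scrapes metrics from :8000/metrics
    answer_and_time("model-a", "What is your return policy?")
    record_rating("model-a", "thumbs_up")
    time.sleep(60)  # keep the process alive briefly so metrics can be scraped
```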

An architecture diagram shows an application built using Red Hat OpenShift AI and NVIDIA AI Enterprise. Components include OpenShift GitOps for connecting to GitHub and handling DevOps interactions, Grafana for monitoring, OpenShift AI for data science, Redis as a vector database, and Quay as an image registry. These components all flow to the app frontend and backend. These components are built on Red Hat OpenShift AI, with an integration with ai.nvidia.com.

Keep reading

What is generative AI?

Generative AI relies on deep learning models trained on large data sets to create new content.

What is machine learning?

Machine learning is the technique of training a computer to find patterns, make predictions, and learn from experience without being explicitly programmed.

What are foundation models?

A foundation model is a type of machine learning (ML) model that is pretrained to perform a range of tasks.
