LoRA vs. QLoRA

LoRA (Low-Rank Adaptation) and QLoRA (Quantized Low-Rank Adaptation) are both techniques for fine-tuning AI models. More specifically, they are forms of parameter-efficient fine-tuning (PEFT), a fine-tuning approach that has gained popularity because it is more resource-efficient than other methods of training large language models (LLMs).

LoRA and QLoRA both help fine-tune LLMs more efficiently, but they differ in how they modify the model's weights and how much memory and storage they use to reach the intended results.
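
To make that storage difference concrete, here is a rough back-of-the-envelope sketch in Python. The figures are illustrative assumptions (a hypothetical 7-billion-parameter model, 16-bit weights for LoRA's frozen base model, 4-bit weights for QLoRA's), not measured benchmarks:

```python
# Back-of-the-envelope storage math for a hypothetical 7-billion-parameter
# model. Illustrative assumptions, not measured benchmarks.
params = 7e9

# LoRA typically keeps the frozen base model in 16-bit precision (2 bytes/weight).
lora_base_gb = params * 2 / 1e9

# QLoRA quantizes the frozen base model to 4-bit (0.5 bytes/weight).
qlora_base_gb = params * 0.5 / 1e9

print(f"LoRA base model:  ~{lora_base_gb:.0f} GB")   # ~14 GB
print(f"QLoRA base model: ~{qlora_base_gb:.1f} GB")  # ~3.5 GB
```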

LLMs are complex models made up of large numbers of parameters, sometimes reaching into the billions. Parameters are the internal values a model learns during training, and they determine how it responds to input. More parameters mean more data storage and, overall, a more capable model.

Traditional fine-tuning requires refitting (updating or adjusting) every individual parameter in order to update the LLM. This can mean adjusting billions of parameters, which takes a large amount of compute time and money.
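
As a rough illustration of that cost, consider the accelerator memory needed just to hold a model and its training state during full fine-tuning. A common rule of thumb for mixed-precision training with the Adam optimizer is about 16 bytes per parameter; the model size below is a hypothetical example, not a measurement:

```python
# Rough memory estimate for full fine-tuning with mixed-precision Adam.
# Rule of thumb: ~16 bytes per parameter (2-byte fp16 weights + 2-byte fp16
# gradients + 4-byte fp32 master weights + two 4-byte fp32 optimizer moments).
params = 7e9  # hypothetical 7-billion-parameter model
bytes_per_param = 2 + 2 + 4 + 4 + 4  # = 16
total_gb = params * bytes_per_param / 1e9
print(f"~{total_gb:.0f} GB just for weights and training state")  # ~112 GB
```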

Updating every parameter can also lead to “overfitting,” a term that describes an AI model that has learned “noise,” or unhelpful patterns, in addition to the general patterns in its training data.

Imagine a teacher and their classroom. The class has learned math all year long. Just before the test, the teacher emphasizes the importance of long division. Now during the test, many of the students find themselves overly preoccupied with long division and have forgotten key mathematical equations for questions that are just as important. This is what overfitting can do to an LLM during traditional fine-tuning.

In addition to the risk of overfitting, traditional fine-tuning carries a significant cost in compute resources.

QLoRA and LoRA are both fine-tuning techniques that serve as shortcuts to full fine-tuning. Instead of retraining all of a model's parameters, they freeze the original weights and train a much smaller set of added parameters, organized as pairs of small low-rank matrices, that capture the new information.

To follow our metaphor, these fine-tuning techniques are able to introduce new topics efficiently, without distracting the model from other topics on the test.
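
As a minimal sketch of how this works in code, the snippet below wraps a standard linear layer so that the pretrained weights stay frozen while only two small low-rank matrices are trained. It assumes PyTorch; the class name LoRALinear and the rank and alpha defaults are illustrative choices, not taken from any particular library:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A linear layer with a trainable low-rank update (illustrative sketch)."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        # Freeze the pretrained weights: they are never updated.
        for p in self.base.parameters():
            p.requires_grad = False
        d_in, d_out = base.in_features, base.out_features
        # Two small matrices whose product (B @ A) is the learned weight update.
        # B starts at zero, so training begins from the unmodified base model.
        self.A = nn.Parameter(torch.randn(rank, d_in) * 0.01)  # rank x d_in
        self.B = nn.Parameter(torch.zeros(d_out, rank))        # d_out x rank
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen base output plus the scaled low-rank correction.
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```

Wrapping a 4,096-by-4,096 layer this way trains 2 × 4,096 × 8 ≈ 65,000 parameters instead of the layer's roughly 16.8 million, well under 1% of the original count, which is where the compute and storage savings come from.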
