LoRA vs. QLoRA
LoRA (Low-Rank Adaptation) and QLoRA (Quantized Low-Rank Adaptation) are both techniques for fine-tuning AI models. More specifically, they are forms of parameter-efficient fine-tuning (PEFT), a family of fine-tuning techniques that has gained popularity because it is far less resource-intensive than other methods of training large language models (LLMs).
LoRA and QLoRA both help fine-tune LLMs more efficiently, but they differ in how they modify the model and how they use memory to reach their results.
How are LoRA and QLoRA different from traditional fine-tuning?
LLMs are complex models made up of enormous numbers of parameters, often billions of them. Parameters are the numerical weights the model learns during training, and together they encode what the model knows. More parameters generally means a more capable model, but also more memory needed to store it and more compute needed to update it.
Traditional fine-tuning requires refitting (updating or adjusting) every individual parameter in order to update the LLM. This can mean adjusting billions of parameters, and because gradients and optimizer states must be kept for each one, it takes a large amount of compute time and money.
Updating every parameter can also lead to “overfitting,” a term used to describe a model that has learned the “noise,” or unhelpful quirks, of its training data in addition to the genuinely useful patterns.
Imagine a teacher and their classroom. The class has studied math all year long. Just before the test, the teacher emphasizes the importance of long division. During the test, many of the students find themselves so preoccupied with long division that they forget key concepts needed for questions that are just as important. This is what overfitting can do to an LLM during traditional fine-tuning.
In addition to the risk of overfitting, traditional fine-tuning also carries a significant resource cost.
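To put that resource cost in rough numbers, here is a back-of-the-envelope sketch in Python. The figures are illustrative assumptions (a 7-billion-parameter model, 16-bit weights and gradients, and an Adam-style optimizer that keeps two 32-bit values per parameter), not measurements of any particular model.

```python
# Rough memory estimate for FULL fine-tuning.
# Assumptions (illustrative, not measured): 7B parameters, 16-bit weights
# and gradients, an optimizer keeping two 32-bit moments per parameter.
num_params = 7e9

weights_gb   = num_params * 2 / 1e9      # 16-bit weights            ~14 GB
gradients_gb = num_params * 2 / 1e9      # 16-bit gradients          ~14 GB
optimizer_gb = num_params * 2 * 4 / 1e9  # two 32-bit moments        ~56 GB

total_gb = weights_gb + gradients_gb + optimizer_gb
print(f"weights:   ~{weights_gb:.0f} GB")
print(f"gradients: ~{gradients_gb:.0f} GB")
print(f"optimizer: ~{optimizer_gb:.0f} GB")
print(f"total:     ~{total_gb:.0f} GB before activations")
```

Because every parameter needs its own gradient and optimizer state, the memory bill quickly grows to several times the size of the model itself.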
QLoRA and LoRA are both fine-tuning techniques that act as shortcuts to full fine-tuning. Instead of training all of the parameters, they freeze the original weights and insert small, low-rank matrices alongside them; only these added matrices are trained to capture the new information.
To return to our classroom analogy, these fine-tuning techniques are able to introduce new topics efficiently, without distracting the model from the other topics on the test.
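To make the idea concrete, here is a minimal sketch of a LoRA-style layer in PyTorch. It is an illustration of the technique under simple assumptions, not the code used by any particular library: the original weight matrix is frozen, and two small low-rank matrices (conventionally called A and B) are added alongside it and trained instead.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update: y = W x + (alpha / r) * B A x."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)          # original weights stay frozen
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)

        # The low-rank pair: A projects down to rank r, B projects back up.
        # B starts at zero, so the layer initially behaves exactly like the base layer.
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)

# Wrap one 4096x4096 projection and count what is actually trainable.
layer = LoRALinear(nn.Linear(4096, 4096), r=8, alpha=16)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable: {trainable:,} of {total:,} parameters")   # roughly 65 thousand of 16.8 million
```

Only the two small matrices accumulate gradients and optimizer state, which is what makes the fine-tuning run so much cheaper. QLoRA takes the same approach but additionally stores the frozen base weights in a quantized, lower-precision format (typically 4-bit), shrinking the memory footprint even further.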