LoRA, introduced by Hu et al. (2021), freezes the pre-trained weights and injects small trainable low-rank matrices into each layer. Only those tiny matrices are updated, cutting the number of trained parameters by orders of magnitude and the memory cost of fine-tuning dramatically — with little loss in quality.
Because the adapters are small and swappable, LoRA makes it cheap to maintain many task- or customer-specific variants of one base model.