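Quantization replaces high-precision weights (e.g. 16-bit floating point) with lower-precision integer representations such as 8-bit integers. Dettmers et al. (2022) showed with LLM.int8() that 8-bit matrix multiplication can run transformer inference in roughly half the memory of a 16-bit model while preserving full-precision accuracy. The key is to treat rare high-magnitude (outlier) features separately: the few feature dimensions that contain them are multiplied in 16-bit, and everything else is quantized to 8-bit.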
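To make the idea concrete, here is a minimal pure-Python sketch of absmax 8-bit quantization with a separate high-precision path for outlier features. It is an illustration in the spirit of LLM.int8(), not the paper's or any library's implementation: the function names (`quantize_absmax`, `mixed_precision_dot`), the single shared scale per vector, and the `threshold` value are all assumptions chosen for readability.

```python
def quantize_absmax(values):
    """Map floats to integer codes in [-127, 127] using one shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [round(v / scale) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats from integer codes."""
    return [c * scale for c in codes]


def mixed_precision_dot(x, w, threshold=6.0):
    """Dot product that keeps outlier input features in floating point.

    `threshold` is an illustrative cutoff: any input feature whose
    magnitude reaches it is routed around the 8-bit path.
    """
    outliers = [i for i, v in enumerate(x) if abs(v) >= threshold]
    regular = [i for i in range(len(x)) if i not in outliers]

    # High-precision path: only the few outlier dimensions.
    total = sum(x[i] * w[i] for i in outliers)

    # 8-bit path: quantize both operands, accumulate in integers,
    # then rescale the integer result back to floating point once.
    if regular:
        xq, x_scale = quantize_absmax([x[i] for i in regular])
        wq, w_scale = quantize_absmax([w[i] for i in regular])
        int_acc = sum(a * b for a, b in zip(xq, wq))
        total += int_acc * x_scale * w_scale
    return total


if __name__ == "__main__":
    x = [0.1, -0.2, 40.0, 0.3]   # one high-magnitude feature
    w = [0.5, 0.25, -0.01, -0.4]
    exact = sum(a * b for a, b in zip(x, w))
    print("exact:          ", exact)
    print("mixed precision:", mixed_precision_dot(x, w))
    # An infinite threshold disables the outlier path (plain 8-bit).
    print("plain 8-bit:    ", mixed_precision_dot(x, w, threshold=float("inf")))
```

Without the outlier path, the single large value in `x` stretches the quantization scale so far that the small features collapse to 0 or ±1, which is why the plain 8-bit result drifts much further from the exact value than the mixed-precision one.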
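Because it shrinks the memory footprint, quantization is what lets large models run on smaller GPUs and consumer hardware. It also pairs naturally with adapter methods like LoRA, which keep the base weights frozen and train only a small set of added parameters.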