When a model picks the next token, it converts its logits into a probability distribution with the softmax function. Temperature T divides the logits before softmax, so each logit z_i becomes z_i / T, a mechanism described by Hinton et al. (2015). T = 1 leaves the distribution unchanged. A higher T flattens the distribution, so unlikely tokens receive more probability mass; a lower T sharpens it, concentrating mass on the single most probable token.
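To make the scaling concrete, here is a minimal sketch in plain Python. The function name `softmax_with_temperature` and the three example logits are illustrative assumptions, not taken from any particular library or model.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Return softmax(logits / temperature) as a list of probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive for this sketch")
    scaled = [x / temperature for x in logits]
    # Subtracting the max does not change the result but avoids overflow.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three candidate tokens.
logits = [2.0, 1.0, 0.0]
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: {[round(p, 3) for p in probs]}")
```

Running this should show the top token's share shrinking as T rises while the least likely token's share grows, which is the flattening described above.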
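At temperature 0 the output is effectively deterministic, which is useful for extraction or code. Because dividing by T is undefined at T = 0, implementations typically treat this setting as greedy decoding and always choose the highest-scoring token. Higher values suit brainstorming but raise the risk of drift and hallucination.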
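The difference between the two regimes can be sketched with a small sampler. This is an illustration under stated assumptions: `sample_token` is a hypothetical helper, the logits are made up, and mapping temperature 0 to argmax is one common convention rather than a guaranteed behavior of any specific system.

```python
import math
import random

def sample_token(logits, temperature, rng=random):
    """Pick a token index; temperature 0 is treated as greedy argmax (assumed convention)."""
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        # Greedy decoding: always return the highest-scoring index.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    # Unnormalized softmax weights; random.choices normalizes them internally.
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

logits = [2.0, 1.0, 0.0]        # hypothetical scores for three tokens
rng = random.Random(0)          # seeded only to make the demo repeatable
print([sample_token(logits, 0, rng) for _ in range(8)])    # same index every time
print([sample_token(logits, 1.5, rng) for _ in range(8)])  # a mix of indices
```

In this sketch the temperature-0 path never consults the random number generator, which is why its output repeats. Real inference stacks can still show small run-to-run differences from other sources, so "effectively deterministic" is the safer description.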