Because language models produce one token at a time, an API can emit each token as soon as it is ready rather than buffering the full reply. This is commonly delivered over Server-Sent Events, a browser-native streaming mechanism defined in the HTML Living Standard.
Streaming lowers perceived latency: the user sees the first words almost as soon as generation starts, even though total generation time is unchanged.
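
To make the wire format concrete, here is a minimal sketch of reading a Server-Sent Events stream in Python. It assumes each event carries one token as plain text in its `data` field, which is an illustrative simplification; real APIs often wrap each chunk in JSON. The function name `iter_sse_data` and the sample stream are hypothetical, not part of any particular API.

```python
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each event in a Server-Sent Events stream."""
    data_parts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_parts:
                yield "\n".join(data_parts)
                data_parts = []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # the format strips one leading space
        if field == "data":
            data_parts.append(value)
        # Other fields (event, id, retry) are ignored in this sketch.


# Hypothetical stream: each event carries one token as plain text.
sample_stream = [
    ": keep-alive\n",
    "data: Hello\n",
    "\n",
    "data: ,\n",
    "\n",
    "data:  world\n",
    "\n",
]

tokens = list(iter_sse_data(sample_stream))
reply = "".join(tokens)
```

Because `iter_sse_data` is a generator, a caller can display each token as it arrives instead of waiting for `reply` to be complete, which is where the lower perceived latency comes from.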