Latency in an LLM system is usually split into time-to-first-token (TTFT) and time-per-output-token (TPOT). Both grow with model size and sequence length: a larger model performs more computation and reads more weights for every token, and attention cost scales with the amount of context. The survey by Tay et al. (2022) catalogues the efficiency techniques developed to mitigate this.
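To make the split concrete, the sketch below combines time-to-first-token and time-per-output-token into an end-to-end estimate. It assumes one common convention, in which the first token is covered by time-to-first-token and each of the remaining tokens costs one time-per-output-token. The function name `total_latency` and the numbers are illustrative, not taken from any particular system.

```python
def total_latency(ttft_s: float, tpot_s: float, n_output_tokens: int) -> float:
    """Estimate end-to-end latency in seconds.

    Assumed convention: the first token arrives after ttft_s, and each of the
    remaining (n_output_tokens - 1) tokens takes tpot_s.
    """
    if n_output_tokens < 1:
        raise ValueError("need at least one output token")
    return ttft_s + (n_output_tokens - 1) * tpot_s


# Illustrative values: 0.5 s to the first token, 20 ms per later token.
short_reply = total_latency(ttft_s=0.5, tpot_s=0.02, n_output_tokens=11)
long_reply = total_latency(ttft_s=0.5, tpot_s=0.02, n_output_tokens=101)
```

Under these assumed numbers the short reply is dominated by time-to-first-token and the long reply by per-token time, which is why the two metrics are usually reported separately.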
Approaches that reduce latency include sparsity (Mixture of Experts), quantization, caching and streaming. Each trades some accuracy, memory or complexity for speed.
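As one illustration of such a trade-off, the sketch below applies symmetric per-tensor int8 quantization to a small list of weights and measures the round-trip error. This is a minimal pure-Python example under simplifying assumptions (a single scale for the whole tensor, made-up weights). The helper names `quantize_int8` and `dequantize` are hypothetical, and production libraries use more refined schemes.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer level and clamp to the int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the integer levels."""
    return [v * scale for v in q]


weights = [0.82, -1.27, 0.003, 0.5]  # made-up values for illustration
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Largest absolute difference between original and restored weights.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Each weight now fits in one byte instead of four (relative to float32), at the cost of a rounding error of at most half a quantization step. Here the smallest weight, 0.003, falls below half a step and is rounded to zero.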