DALL·E, introduced by Ramesh et al. (2021), generates images from text prompts by autoregressively modelling text and image tokens as a single stream with a Transformer. It demonstrated convincing zero-shot text-to-image generation and helped launch the consumer wave of generative image tools.
Later text-to-image systems increasingly use diffusion models, but DALL·E established the prompt-to-picture paradigm now common in creative tools.
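To make the "single stream" idea concrete, the following is a minimal sketch of how text and image tokens might be joined into one sequence for next-token prediction. It is an illustration under stated assumptions, not the authors' implementation: the function names `build_stream` and `next_token_pairs`, the toy vocabulary sizes, and the hard-coded token ids are all hypothetical. In a real system the image tokens would come from a learned discrete image tokenizer, and a Transformer would be trained on the resulting context/target pairs.

```python
from typing import List, Sequence, Tuple

# Toy vocabulary sizes, chosen only for illustration (assumptions, not the
# values used by any real model).
TEXT_VOCAB_SIZE = 10
IMAGE_VOCAB_SIZE = 8


def build_stream(
    text_tokens: Sequence[int],
    image_tokens: Sequence[int],
    text_vocab_size: int = TEXT_VOCAB_SIZE,
    image_vocab_size: int = IMAGE_VOCAB_SIZE,
) -> List[int]:
    """Concatenate text and image token ids into one sequence.

    Image ids are shifted past the text vocabulary so that both kinds of
    token share a single id space without colliding.
    """
    for t in text_tokens:
        if not 0 <= t < text_vocab_size:
            raise ValueError(f"text token {t} is out of range")
    for t in image_tokens:
        if not 0 <= t < image_vocab_size:
            raise ValueError(f"image token {t} is out of range")
    return list(text_tokens) + [text_vocab_size + t for t in image_tokens]


def next_token_pairs(stream: Sequence[int]) -> List[Tuple[List[int], int]]:
    """Return the (context, target) pairs an autoregressive model would see.

    Each target is predicted from every token that precedes it, so image
    tokens are conditioned on the whole text prompt plus earlier image tokens.
    """
    return [(list(stream[:i]), stream[i]) for i in range(1, len(stream))]


# Hypothetical ids standing in for an encoded prompt and an encoded image.
text = [3, 1, 4]
image = [0, 7, 2, 5]

stream = build_stream(text, image)
print(stream)  # [3, 1, 4, 10, 17, 12, 15]

for context, target in next_token_pairs(stream):
    print(context, "->", target)
```

At generation time the same ordering is what lets a prompt drive the picture: the text tokens are supplied as the prefix, and the image tokens are sampled one at a time after it and then decoded back into pixels.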