LLM Architecture · Foundations · Updated 2026

Transformer

The neural-network architecture behind virtually every modern large language model, built on a mechanism called self-attention instead of recurrence or convolution.

The Transformer is a sequence-modelling architecture introduced by Vaswani et al. (2017). Its central idea, self-attention, lets every position in a sequence attend directly to every other position, so the model can capture long-range relationships in a single step rather than passing information along a chain as recurrent networks do.
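As a rough illustration of that idea, the sketch below computes single-head scaled dot-product self-attention in plain Python. The function names (`self_attention`, `matmul`, `softmax`), the toy inputs and the identity projection matrices are assumptions made for this example; real implementations use learned projections, multiple heads and optimised tensor libraries.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    """Swap rows and columns of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x is (seq_len, d_model); w_q, w_k, w_v are (d_model, d_k).
    Returns (output, weights), where weights[i][j] is how strongly
    position i attends to position j.
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(q[0])
    scores = matmul(q, transpose(k))  # every position scored against every other
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights

# Toy input: three tokens with two features each (hypothetical values).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for learned projection matrices
output, weights = self_attention(x, identity, identity, identity)
```

Each row of `weights` sums to 1, and the full `seq_len × seq_len` matrix is what lets any position draw on any other in one step.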

Because attention over a sequence is highly parallelisable, Transformers train far more efficiently on modern hardware than the RNNs and LSTMs they replaced. That efficiency is what made today's large language models practical to train.
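The difference in parallelism can be seen in a toy comparison, sketched here under simplifying assumptions: scalar tokens, a hypothetical one-weight recurrence (`rnn_states`) and an unprojected attention row (`attention_row`). Neither function is taken from a real library.

```python
import math

def rnn_states(xs, w_h=0.5, w_x=1.0):
    """Toy scalar recurrence h_t = tanh(w_h * h_{t-1} + w_x * x_t).

    Each state needs the previous one, so the loop must run in order.
    """
    h, states = 0.0, []
    for x in xs:
        h = math.tanh(w_h * h + w_x * x)
        states.append(h)
    return states

def attention_row(i, xs):
    """Attention output for position i over scalar tokens.

    It depends only on the inputs xs, never on another position's output,
    so all rows can be computed independently (and hence in parallel).
    """
    scores = [xs[i] * xj for xj in xs]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return sum(e / total * xj for e, xj in zip(exps, xs))

xs = [0.5, -1.0, 2.0, 0.1]  # hypothetical token values
in_order = [attention_row(i, xs) for i in range(len(xs))]
# Computing the rows in an arbitrary order gives the same values.
any_order = {i: attention_row(i, xs) for i in [2, 0, 3, 1]}
```

Because no attention row waits on another, hardware can evaluate them all at once as a single matrix operation, whereas the recurrence has to step through the sequence one position at a time.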

Why it matters

ChatGPT, Claude, Gemini and Grok are all Transformer-based. Attention, layers and context length are all Transformer concepts, so understanding them is the foundation for almost every other term in this glossary.

References

Primary, peer-reviewed and archival sources for this definition.

Attention Is All You Need
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Advances in Neural Information Processing Systems 30 (NeurIPS 2017).

Cite this entry

MultipleChat. "Transformer." MultipleChat AI & LLM Glossary, 2026. https://multiple.chat/ai-glossary/transformer
