LLM Architecture | Foundations | Updated 2026

Attention Mechanism

The technique that lets a model weigh how much each part of the input matters when producing each part of the output — the core operation inside every Transformer.

Attention was introduced for machine translation by Bahdanau et al. (2015), letting a decoder look back at the most relevant source words instead of a single fixed summary vector. Vaswani et al. (2017) then made self-attention the whole architecture, removing recurrence entirely.

Each token computes a weighted combination of the tokens in the sequence, where each weight reflects how relevant that token is. This is how a model resolves references, tracks long-range structure, and decides what to focus on.
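The weighted combination described above can be sketched in a few lines. This is a minimal illustration, assuming the scaled dot-product form from Vaswani et al. (2017): scores are dot products between query and key vectors, divided by the square root of the key dimension, then normalized with a softmax. The function names, the toy `tokens` vectors, and the reuse of the same vectors as queries, keys and values are assumptions made for the example; a real model would first apply learned projections.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors.

    Returns (outputs, weights): weights[i][j] is how much position i
    attends to position j, and outputs[i] is the weighted mix of values.
    """
    d_k = len(keys[0])
    width = len(values[0])
    outputs, weights = [], []
    for q in queries:
        # Relevance score of every position to this query.
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        w = softmax(scores)
        # Weighted combination of the value vectors.
        mixed = [
            sum(w_j * v[c] for w_j, v in zip(w, values))
            for c in range(width)
        ]
        outputs.append(mixed)
        weights.append(w)
    return outputs, weights


# Hypothetical 2-dimensional token vectors (illustrative values only),
# used as queries, keys and values alike; no learned projections.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(tokens, tokens, tokens)
```

In this toy input, the first token gives equal weight to itself and the third token, and less to the second, because its dot product with the second token is zero. Each row of `weights` sums to 1, so every output is an average of the value vectors weighted by relevance.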

References

Primary, peer-reviewed and archival sources for this definition.

Neural Machine Translation by Jointly Learning to Align and Translate
Bahdanau, D., Cho, K., & Bengio, Y. (2015). International Conference on Learning Representations (ICLR 2015).
Attention Is All You Need
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Advances in Neural Information Processing Systems 30 (NeurIPS 2017).

Cite this entry

MultipleChat. "Attention Mechanism." MultipleChat AI & LLM Glossary, 2026. https://multiple.chat/ai-glossary/attention
