LLM Architecture · Foundations · Updated 2026

Attention Mechanism

The technique that lets a model weigh how much each part of the input matters when producing each part of the output — the core operation inside every Transformer.

Attention was introduced for machine translation by Bahdanau et al. (2015), letting a decoder look back at the most relevant source words at each output step instead of relying on a single fixed summary vector. Vaswani et al. (2017) then built the entire Transformer architecture around self-attention, removing recurrence entirely.

In self-attention, each token's output is a weighted combination of the representations of all tokens in the sequence, itself included, where the weights say how relevant each one is to that token. This is how a model resolves references, tracks long-range structure and decides what to focus on.
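
The weighted combination described above can be sketched in a few lines of plain Python. This is a minimal illustration of the scaled dot-product form from Vaswani et al. (2017), softmax(QK^T / sqrt(d_k)) V, and not a production implementation. The function names (softmax, dot, attention) and the toy 2-dimensional token vectors are assumptions made for the example. As a further simplification, the same vectors are used directly as queries, keys and values, whereas a real Transformer first passes each token through separate learned projections.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors.

    Returns (outputs, weights), where weights[i][j] is how much
    query i attends to key j, and outputs[i] is the matching
    weighted combination of the value vectors.
    """
    d_k = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Relevance of every key to this query, scaled by sqrt(d_k).
        scores = [dot(q, k) / math.sqrt(d_k) for k in keys]
        row = softmax(scores)
        # Weighted sum of the value vectors, one component at a time.
        out = [
            sum(w * v[c] for w, v in zip(row, values))
            for c in range(len(values[0]))
        ]
        weights.append(row)
        outputs.append(out)
    return outputs, weights


# Toy self-attention: hypothetical 2-d token vectors used as Q, K and V.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(tokens, tokens, tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Each printed row holds one token's attention weights over all three tokens and sums to 1. A token puts more weight on the tokens whose vectors have a larger dot product with its own, which is the sense in which the weights express relevance.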

References

Primary, peer-reviewed and archival sources for this definition.

Neural Machine Translation by Jointly Learning to Align and Translate

Bahdanau, D., Cho, K., & Bengio, Y. (2015). International Conference on Learning Representations (ICLR 2015).

Source: arXiv:1409.0473

Attention Is All You Need

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Advances in Neural Information Processing Systems 30 (NeurIPS 2017).

Source: arXiv:1706.03762

Dictionary & encyclopedic entries

Wikipedia — Attention (machine learning)
IBM — Think / Topics — What is an attention mechanism?

Cite this entry

MultipleChat. "Attention Mechanism." MultipleChat AI & LLM Glossary, 2026. https://multiple.chat/ai-glossary/attention

Related terms

Transformer, Context Window, LLM (Large Language Model)
