LLM Architecture · Efficiency · Updated 2026

Mixture of Experts (MoE)

An architecture that routes each input to a few specialised sub-networks instead of the whole model, giving large capacity at a lower cost per query.

In a Mixture-of-Experts model, a gating network sends each token to a small subset of "expert" sub-networks, so only part of the model activates per input. Shazeer et al. (2017) introduced the sparsely-gated MoE layer, and Fedus et al. (2022) simplified routing in the Switch Transformer, sending each token to a single expert, to scale to trillion-parameter models with roughly constant compute per token.
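
The routing step described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reproduction of either paper's gating: the names `moe_layer`, `make_expert` and `gate_weights` are hypothetical, the "experts" are toy scaling functions standing in for feed-forward networks, and the noise term and load-balancing loss used in practice are left out.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(scale):
    # Hypothetical stand-in for an expert feed-forward network:
    # it simply multiplies every feature of the token by `scale`.
    return lambda token: [scale * x for x in token]


def moe_layer(token, gate_weights, experts, k=2):
    """Route one token to its top-k experts and mix their outputs."""
    # Gating network: one logit per expert (a plain linear map here).
    logits = [sum(w * x for w, x in zip(row, token)) for row in gate_weights]
    # Keep only the k highest-scoring experts; the rest are never evaluated.
    top = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:k]
    # Normalise the gate over the selected experts only.
    probs = softmax([logits[i] for i in top])
    output = [0.0] * len(token)
    for p, i in zip(probs, top):
        expert_out = experts[i](token)
        output = [o + p * y for o, y in zip(output, expert_out)]
    return output, top


random.seed(0)
dim, num_experts = 4, 8
experts = [make_expert(float(i + 1)) for i in range(num_experts)]
gate_weights = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)]
token = [0.5, -1.0, 2.0, 0.1]

output, chosen = moe_layer(token, gate_weights, experts, k=2)
print("experts used:", chosen)
print("layer output:", output)
```

Only `k` of the eight toy experts run for the token, which is the source of the compute saving. Setting `k=1` gives single-expert routing in the spirit of the Switch Transformer.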

MoE lets a model hold enormous total capacity while keeping per-token inference compute close to that of a much smaller dense model.
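
A rough parameter count makes the gap between total and active capacity concrete. The sketch below assumes each expert is a two-matrix feed-forward block and ignores biases and the router's own weights. The function name `moe_ffn_params` and the layer sizes are illustrative assumptions, not figures from either cited paper.

```python
def moe_ffn_params(d_model, d_ff, num_experts, top_k):
    """Return (total, active-per-token) expert parameters for one MoE layer."""
    # Assumed expert shape: d_model -> d_ff -> d_model, biases ignored.
    per_expert = 2 * d_model * d_ff
    total = num_experts * per_expert   # parameters stored in the layer
    active = top_k * per_expert        # parameters used for a single token
    return total, active


# Hypothetical layer sizes chosen only for illustration.
total, active = moe_ffn_params(d_model=1024, d_ff=4096, num_experts=64, top_k=2)
print(f"total expert parameters: {total:,}")
print(f"active per token:        {active:,}")
print(f"active fraction:         {active / total:.3%}")
```

With these assumed sizes, each token touches 2 of 64 experts, about 3% of the layer's expert parameters. All experts still have to be stored, so the saving is in compute per token, not in memory.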

References

Primary, peer-reviewed and archival sources for this definition.

Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

Shazeer, N., Mirhoseini, A., Maziarz, K., Davis, A., Le, Q., Hinton, G., & Dean, J. (2017). International Conference on Learning Representations (ICLR 2017).

Source arXiv:1701.06538

Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity

Fedus, W., Zoph, B., & Shazeer, N. (2022). Journal of Machine Learning Research, 23(120).

Source arXiv:2101.03961

Dictionary & encyclopedic entries

Wikipedia — Mixture of experts
IBM — Think / Topics — What is mixture of experts?

Cite this entry

MultipleChat. "Mixture of Experts (MoE)." MultipleChat AI & LLM Glossary, 2026. https://multiple.chat/ai-glossary/moe

Related terms

Transformer
LLM (Large Language Model)
Latency
