LLM Architecture · Efficiency · Updated 2026

Mixture of Experts (MoE)

An architecture that routes each input token to a few specialised sub-networks ("experts") instead of through the whole model, giving large total capacity at a lower compute cost per token.

In a Mixture-of-Experts model, a gating network (router) sends each token to a small subset of "expert" sub-networks, so only part of the model activates per input. Shazeer et al. (2017) introduced the sparsely-gated MoE layer, which keeps only the top-k gate scores for each token. Fedus et al. (2022) simplified routing in the Switch Transformer by sending each token to a single expert, and used it to scale to trillion-parameter models with roughly constant compute per token.
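The routing step can be sketched for a single token in plain Python. This is a minimal illustration, assuming the gate computes Softmax(KeepTopK(x · W_g, k)) in the style of Shazeer et al. (2017). It omits the noise term that paper adds to the gate logits during training. The names `top_k_gate` and `moe_layer`, the toy scaling experts and the dimensions are all hypothetical. Real implementations batch many tokens, use learned feed-forward experts and typically add auxiliary load-balancing losses.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_gate(x, gate_weights, k):
    """Return {expert_index: weight} for the k highest-scoring experts.

    Sketch of Softmax(KeepTopK(logits, k)): logits outside the top k are
    dropped (equivalent to setting them to -inf), and the kept logits are
    softmax-normalised so the k weights sum to 1. No gating noise is added.
    """
    # One logit per expert: dot product of the token with that expert's gate row.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    weights = softmax([logits[i] for i in top])
    return dict(zip(top, weights))


def moe_layer(x, experts, gate_weights, k=2):
    """Sparse MoE forward pass for one token: only the k selected experts run."""
    gates = top_k_gate(x, gate_weights, k)
    out = [0.0] * len(x)
    for i, g in gates.items():
        y = experts[i](x)  # the unselected experts are never evaluated
        out = [o + g * yi for o, yi in zip(out, y)]
    return out, gates


# Hypothetical toy setup: 8 experts, 4-dimensional tokens, random gate weights.
random.seed(0)
dim, num_experts = 4, 8
gate_weights = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)]
# Toy experts (hypothetical): expert i multiplies the input by i + 1.
experts = [(lambda v, s=i + 1: [s * vi for vi in v]) for i in range(num_experts)]

token = [0.5, -1.0, 0.25, 2.0]
output, gates = moe_layer(token, experts, gate_weights, k=2)
print(sorted(gates))                   # indices of the 2 experts that ran
print(round(sum(gates.values()), 6))   # → 1.0
```

Setting `k` to the number of experts turns this back into a dense, soft mixture in which every expert runs for every token. A small `k` is what makes the layer sparse.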

MoE lets a model hold enormous total parameter capacity while keeping per-token inference compute closer to that of a much smaller dense model.
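The capacity-versus-compute trade-off can be made concrete with back-of-the-envelope arithmetic. The sketch below rests on several assumptions. It uses a simplified model with equal-sized experts alongside always-active shared parameters, and it ignores router weights. It also treats per-token compute as roughly proportional to the parameters a token actually passes through. The function name `moe_param_counts` and the configuration numbers are hypothetical and do not describe any published model.

```python
def moe_param_counts(shared_params, expert_params, num_experts, k):
    """Return (total, active_per_token) parameter counts for a simplified MoE.

    Assumptions (hypothetical simplification): all experts are the same size,
    shared (non-expert) parameters run for every token, router weights ignored.
    """
    total = shared_params + num_experts * expert_params
    active = shared_params + k * expert_params
    return total, active


# Hypothetical configuration: 16 experts of 1B parameters each, top-2 routing.
total, active = moe_param_counts(shared_params=2_000_000_000,
                                 expert_params=1_000_000_000,
                                 num_experts=16, k=2)
print(total, active)                                          # → 18000000000 4000000000
print(f"{active / total:.1%} of parameters used per token")  # → 22.2% of parameters used per token
```

Adding experts raises the total count without changing the active count. This is why MoE capacity can grow while per-token compute stays roughly constant. All experts' weights still have to be held in memory, so the saving applies to compute rather than storage.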

References

Primary, peer-reviewed and archival sources for this definition.

Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
Shazeer, N., Mirhoseini, A., Maziarz, K., Davis, A., Le, Q., Hinton, G., & Dean, J. (2017). International Conference on Learning Representations (ICLR 2017).
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
Fedus, W., Zoph, B., & Shazeer, N. (2022). Journal of Machine Learning Research, 23(120).

Cite this entry

MultipleChat. "Mixture of Experts (MoE)." MultipleChat AI & LLM Glossary, 2026. https://multiple.chat/ai-glossary/moe
