Exploring DeepSeek-R1's Mixture-of-Experts Model Architecture
DeepSeek-R1, introduced in January 2025 by the Chinese AI startup DeepSeek, exemplifies these principles through its innovative Mixture-of-Experts (MoE) architecture. This article …
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture …
Jan 11, 2024 · It involves two principal strategies: (1) finely segmenting the experts into mN ones and activating mK from them, allowing for a more flexible combination of activated experts; (2) …
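The fine-grained segmentation described in that snippet is easier to see in code. Below is a minimal, illustrative sketch of a top-k router over many small experts in PyTorch, not DeepSeek's released implementation; the class name FineGrainedMoE and all sizes (d_model, d_expert, n_experts, top_k) are assumptions chosen for readability. The point is that with many small experts and several activated per token, the number of possible expert combinations grows, which is the flexibility the abstract refers to.

```python
# Minimal sketch of fine-grained expert routing in the spirit of DeepSeekMoE.
# Not the released implementation: names and sizes are illustrative assumptions.
import torch
import torch.nn as nn

class FineGrainedMoE(nn.Module):
    def __init__(self, d_model=64, d_expert=32, n_experts=16, top_k=4):
        super().__init__()
        # Many small experts (the "mN" of the paper) instead of a few large ones.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_expert), nn.GELU(),
                          nn.Linear(d_expert, d_model))
            for _ in range(n_experts)
        ])
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.top_k = top_k  # the "mK" experts activated per token

    def forward(self, x):                                   # x: (tokens, d_model)
        scores = self.router(x).softmax(dim=-1)             # routing probabilities
        weights, idx = scores.topk(self.top_k, dim=-1)      # pick top-k experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        # Dispatch each token to its selected experts and mix their outputs.
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

tokens = torch.randn(8, 64)
print(FineGrainedMoE()(tokens).shape)  # torch.Size([8, 64])
```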
(PDF) DeepSeek-V3: A High-Performance Mixture-of-Experts
Jan 29, 2025 · DeepSeek-V3 is a cutting-edge Mixture-of-Experts (MoE) language model featuring 671 billion total parameters, with 37 billion activated per token. To optimize inference …
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for …
Dec 13, 2024 · We present DeepSeek-VL2, an advanced series of large Mixture-of-Experts (MoE) Vision-Language Models that significantly improves upon its predecessor, DeepSeek-VL, …
GitHub - deepseek-ai/DeepSeek-VL2: DeepSeek-VL2: Mixture-of-Experts …
Dec 13, 2024 · Introducing DeepSeek-VL2, an advanced series of large Mixture-of-Experts (MoE) Vision-Language Models that significantly improves upon its predecessor, DeepSeek-VL. …
DeepSeek and the Power of Mixture of Experts (MoE)
Jan 28, 2025 · DeepSeek is causing a stir in the AI community with its open-source large language models (LLMs), and a key factor in its success is the Mixture of Experts (MoE) …
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts ...
May 7, 2024 · We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total …
GitHub - deepseek-ai/DeepSeek-MoE: DeepSeekMoE: Towards Ultimate Expert ...
DeepSeekMoE 16B is a Mixture-of-Experts (MoE) language model with 16.4B parameters. It employs an innovative MoE architecture, which involves two principal strategies: fine-grained …
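Since the repository above distributes DeepSeekMoE 16B as an open checkpoint, a short usage sketch may help. This is hedged, not copied from the README: it assumes the weights are published on Hugging Face under the repo id "deepseek-ai/deepseek-moe-16b-base" and that the model's custom MoE modeling code requires trust_remote_code=True.

```python
# Hedged usage sketch for loading DeepSeekMoE 16B with Hugging Face transformers.
# The repo id below is an assumption; adjust it to the actual published checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/deepseek-moe-16b-base"   # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # halve memory relative to fp32
    device_map="auto",            # spread layers across available devices
    trust_remote_code=True,       # load the model's custom MoE modules
)

inputs = tokenizer("Mixture-of-Experts models scale by", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```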
DeepSeek-V3: Efficient and Scalable AI with Mixture-of-Experts
Dec 31, 2024 · Efficiency: DeepSeek-V3 uses Mixture-of-Experts (MoE) by activating only a portion of its parameters, 37B out of 671B, for any given input. This selective activation reduces the …
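A quick back-of-envelope calculation shows what that selective activation means per token. The figures come from the snippet above (37B activated out of 671B total); the ~2 FLOPs per parameter per token for a forward pass is the usual rough approximation, not a number from the article.

```python
# Back-of-envelope cost of MoE selective activation vs. a hypothetical dense model.
total_params = 671e9    # total parameters
active_params = 37e9    # parameters activated per token

active_fraction = active_params / total_params
dense_flops_per_token = 2 * total_params    # if every parameter were used
moe_flops_per_token = 2 * active_params     # only the routed experts run

print(f"active fraction: {active_fraction:.1%}")   # ~5.5%
print(f"forward FLOPs/token: {moe_flops_per_token:.2e} vs {dense_flops_per_token:.2e}")
```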
deepseek-ai/deepseek-vl2 - Hugging Face
Dec 18, 2024 · Introducing DeepSeek-VL2, an advanced series of large Mixture-of-Experts (MoE) Vision-Language Models that significantly improves upon its predecessor, DeepSeek-VL.