DeepSeek Launches DeepEP, a Communication library for Mixture of Experts Model Training and Inference

Published on February 25, 2025

The library offers ‘high-throughput kernels for training and inference prefilling.’ 

Illustration by Supreeth Koundinya

On Tuesday, China’s DeepSeek AI launched DeepEP, a communication library for mixture of experts (MoE) model training and inference. The announcement is part of DeepSeek’s Open Source Week, during which the AI startup has committed to open-sourcing five repositories from its tech stack.

The library is designed to improve communication between graphics processing units (GPUs) in machine learning models that use the MoE architecture. DeepEP offers a set of kernels optimised for asymmetric-domain bandwidth forwarding, moving data efficiently between NVLink and RDMA domains.
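DeepEP’s dispatch and combine kernels are custom CUDA code, but the underlying pattern is easier to see in a framework collective. Below is a rough conceptual sketch, not DeepEP’s API: in expert parallelism, each rank sends the tokens routed to remote experts and receives the tokens routed to its local experts, which maps onto an all-to-all exchange. The dispatch_tokens helper and its fixed per-rank token layout are illustrative assumptions.

```python
# Conceptual sketch of MoE "dispatch" (not DeepEP's actual API):
# each rank sends the tokens routed to remote experts and receives
# the tokens routed to its local experts via an all-to-all collective.
# Run with: torchrun --nproc_per_node=2 moe_dispatch_sketch.py
import torch
import torch.distributed as dist

def dispatch_tokens(tokens_per_rank: torch.Tensor) -> torch.Tensor:
    """tokens_per_rank: (world_size, tokens, hidden) -- tokens grouped
    by the rank that hosts their destination experts."""
    send = tokens_per_rank.contiguous()
    recv = torch.empty_like(send)
    # Exchange shards: rank i's slice j goes to rank j's slice i.
    dist.all_to_all_single(recv.view(-1), send.view(-1))
    return recv

if __name__ == "__main__":
    dist.init_process_group("nccl" if torch.cuda.is_available() else "gloo")
    rank, world = dist.get_rank(), dist.get_world_size()
    hidden = 8
    # Two dummy tokens per destination rank, tagged with the sender's rank.
    x = torch.full((world, 2, hidden), float(rank))
    if torch.cuda.is_available():
        torch.cuda.set_device(rank)
        x = x.cuda()
    out = dispatch_tokens(x)
    print(f"rank {rank} received tokens from ranks:", out[:, 0, 0].tolist())
    dist.destroy_process_group()
```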

DeepEP’s performance was tested on NVIDIA H800 GPUs with CX7 InfiniBand RDMA network cards. The GPUs have a maximum NVLink bandwidth of 160 GB/s, and DeepEP achieved 153 GB/s.

While the H800 has a maximum RDMA bandwidth of 50 GB/s, DeepEP achieved 43 GB/s.
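Those figures put DeepEP within a few percentage points of the hardware ceilings. A quick back-of-the-envelope check of the implied link utilisation:

```python
# Link utilisation implied by the reported figures.
nvlink_peak, nvlink_achieved = 160.0, 153.0   # GB/s
rdma_peak, rdma_achieved = 50.0, 43.0         # GB/s

print(f"NVLink utilisation: {nvlink_achieved / nvlink_peak:.1%}")  # 95.6%
print(f"RDMA utilisation:   {rdma_achieved / rdma_peak:.1%}")      # 86.0%
```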

Further, it can handle calculations using 8-bit floating point numbers (FP8), which accelerates computations and reduces memory usage. 
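The memory half of that claim is straightforward to see: FP8 stores each value in one byte versus two for BF16, halving activation memory and communication volume. A minimal illustration, assuming a recent PyTorch build that ships the float8_e4m3fn dtype (this is not DeepEP’s internal code):

```python
import torch

# FP8 (e4m3) uses 1 byte per value; BF16 uses 2.
x_bf16 = torch.randn(1024, 1024, dtype=torch.bfloat16)
x_fp8 = x_bf16.to(torch.float8_e4m3fn)  # lossy cast to 8-bit floats

bytes_bf16 = x_bf16.numel() * x_bf16.element_size()
bytes_fp8 = x_fp8.numel() * x_fp8.element_size()
print(bytes_bf16 // bytes_fp8)  # -> 2, i.e. half the memory
```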

DeepSeek provides detailed technical documentation, along with installation and configuration steps for the open-source library, on GitHub.

DeepEP is the second of five open-source repositories DeepSeek plans to unveil. On Monday, it announced FlashMLA, a decoding kernel designed for Hopper GPUs. It is optimised for processing variable-length sequences and is already in production.

The kernel supports BF16 and features a paged KV cache with a block size of 64. On the H800 GPU, it achieves speeds of 3000 GB/s in memory-bound configurations and 580 TFLOPS in compute-bound configurations.
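A paged KV cache manages attention keys and values in fixed-size physical blocks, 64 tokens each in FlashMLA’s case, indexed through a block table in the same spirit as virtual-memory pages. The sketch below shows only that bookkeeping; it is illustrative, with an assumed pool size, and is not FlashMLA’s implementation.

```python
# Minimal sketch of paged-KV-cache bookkeeping (illustrative only,
# not FlashMLA's implementation). A sequence maps logical token
# positions onto fixed-size physical blocks via a block table.
BLOCK_SIZE = 64

class BlockTable:
    def __init__(self, pool_size: int = 1024):
        self.free_blocks = list(range(pool_size))  # physical block pool
        self.blocks: list[int] = []                # logical -> physical
        self.length = 0                            # tokens stored so far

    def append_token(self) -> tuple[int, int]:
        """Reserve a slot for one new token; returns (block, offset)."""
        if self.length % BLOCK_SIZE == 0:          # current block is full
            self.blocks.append(self.free_blocks.pop())
        offset = self.length % BLOCK_SIZE
        self.length += 1
        return self.blocks[-1], offset

table = BlockTable()
for _ in range(130):                               # 130 tokens
    slot = table.append_token()
print(len(table.blocks))                           # -> 3 blocks (2 full + 1 partial)
```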

DeepSeek’s commitment to transparency and open-sourcing various technologies has earned praise from users across the internet. Stephen Pimentel, an engineer, said on X, “DeepSeek is effectively refuting the frequently made claim that ‘they lied’ about their training procedures.”

Recently, the startup released its DeepSeek-R1 and DeepSeek-V3 models, which sent shockwaves across the industry, primarily because they offered state-of-the-art performance at a fraction of the training and deployment cost of their competitors, while being available as open source.


Supreeth Koundinya

Supreeth is an engineering graduate who is curious about the world of artificial intelligence and loves to write stories on how it is solving problems and shaping the future of humanity.
