- Published on February 25, 2025
- In AI News
The library offers ‘high-throughput kernels for training and inference prefilling.’

Illustration by Supreeth Koundinya
On Tuesday, China’s DeepSeek AI launched DeepEP, a communication library for mixture-of-experts (MoE) model training and inference. The announcement is part of DeepSeek’s Open Source Week, during which the AI startup has committed to open-sourcing five repositories from its tech stack.
The library is designed to speed up communication between graphics processing units (GPUs) when training and serving models built on the MoE architecture. DeepEP offers a set of kernels optimised for asymmetric-domain bandwidth forwarding, moving data efficiently between NVLink and RDMA connections.
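In an MoE layer, each token is routed to a subset of experts, which on large clusters live on different GPUs; the communication-heavy steps are "dispatch" (send tokens to their experts) and "combine" (gather expert outputs back into token order). The sketch below illustrates that pattern in plain single-process Python. It is a conceptual illustration only, not DeepEP's actual API; the function names `dispatch` and `combine` are ours.

```python
# Conceptual sketch of the MoE dispatch/combine pattern that libraries
# like DeepEP accelerate across GPUs. Plain Python over lists; in a real
# system each bucket would be an all-to-all transfer over NVLink/RDMA.

def dispatch(tokens, expert_ids, num_experts):
    """Group token indices by the expert each token is routed to."""
    buckets = [[] for _ in range(num_experts)]
    for idx, expert in enumerate(expert_ids):
        buckets[expert].append(idx)
    return buckets

def combine(buckets, expert_outputs, num_tokens):
    """Scatter per-expert outputs back into the original token order."""
    out = [None] * num_tokens
    for bucket, outputs in zip(buckets, expert_outputs):
        for idx, value in zip(bucket, outputs):
            out[idx] = value
    return out

tokens = ["t0", "t1", "t2", "t3"]
expert_ids = [1, 0, 1, 0]  # the router's choice per token (made up here)
buckets = dispatch(tokens, expert_ids, num_experts=2)
# Each "expert" just tags its tokens in this toy example.
expert_outputs = [[f"E{e}({tokens[i]})" for i in b]
                  for e, b in enumerate(buckets)]
merged = combine(buckets, expert_outputs, len(tokens))
# merged restores the original token order: ["E1(t0)", "E0(t1)", ...]
```

The point of a library like DeepEP is that these two steps dominate MoE runtime at scale, so the grouping and scatter above are fused into optimised GPU kernels rather than done on the host.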
DeepEP’s performance was tested on NVIDIA H800 GPUs with CX7 InfiniBand RDMA network cards. The GPUs have a maximum NVLink bandwidth of 160 GB/s, and DeepEP achieved 153 GB/s. Over RDMA, where the H800 setup peaks at 50 GB/s, DeepEP reached 43 GB/s.
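A quick arithmetic check shows what those figures mean as a fraction of the hardware's peak bandwidth:

```python
# Link utilization implied by the reported numbers (GB/s, from the article).
nvlink_peak, nvlink_achieved = 160, 153
rdma_peak, rdma_achieved = 50, 43

nvlink_util = nvlink_achieved / nvlink_peak * 100  # about 95.6%
rdma_util = rdma_achieved / rdma_peak * 100        # 86.0%
print(round(nvlink_util, 1), round(rdma_util, 1))
```

That is, the kernels come within a few percent of saturating NVLink and within roughly 14% of the RDMA ceiling.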
Further, it supports calculations in 8-bit floating point (FP8), which accelerates computation and reduces memory usage.
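The memory benefit of FP8 follows directly from element width: one byte per value instead of two for BF16/FP16. The back-of-the-envelope sketch below makes that concrete; the tensor dimensions are hypothetical, chosen only for illustration.

```python
# Why FP8 halves storage (and network traffic) versus 16-bit formats.
# Pure arithmetic; the sizes below are made up for illustration.

def tensor_bytes(num_elements, bits_per_element):
    """Storage for a dense tensor at the given element width."""
    return num_elements * bits_per_element // 8

tokens, hidden = 4096, 7168          # hypothetical activation shape
elements = tokens * hidden

bf16_bytes = tensor_bytes(elements, 16)
fp8_bytes = tensor_bytes(elements, 8)
print(fp8_bytes / bf16_bytes)  # 0.5 — half the memory per activation
```

Halving the bytes per element matters doubly for a communication library: the same tokens cost half the bandwidth to dispatch between GPUs.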
DeepSeek provides detailed technical documentation and steps to install and configure the open-source library on GitHub.
DeepEP is the second of five open-source repositories DeepSeek plans to unveil. On Monday, it announced FlashMLA, a decoding kernel designed for Hopper GPUs, optimised for processing variable-length sequences and already in production.
The kernel supports BF16 and features a paged KV cache with a block size of 64. On the H800 GPU, it achieves speeds of 3000 GB/s in memory-bound configurations and 580 TFLOPS in compute-bound configurations.
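A paged KV cache stores a sequence's keys and values in fixed-size physical blocks and translates logical token positions through a block table, much like virtual memory paging. The sketch below shows that indexing scheme under the common block-table design (as popularised by vLLM-style paged attention); FlashMLA uses a block size of 64, but the block-table contents here are made up for illustration and this is not FlashMLA's actual code.

```python
# Minimal sketch of paged KV-cache indexing with a block size of 64.
# A block table maps a sequence's logical blocks to physical blocks,
# so cache memory need not be contiguous per sequence.

BLOCK_SIZE = 64

def physical_slot(block_table, token_pos):
    """Translate a logical token position into (physical block, offset)."""
    logical_block = token_pos // BLOCK_SIZE
    offset = token_pos % BLOCK_SIZE
    return block_table[logical_block], offset

block_table = [7, 2, 9]                 # hypothetical physical block ids
print(physical_slot(block_table, 0))    # (7, 0)  - first token, block 7
print(physical_slot(block_table, 65))   # (2, 1)  - second logical block
print(physical_slot(block_table, 130))  # (9, 2)  - third logical block
```

Fixed-size blocks let the kernel allocate cache memory on demand for variable-length sequences instead of reserving the maximum length up front.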
DeepSeek’s commitment to transparency and open-sourcing various technologies has earned praise from users across the internet. Stephen Pimentel, an engineer, said on X, “DeepSeek is effectively refuting the frequently made claim that ‘they lied’ about their training procedures.”
Recently, the startup released its DeepSeek-R1 and DeepSeek-V3 models, which sent shockwaves through the industry, largely because they delivered state-of-the-art performance while being trained and deployed at a fraction of competitors’ cost, and were released as open source.
Supreeth Koundinya
Supreeth is an engineering graduate who is curious about the world of artificial intelligence and loves to write stories on how it is solving problems and shaping the future of humanity.