- Published on February 25, 2025
- In AI News
The library offers ‘high-throughput kernels for training and inference prefilling.’

Illustration by Supreeth Koundinya
On Tuesday, China’s DeepSeek AI launched DeepEP, a communication library for mixture-of-experts (MoE) model training and inference. The announcement is part of DeepSeek’s Open Source Week, during which the AI startup has committed to open-sourcing five repositories from its tech stack.
The library is designed to speed up communication between graphics processing units (GPUs) when training and serving models built on the MoE architecture. DeepEP offers a set of kernels optimised for asymmetric-domain bandwidth forwarding, moving data efficiently between NVLink and RDMA connections.
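In an MoE layer, each token is routed to a subset of experts, which on large clusters live on different GPUs; the communication-heavy steps are "dispatch" (send tokens to their experts) and "combine" (gather expert outputs back into token order). The sketch below illustrates that pattern in plain single-process Python. It is a conceptual illustration only, not DeepEP's actual API; the function names `dispatch` and `combine` are ours.

```python
# Conceptual sketch of the MoE dispatch/combine pattern that libraries
# like DeepEP accelerate across GPUs. Plain Python over lists; in a real
# system each bucket would be an all-to-all transfer over NVLink/RDMA.

def dispatch(tokens, expert_ids, num_experts):
    """Group token indices by the expert each token is routed to."""
    buckets = [[] for _ in range(num_experts)]
    for idx, expert in enumerate(expert_ids):
        buckets[expert].append(idx)
    return buckets

def combine(buckets, expert_outputs, num_tokens):
    """Scatter per-expert outputs back into the original token order."""
    out = [None] * num_tokens
    for bucket, outputs in zip(buckets, expert_outputs):
        for idx, value in zip(bucket, outputs):
            out[idx] = value
    return out

tokens = ["t0", "t1", "t2", "t3"]
expert_ids = [1, 0, 1, 0]  # the router's choice per token (made up here)
buckets = dispatch(tokens, expert_ids, num_experts=2)
# Each "expert" just tags its tokens in this toy example.
expert_outputs = [[f"E{e}({tokens[i]})" for i in b]
                  for e, b in enumerate(buckets)]
merged = combine(buckets, expert_outputs, len(tokens))
# merged restores the original token order: ["E1(t0)", "E0(t1)", ...]
```

The point of a library like DeepEP is that these two steps dominate MoE runtime at scale, so the grouping and scatter above are fused into optimised GPU kernels rather than done on the host.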
DeepEP’s performance was tested on NVIDIA H800 GPUs with CX7 InfiniBand RDMA network cards. The GPUs have a maximum NVLink bandwidth of 160 GB/s, and DeepEP achieved 153 GB/s. Over RDMA, where the H800 setup peaks at 50 GB/s, DeepEP reached 43 GB/s.
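A quick arithmetic check shows what those figures mean as a fraction of the hardware's peak bandwidth:

```python
# Link utilization implied by the reported numbers (GB/s, from the article).
nvlink_peak, nvlink_achieved = 160, 153
rdma_peak, rdma_achieved = 50, 43

nvlink_util = nvlink_achieved / nvlink_peak * 100  # about 95.6%
rdma_util = rdma_achieved / rdma_peak * 100        # 86.0%
print(round(nvlink_util, 1), round(rdma_util, 1))
```

That is, the kernels come within a few percent of saturating NVLink and within roughly 14% of the RDMA ceiling.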
Further, it supports calculations in 8-bit floating point (FP8), which accelerates computation and reduces memory usage.
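The memory benefit of FP8 follows directly from element width: one byte per value instead of two for BF16/FP16. The back-of-the-envelope sketch below makes that concrete; the tensor dimensions are hypothetical, chosen only for illustration.

```python
# Why FP8 halves storage (and network traffic) versus 16-bit formats.
# Pure arithmetic; the sizes below are made up for illustration.

def tensor_bytes(num_elements, bits_per_element):
    """Storage for a dense tensor at the given element width."""
    return num_elements * bits_per_element // 8

tokens, hidden = 4096, 7168          # hypothetical activation shape
elements = tokens * hidden

bf16_bytes = tensor_bytes(elements, 16)
fp8_bytes = tensor_bytes(elements, 8)
print(fp8_bytes / bf16_bytes)  # 0.5 — half the memory per activation
```

Halving the bytes per element matters doubly for a communication library: the same tokens cost half the bandwidth to dispatch between GPUs.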
DeepSeek provides detailed technical documentation and steps to install and configure the open-source library on GitHub.
DeepEP is the second of five open-source repositories DeepSeek plans to unveil. On Monday, it announced FlashMLA, a decoding kernel designed for Hopper GPUs, optimised for processing variable-length sequences and already in production.
The kernel supports BF16 and features a paged KV cache with a block size of 64. On the H800 GPU, it achieves speeds of 3000 GB/s in memory-bound configurations and 580 TFLOPS in compute-bound configurations.
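A paged KV cache stores a sequence's keys and values in fixed-size physical blocks and translates logical token positions through a block table, much like virtual memory paging. The sketch below shows that indexing scheme under the common block-table design (as popularised by vLLM-style paged attention); FlashMLA uses a block size of 64, but the block-table contents here are made up for illustration and this is not FlashMLA's actual code.

```python
# Minimal sketch of paged KV-cache indexing with a block size of 64.
# A block table maps a sequence's logical blocks to physical blocks,
# so cache memory need not be contiguous per sequence.

BLOCK_SIZE = 64

def physical_slot(block_table, token_pos):
    """Translate a logical token position into (physical block, offset)."""
    logical_block = token_pos // BLOCK_SIZE
    offset = token_pos % BLOCK_SIZE
    return block_table[logical_block], offset

block_table = [7, 2, 9]                 # hypothetical physical block ids
print(physical_slot(block_table, 0))    # (7, 0)  - first token, block 7
print(physical_slot(block_table, 65))   # (2, 1)  - second logical block
print(physical_slot(block_table, 130))  # (9, 2)  - third logical block
```

Fixed-size blocks let the kernel allocate cache memory on demand for variable-length sequences instead of reserving the maximum length up front.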
DeepSeek’s commitment to transparency and open-sourcing various technologies has earned praise from users across the internet. Stephen Pimentel, an engineer, said on X, “DeepSeek is effectively refuting the frequently made claim that ‘they lied’ about their training procedures.”
Recently, the startup released its DeepSeek-R1 and DeepSeek-V3 models, which sent shockwaves through the industry, largely because they delivered state-of-the-art performance while being trained and deployed at a fraction of competitors’ cost, and were released as open source.
Supreeth Koundinya
Supreeth is an engineering graduate who is curious about the world of artificial intelligence and loves to write stories on how it is solving problems and shaping the future of humanity.