When you purchase through links on our site, we may earn an affiliate commission.Heres how it works.
Chinas DeepSeek AIchatbothas stunned the tech industry, representing a credible alternative toOpenAIsChatGPTat a fraction of the cost.
But exactly how DeepSeek’s developers managed this feat is likely down to a clever hack.
Example DualPipe scheduling for 8 PP ranks and 20 micro-batches in two directions. The micro-batches in the reverse direction are symmetric to those in the forward direction, so we omit their batch ID for illustration simplicity. Two cells enclosed by a shared black border have mutually overlapped computation and communication.
A virtual DPU on the GPU itself
First, some background.
This includes the creation of the DualPipe algorithm for efficient pipeline parallelism.
Example DualPipe scheduling for 8 PP ranks and 20 micro-batches in two directions. The micro-batches in the reverse direction are symmetric to those in the forward direction, so we omit their batch ID for illustration simplicity. Two cells enclosed by a shared black border have mutually overlapped computation and communication.