Nvidia's closest rival once again obliterates cloud giants in AI performance; Cerebras Inference is 75x faster than AWS, 32x faster than Google on Llama 3.1 405B

When you purchase through links on our site, we may earn an affiliate commission.Heres how it works.

This opens up powerful new use cases, including reasoning and multi-agent collaboration, across the AI landscape.

It delivers a peak AI performance of 125 petaflops and boasts 7,000 times the memory bandwidth of theNvidiaH100.

Cerebras WSE-3

Customer trials for the system are ongoing, with general availability slated for Q1 2025.

Pricing begins at $6 per million input tokens and $12 per million output tokens.

Cerebras tokens per second on Llama 3.1 405B

seconds to first token received on Llama 3.1 405B

You might also like#