Tesla’s new cluster will employ 10,000 Nvidia H100 compute GPUs, offering a peak of 340 FP64 PFLOPS for technical computing and 39.58 INT8 ExaFLOPS for AI applications. For comparison, that 340 FP64 PFLOPS exceeds the 304 FP64 PFLOPS of Leonardo, the world’s fourth highest-performing supercomputer. With its new supercomputer, Tesla is significantly expanding its capacity to train its Full Self-Driving (FSD) technology faster than ever. This could not only make Tesla more competitive than other automakers but also place it among the owners of the world’s fastest supercomputers.
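The quoted figures appear to follow directly from Nvidia's published per-GPU peaks. A quick sanity check, assuming roughly 34 TFLOPS of FP64 (vector) and 3,958 TOPS of INT8 (with sparsity) per H100 SXM module:

```python
# Back-of-the-envelope check of the quoted cluster peak numbers.
# Per-GPU peaks below are assumptions based on Nvidia's H100 SXM
# datasheet figures, not values stated in this article.
NUM_GPUS = 10_000
FP64_TFLOPS_PER_GPU = 34      # assumed FP64 (vector) peak per H100
INT8_TOPS_PER_GPU = 3_958     # assumed INT8 peak per H100, with sparsity

fp64_pflops = NUM_GPUS * FP64_TFLOPS_PER_GPU / 1_000      # TFLOPS -> PFLOPS
int8_exaops = NUM_GPUS * INT8_TOPS_PER_GPU / 1_000_000    # TOPS -> exa-ops

print(f"{fp64_pflops:.0f} FP64 PFLOPS")   # 340
print(f"{int8_exaops:.2f} INT8 exa-ops")  # 39.58
```

Both results line up with the article's 340 PFLOPS and 39.58 "ExaFLOPS" figures (the INT8 number is, strictly speaking, integer operations per second rather than floating-point operations).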

“Due to real-world video training, we may have the largest training datasets in the world, hot tier cache capacity beyond 200PB — orders of magnitude more than LLMs,” explained Tim Zaman, AI Infra & AI Platform Engineering Manager at Tesla. While the new H100-based cluster is set to dramatically improve Tesla’s training speed, Nvidia is struggling to meet demand for these GPUs. As a result, Tesla is also investing over $1 billion to develop its own supercomputer, Dojo, built on custom-designed, highly optimized systems-on-chip.

Dojo will not only accelerate FSD training but also handle data processing for Tesla’s entire vehicle fleet. Tesla plans to bring the Nvidia H100 cluster and Dojo online in parallel, a move that would give the company unparalleled computing power in the automotive industry.

Elon Musk recently revealed that Tesla plans to spend over $2 billion on AI training in 2023, and another $2 billion in 2024, specifically on compute for FSD training. This underscores Tesla’s commitment to removing computational bottlenecks and should give it a substantial advantage over rivals.

By Impact Lab