Plus: Yandex releases 100-billion-parameter language model for free, and more
IN BRIEF US hardware startup Cerebras claims to have trained the largest AI model on a single device, one powered by its Wafer Scale Engine 2, the world's largest chip, which is roughly the size of a dinner plate.
“Using the Cerebras Software Platform (CSoft), our customers can easily train state-of-the-art GPT language models (such as GPT-3 and GPT-J) with up to 20 billion parameters on a single CS-2 system,” the company claimed this week. “Running on a single CS-2, these models take minutes to set up and users can quickly move between models with just a few keystrokes.”
The CS-2 packs a whopping 850,000 cores and 40GB of on-chip memory capable of reaching 20 PB/sec of memory bandwidth. The specs of other AI accelerators and GPUs pale in comparison, which is why machine learning engineers typically have to split the training of models with billions of parameters across many servers.
Even though Cerebras has evidently managed to train the largest model on a single device, it will still struggle to win over big AI customers. The largest neural networks these days contain hundreds of billions to trillions of parameters, so in practice many CS-2 systems would still be needed to train them.
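To put that scale gap in rough perspective, here is a back-of-envelope sketch in Python. It assumes 16-bit weights and counts only the raw weight footprint, ignoring optimizer state, gradients, and activations; the figures are illustrative and do not reflect how Cerebras actually lays out memory.

```python
# Back-of-envelope estimate of raw model weight footprint.
# Assumes 2-byte (fp16/bf16) parameters; real training needs far more memory
# for optimizer state, gradients, and activations.

BYTES_PER_PARAM = 2

def weight_memory_gb(num_params: float) -> float:
    """Return the approximate weight footprint in gigabytes."""
    return num_params * BYTES_PER_PARAM / 1e9

models = [
    ("GPT-J-scale (6B)", 6e9),
    ("20B-parameter model", 20e9),
    ("GPT-3-scale (175B)", 175e9),
    ("1-trillion-parameter model", 1e12),
]

for name, params in models:
    print(f"{name}: ~{weight_memory_gb(params):,.0f} GB of weights")
```

Even under these generous assumptions, a trillion-parameter model's weights alone run to terabytes, far beyond what any single accelerator holds on chip.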
Machine learning engineers will likely run into similar challenges to those they already face when distributing training over numerous machines containing GPUs or TPUs – so why switch over to a less familiar hardware system that does not have as much software support?