And They're Off

There’s no x86 in the AI chip market yet: “People see a gold rush; there’s no doubt.”

A lot has changed since 1918. But whether it’s a literal (like the City of London School athletics’ U12 event) or figurative (AI chip development) race, participants still very much want to win.

For years, the semiconductor world seemed to have settled into a quiet balance: Intel vanquished virtually all of the RISC processors in the server world, save IBM’s POWER line. Elsewhere AMD had self-destructed, making it pretty much an x86 world. And Nvidia, a late starter in the GPU space, previously mowed down all of its many competitors in the 1990s. Suddenly only ATI, now a part of AMD, remained. It boasted just half of Nvidia’s prior market share.

On the newer mobile front, it looked to be a similar near-monopolistic story: ARM ruled the world. Intel tried mightily with the Atom processor, but the company met repeated rejection before finally giving up in 2015.

Then just like that, everything changed. AMD resurfaced as a viable x86 competitor; the advent of field-programmable gate array (FPGA) processors for specialized tasks like Big Data created a new niche. But really, the colossal shift in the chip world came with the advent of artificial intelligence (AI) and machine learning (ML). With these emerging technologies, a flood of new processors has arrived, and they are coming from unlikely sources.

Intel got into the market with its purchase of startup Nervana Systems in 2016. It bought a second company, Movidius, for image processing AI.

Microsoft is preparing an AI chip for its HoloLens VR/AR headset, and there’s potential for use in other devices.

Google has a special AI chip for neural networks called the Tensor Processing Unit, or TPU, which is available for AI apps on the Google Cloud Platform.

Amazon is reportedly working on an AI chip for its Alexa home assistant.

Apple is working on an AI processor called the Neural Engine that will power Siri and FaceID.

ARM Holdings recently introduced two new processors, the ARM Machine Learning (ML) Processor and ARM Object Detection (OD) Processor. Both specialize in image recognition.

IBM is developing a specific AI processor, and the company also licensed NVLink from Nvidia for high-speed data throughput specific to AI and ML.

Even non-traditional tech companies like Tesla want in on this area, with CEO Elon Musk acknowledging last year that former AMD and Apple chip engineer Jim Keller would be building hardware for the car company.

That macro-view doesn’t even begin to account for the startups. The New York Times puts the number of AI-dedicated startup chip companies—not software companies, silicon companies—at 45 and growing, but even that estimate may be incomplete. It’s tricky to get a complete picture since some are in China being funded by the government and flying under the radar.

Why the sudden explosion in hardware after years of chip maker stasis? After all, there is general consensus that Nvidia’s GPUs are excellent for AI and are widely used already. Why do we need more chips now, and so many different ones at that?

The answer is a bit complex, just like AI itself.

Google’s 180 TFLOPS Cloud TPU card.

Follow the money (and usage and efficiency)

While x86 currently remains a dominant chip architecture for computing, it’s too general purpose for a highly specialized task like AI, says Addison Snell, CEO of Intersect360 Research, which covers HPC and AI issues.

“It was built to be a general server platform. As such it has to be pretty good at everything,” he says. “With other chips, [companies are] building something that specializes in one app without having to worry about the rest of the infrastructure. So leave the OS and infrastructure overhead to the x86 host and farm things out to various co-processors and accelerators.”

The actual task of processing AI is a very different process from standard computing or GPU processing, hence the perceived need for specialized chips. An x86 CPU can do AI, but it does a task in 12 steps when only three are required; a GPU in some cases can also be overkill.

Generally, scientific computation is done in a deterministic fashion. You want to know two plus three equals five and calculate it to all of its decimal places—x86 and GPU do that just fine. But the nature of AI is to say 2.5 + 3.5 is observed to be six almost all of the time without actually running the calculation. What matters with artificial intelligence today is the pattern found in the data, not the deterministic calculation.

In simpler terms, what defines AI and machine learning is that they draw upon and improve from past experience. The famous AlphaGo simulates tons of Go matches to improve. Another example you use every day is Facebook’s facial recognition AI, trained for years so it can accurately tag your photos (it should come as no surprise that Facebook has also made three major facial recognition acquisitions in recent years: Face.com [2012], Masquerade [2016], and Faciometrics [2016]).

Once a lesson is learned with AI, it does not need to be relearned. That is the hallmark of Machine Learning, a subset of the greater definition of AI. At its core, ML is the practice of using algorithms to parse data, learn from it, and then make a determination or prediction based on that data. It’s a mechanism for pattern recognition: machine learning software remembers that two plus three equals five so the overall AI system can use that information, for instance. You can get into splitting hairs over whether that recognition is AI or not.
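To make the "parse data, learn from it, predict" loop concrete, here is a minimal sketch in Python. It is an illustration only, not how any system named in this article works: the function name train_adder, the learning rate, and the training data are all assumptions made up for the example. The model is never told how addition works; it only sees observed examples and adjusts two weights until its guesses match them.

```python
import random

def train_adder(samples, epochs=200, lr=0.01):
    """Fit weights w1, w2 so that w1*a + w2*b matches the observed results.

    Hypothetical helper for illustration: plain stochastic gradient descent
    on squared error, with no knowledge of the rule behind the data.
    """
    w1, w2 = 0.0, 0.0
    for _ in range(epochs):
        for a, b, target in samples:
            error = (w1 * a + w2 * b) - target
            # Nudge each weight against the error it contributed to.
            w1 -= lr * error * a
            w2 -= lr * error * b
    return w1, w2

# "Past experience": 200 observed pairs and the result seen for each.
random.seed(0)
samples = []
for _ in range(200):
    a, b = random.uniform(0, 5), random.uniform(0, 5)
    samples.append((a, b, a + b))

w1, w2 = train_adder(samples)

# Prediction for a pair the model may never have seen.
prediction = w1 * 2.5 + w2 * 3.5
print(f"learned weights: {w1:.4f}, {w2:.4f}")
print(f"predicted 2.5 + 3.5 = {prediction:.4f}")
```

Once the weights are learned, they can be stored and reused; only the cheap final multiply-and-add runs at prediction time, which is the part a dedicated inference chip would target.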

In the future, maybe even “playing Go” will be a use case with a dedicated AI chip…

AI for self-driving cars, for another example, doesn’t use deterministic physics to determine the path of other things in its environment. It’s merely using previous experience to say this other car is here traveling this way, and all other times I observed such a vehicle, it traveled this way. Therefore, the system expects a certain type of action.

The result of this predictive problem solving is that AI calculations can be done with single precision calculations. So while CPUs and GPUs can both do it very well, they are in fact overkill for the task. A single-precision chip can do the work and do it in a much smaller, lower power footprint.
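The trade-off described above can be sketched with the Python standard library alone. This is a rough illustration under stated assumptions: the weights and inputs are invented, and to_float32 is a hypothetical helper that rounds a value to 32-bit precision by packing it with struct. The point is only that the single-precision result lands very close to the double-precision one while each stored value takes half the space.

```python
import struct
from array import array

def to_float32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]

# Made-up weights and inputs standing in for one neuron of a network.
weights = [0.1, -0.25, 0.7, 0.05]
inputs = [1.5, 2.0, -0.5, 4.0]

# Double-precision dot product (Python's native float).
exact = sum(w * x for w, x in zip(weights, inputs))

# The same dot product with every intermediate rounded to single precision.
approx = 0.0
for w, x in zip(weights, inputs):
    product = to_float32(to_float32(w) * to_float32(x))
    approx = to_float32(approx + product)

# Storage needed for the weights in each format.
bytes_double = array("d", weights).itemsize * len(weights)
bytes_single = array("f", weights).itemsize * len(weights)

print(f"double precision: {exact!r} ({bytes_double} bytes of weights)")
print(f"single precision: {approx!r} ({bytes_single} bytes of weights)")
print(f"absolute difference: {abs(exact - approx):.2e}")
```

Halving the width of every value also halves the data that has to move between memory and the compute units, which is where much of the power saving would come from.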

Make no mistake, power and scope are a big deal when it comes to chips—perhaps especially for AI, since one size does not fit all in this area. Within AI is machine learning, and within that is deep learning, and all those can be deployed for different tasks through different setups. “Not every AI chip is equal,” says Gary Brown, director of marketing at Movidius, an Intel company. Movidius made a custom chip just for deep learning processes because the steps involved are highly restricted on a CPU. “Each chip can handle different intelligence at different times. Our chip is visual intelligence, where algorithms are using camera input to derive meaning from what’s being seen. That’s our focus.”

Brown says there is even a need and requirement to differentiate at the network edge as well as in the data center—companies in this space are simply finding they need to use different chips in these different locations.

“Chips on the edge won’t compete with chips for the data center,” he says. “Data center chips like Xeon have to have high performance capabilities for that kind of AI, which is different for AI in smartphones. There you have to get down below one watt. So the question is, ‘Where is [the native processor] not good enough so you need an accessory chip?’”

After all, power is an issue if you want AI on your smartphone or augmented reality headset. Nvidia’s Volta processors are beasts at AI processing but draw up to 300 watts. You aren’t going to shoehorn one of those in a smartphone.

Sean Stetson, director of technology advancement at Seegrid, a maker of self-driving industrial vehicles like forklifts, also feels AI and ML have been ill served by general processors thus far. “In order to make any algorithm work, whether it’s machine learning or image processing or graphics processing, they all have very specific workflows,” he says. “If you do not have a compute core set up specific to those patterns, you do a lot of wasteful data loads and transfers. It’s when you are moving data around when you are most inefficient, that’s where you incur a lot of signaling and transient power. The efficiency of a processor is measured in energy used per instruction.”

A desire for more specialization and increased energy efficiency isn’t the whole reason these newer AI chips exist, of course. Brad McCredie, an IBM fellow and vice president of IBM Power systems development, adds one more obvious incentive for everyone seemingly jumping on the bandwagon: the prize is so big. “The IT industry is seeing growth for the first time in decades, and we’re seeing an inflection in exponential growth,” he says. “That whole inflection is new money expected to come to IT industry, and it’s all around AI. That is what has caused the flood of VC into that space. People see a gold rush; there’s no doubt.”

You wouldn’t put a Ferrari engine in something like this, right? The same may go for AI chips partnering with non-AI-focused hardware and software.

A whole new ecosystem

AI-focused chips are not being designed in a vacuum. Accompanying them are new means of throughput to handle the highly parallel nature of AI and ML processing. If you build an AI co-processor and then use the outdated technologies of your standard PC, or even a server, that’s like putting a Ferrari engine in a Volkswagen Beetle.

“When people talk about AI and chips for AI, building an AI solution involves quite a lot of non-AI technology,” says Amir Khosrowshahi, vice president and CTO of the AI product group at Intel and co-founder of Nervana. “It involves CPUs, memory, SSD, and interconnects. It’s really critical to have all of these for getting it to work.”

When IBM designed its Power9 processor for mission critical systems, for example, it used Nvidia’s high-speed NVLink for core interconnects, PCI Express Generation 4, and its own interface called OpenCAPI (Coherent Accelerator Processor Interface). OpenCAPI is a new connection type that provides a high bandwidth, low latency connection for memory, accelerators, network, storage, and other chips.

The x86 ecosystem, says McCredie, isn’t keeping up. He points to the fact that PCI Express Gen 3 has been on the market seven years without a significant update (the first, Gen 4, only happened recently), and IBM was one of the first to adopt the new version. x86 servers are still shipping with PCIe Gen 3, which has half the bandwidth of Gen 4.
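The "half the bandwidth" figure follows from the published signaling rates, and the arithmetic can be sketched in a few lines of Python. The per-lane rates (8 GT/s for Gen 3, 16 GT/s for Gen 4) and the 128b/130b line encoding come from the PCI Express specifications rather than from this article; the x16 slot width and the function name are assumptions chosen for illustration, and real-world throughput is lower once protocol overhead is counted.

```python
def lane_bandwidth_gb_per_s(transfer_rate_gt_per_s, payload_bits=128, encoded_bits=130):
    """Raw one-direction bandwidth of a single PCIe lane in GB/s.

    Each transfer carries one bit; 128b/130b encoding means 128 of every
    130 bits on the wire are payload. Divide by 8 to convert bits to bytes.
    """
    return transfer_rate_gt_per_s * payload_bits / encoded_bits / 8

GEN3_GT_PER_S = 8.0    # PCIe Gen 3 per-lane signaling rate
GEN4_GT_PER_S = 16.0   # PCIe Gen 4 per-lane signaling rate
LANES = 16             # assumed slot width (a typical accelerator slot)

gen3_x16 = lane_bandwidth_gb_per_s(GEN3_GT_PER_S) * LANES
gen4_x16 = lane_bandwidth_gb_per_s(GEN4_GT_PER_S) * LANES

print(f"Gen 3 x16: {gen3_x16:.2f} GB/s per direction")
print(f"Gen 4 x16: {gen4_x16:.2f} GB/s per direction")
print(f"Gen 4 / Gen 3 ratio: {gen4_x16 / gen3_x16:.1f}")
```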

“This explosion of compute capabilities will require a magnitude more of computational capacity,” he says. “We need processors to do all they can do and then some. The industry is finally getting into memory bandwidth and I/O bandwidth performance. These things are becoming first order constraints on system performance.”

“I think the set of accelerators will grow,” McCredie continues. “There are going to be more workloads that need more acceleration. We’re even going to go back and accelerate common workloads like databases and ERP (enterprise resource planning). I think we are seeing the start of a solid trend in the industry where we shift to more acceleration and more becoming available on the market.”

But hardware alone doesn’t do the learning in machine learning; software plays a major part. And in all of this rush for new chips, there is little mention of the software to accompany them. Luckily, that’s because the software is largely already there. It was waiting for the chips to catch up, argues Tom Doris, CEO of OTAS Technologies, a financial analytics and AI developer.

“I think that if you look at longer history, it’s all hardware-driven,” he says. “Algorithms haven’t changed much. Advances are all driven by advances in hardware. That was one of the surprises for me, having been away from the field for a few years. Things haven’t changed a whole lot in software and algorithms since the late 90s. It’s all about the compute power.”

David Rosenberg, data scientist in the Office of the CTO for Bloomberg, also feels the software is in good shape. “There are areas where the software has a long way to go, and that has to do with distributed computing, it has to do with the science of distributed neural computing,” he says. “But for the things we already know how to do, the software has been improved pretty well. Now it’s a matter of can the hardware execute the software fast enough and efficiently enough.”

With some use cases today, in fact, hardware and software are now being developed on parallel tracks with the aim of supporting this new wave of AI chips and use cases. At Nvidia, the software and hardware teams are roughly the same size, notes Ian Buck, the former Stanford University professor who developed what would become the CUDA programming language (CUDA allows developers to write apps to use the Nvidia GPU for parallel processing instead of a CPU). Buck now heads AI efforts at the chip company.

“We co-develop new architectures with system software, libraries, AI frameworks, and compilers, all to take advantage of new methods and neural networks showing up every day,” he says. “The only way to be successful in AI is not just build great silicon but also be tightly integrated all the way through the stack on the software stack, to implement and optimize these new networks being invented every day.”

So for Buck, one of the reasons why AI represents a new kind of computing is because he believes it really does constitute a new type of relationship between hardware and software. “We don’t need to think of backwards compatibility, we’re reinventing the kinds of processors good at these kinds of tasks and doing it in conjunction with the software to run on them.”

Intel’s Nervana Neural Network Processor, which is specifically designed for AI and optimized for deep learning applications.

The future of this horserace

While there is a laundry list of potential AI chip developers today, one of the biggest questions surrounding all of these initiatives is how many will come to market versus how many will be kept for the vendor versus how many will be scrapped entirely. Most AI chips today are still vapor.

When it comes to the many non-CPU makers designing AI chips, like Google, Facebook, and Microsoft, it seems like those companies are making custom silicon for their own use and will likely never bring them to market. Such entities have the billions in revenue that can be plowed into R&D of custom chips without the need for immediate and obvious return on investment. So users may rely on Google’s Tensor Processing Unit as part of its Google Cloud service, but the company won’t sell it directly. That is a likely outcome for Facebook and Microsoft’s efforts as well.

Other chips are definitely coming to market. Nvidia recently announced three new AI-oriented chips: the Jetson Xavier system-on-chip designed for smarter robots; Drive Pegasus, which is designed for deep learning in autonomous taxis; and Drive Xavier for semi-autonomous cars. Powering all of that is Isaac Sim, a simulation environment that developers can use to train robots and perform tests with Jetson Xavier.

Meanwhile, Intel has promised that its first ML processor based on the Nervana technology it bought in 2016 will reach market in 2019 under the code name of Spring Crest. The company also currently has a Nervana chip for developers to get their feet wet with AI, called Lake Crest. Intel says Spring Crest will eventually offer three to four times the performance of Lake Crest.

Can all those survive? “I think in the future, we’re going to see an evolution of where AI manifests itself,” says Movidius’ Brown. “If you want it in a data center, you need a data center chip. If you want a headset, you find a chip for it. How this will evolve is we may see where different chips have different strengths, and those will possibly get merged into CPUs. What we may also see are chips coming out with multiple features.”

If all that feels a bit like deja vu, maybe it is. The progression of the AI chip could in some ways match how chips of the past evolved: things started with high specialization and many competitors, but eventually some offerings gained traction and a few market leaders encompassed multiple features. Thirty years ago, the 80386 was the premier desktop chip, and if you were doing heavy calculations in Lotus 1-2-3, you bought an 80387 math co-processor for your IBM PC-AT. Then came the 80486, and Intel made all kinds of noises about the math co-processor being integrated into the CPU. The CPU then slowly gained things like security extensions, a memory controller, and a GPU.

So like every other technology, this emerging AI chip industry likely won’t sustain its current plethora of competitors. For instance, OTAS’ Doris notes many internal-use chips that don’t come to market become pet projects for senior technologists, and a change of regime often means adopting the industry standard instead. Intersect360’s Snell points out that today’s army of AI chip startups will also diminish—“There’s so many competitors right now it has to consolidate,” as he puts it. Many of those companies will simply hope to carve out a niche that might entice a big player to acquire them.

“There will be a tough footrace, I agree,” IBM’s McCredie says. “There has to be a narrowing down.” One day, that may mean this new chip field looks a lot like those old chip fields—the x86, Nvidia GPU, ARM-worlds. But for now, this AI chip race has just gotten off the starting line, and its many entrants intend to keep running.

Via ArsTechnica