Jeff Hawkins: By the age of five, a child can understand spoken language, distinguish a cat from a dog, and play a game of catch. These are three of the many things humans find easy that computers and robots currently cannot do. Despite decades of research, we computer scientists have not figured out how to do basic tasks of perception and robotics with a computer.
Our few successes at building "intelligent" machines are notable equally for what they can and cannot do. Computers, at long last, can play winning chess. But the program that can beat the world champion can’t talk about chess, let alone learn backgammon. Today’s programs, at best, solve specific problems. Where humans have broad and flexible capabilities, computers do not.
Perhaps we’ve been going about it in the wrong way. For 50 years, computer scientists have been trying to make computers intelligent while mostly ignoring the one thing that is intelligent: the human brain. Even so-called neural network programming techniques take as their starting point a highly simplistic view of how the brain operates.
In some ways, the task has been wrongly posed right from the start. In 1950, Alan Turing, the computer pioneer behind the British code-breaking effort in World War II, proposed to reframe the problem of defining artificial intelligence as a challenge that has since been dubbed the Turing Test. Put simply, it asked whether a computer, hidden from view, could conduct a conversation in such a way that it would be indistinguishable from a human.
So far, the answer has been a resounding no. Turing’s behavioral framing of the problem has led researchers away from the most promising avenue of study: the human brain. It is clear to many people that the brain must work in ways that are very different from digital computers. To build intelligent machines, then, why not understand how the brain works, and then ask how we can replicate it?
My colleagues and I have been pursuing that approach for several years. We’ve focused on the brain’s neocortex, and we have made significant progress in understanding how it works. We call our theory, for reasons that I will explain shortly, Hierarchical Temporal Memory, or HTM. We have created a software platform that allows anyone to build HTMs for experimentation and deployment. You don’t program an HTM as you would a computer; rather, you configure it with software tools, then train it by exposing it to sensory data. HTMs thus learn in much the same way that children do. HTM is a rich theoretical framework that would be impossible to describe fully in a short article such as this, so I will give only a high-level overview of the theory and technology. Details of HTM are available at http://www.numenta.com.
First, I will describe the basics of HTM theory, then I will give an introduction to the tools for building products based on it. It is my hope that some readers will be enticed to learn more and to join us in this work.
We have concentrated our research on the neocortex, because it is responsible for almost all high-level thought and perception, a role that explains its exceptionally large size in humans: about 60 percent of brain volume [see illustration "Goldenrod"]. The neocortex is a thin sheet of cells, folded to form the convolutions that have become a visual synonym for the brain itself. Although individual parts of the sheet handle problems as different as vision, hearing, language, music, and motor control, the neocortical sheet itself is remarkably uniform. Most parts look nearly identical at the macroscopic and microscopic level.
Because of the neocortex’s uniform structure, neuroscientists have long suspected that all its parts work on a common algorithm: that the brain hears, sees, understands language, and even plays chess with a single, flexible tool. Much experimental evidence supports the idea that the neocortex is such a general-purpose learning machine. What it learns and what it can do are determined by the size of the neocortical sheet, what senses the sheet is connected to, and what experiences it is trained on. HTM is a theory of the neocortical algorithm. If we are right, it represents a new way of solving computational problems that so far have eluded us.
Although the entire neocortex is fairly uniform, it is divided into dozens of areas that do different things. Some areas, for instance, are responsible for language, others for music, and still others for vision. They are connected by bundles of nerve fibers. If you make a map of the connections, you find that they trace a hierarchical design. The senses feed input directly to some regions, which feed information to other regions, which in turn send information to other regions. Information also flows down the hierarchy, but because the up and down pathways are distinct, the hierarchical arrangement remains clear and is well documented.
As a general rule, neurons at low levels of the hierarchy represent simple structure in the input, and neurons at higher levels represent more complex structure in the input. For example, input from the ears travels through a succession of regions, each representing progressively more complex aspects of sound. By the time the information reaches a language center, we find cells that respond to words and phrases independent of speaker or pitch.
Because the regions of the cortex nearest to the sensory input are relatively large, you can visualize the hierarchy as a tree’s root system, in which sensory input enters at the wide bottom, and high-level thoughts occur at the trunk. There are many details I am omitting; what is important is that the hierarchy is an essential element of how the neocortex is structured and how it stores information.
HTMs are similarly built around a hierarchy of nodes. The hierarchy and how it works are the most important features of HTM theory. In an HTM, knowledge is distributed across many nodes up and down the hierarchy. Memory of what a dog looks like is not stored in one location. Low-level visual details such as fur, ears, and eyes are stored in low-level nodes, and high-level structure, such as head or torso, is stored in higher-level nodes. [See illustrations, "Everyone Knows You’re a Dog" and "Higher & Higher."] In an HTM, you cannot always pinpoint where a given piece of knowledge is stored, but the general idea is correct.
Hierarchical representations solve many problems that have plagued AI and neural networks. Often systems fail because they cannot handle large, complex problems. Either it takes too long to train a system or it takes too much memory. A hierarchy, on the other hand, allows us to "reuse" knowledge and thus make do with less training. As an HTM is trained, the low-level nodes learn first. Representations in high-level nodes then share what was previously learned in low-level nodes.
For example, a system may take a lot of time and memory to learn what dogs look like, but once it has done so, it will be able to learn what cats look like in a shorter time, using less memory. The reason is that cats and dogs share many low-level features, such as fur, paws, and tails, which do not have to be relearned each time you are confronted with a new animal.
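The reuse argument can be made concrete with a deliberately small Python sketch. It is not Numenta's software or the HTM learning algorithm; the class `ToyHierarchy`, its `learn` method, and the feature names are all hypothetical, and counting newly stored features is only a stand-in for training cost.

```python
from typing import Dict, FrozenSet, Iterable, Set


class ToyHierarchy:
    """Hypothetical two-level store: shared low-level features, named objects on top."""

    def __init__(self) -> None:
        self.low_level: Set[str] = set()                # features shared by all objects
        self.high_level: Dict[str, FrozenSet[str]] = {}  # object name -> features it uses

    def learn(self, name: str, features: Iterable[str]) -> int:
        """Store an object and return how many low-level features were new."""
        wanted = set(features)
        new_features = wanted - self.low_level
        self.low_level |= new_features            # only unseen features cost anything
        self.high_level[name] = frozenset(wanted)  # the object just points at features
        return len(new_features)


hierarchy = ToyHierarchy()
dog_cost = hierarchy.learn("dog", ["fur", "paws", "tail", "floppy ears"])
cat_cost = hierarchy.learn("cat", ["fur", "paws", "tail", "whiskers"])
print(dog_cost, cat_cost)
```

In this toy, learning "cat" after "dog" adds only the one feature the dog did not already supply, which is the sense in which a hierarchy lets later learning lean on earlier learning.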
The second essential resemblance between HTM and the neocortex lies in the way they use time to make sense of the fast-flowing river of data they receive from the outside world. On the most basic level, each node in the hierarchy learns common, sequential patterns, analogous to learning a melody. When a new sequence comes along, the node matches the input to previously learned patterns, analogous to recognizing a melody. Then the node outputs a constant pattern representing the best matched sequences, analogous to naming a melody. Given that the output of nodes at one level becomes input to nodes at the next level, the hierarchy learns sequences of sequences of sequences.
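The learn, match, and name steps can be illustrated with a minimal Python sketch. It assumes a much cruder matching rule than anything in HTM (position-by-position agreement with stored sequences), and the names `ToySequenceNode`, `notes`, `phrases`, and `recognize_hierarchy` are invented for this illustration.

```python
from typing import Dict, List, Sequence, Tuple


class ToySequenceNode:
    """Hypothetical node: memorizes named sequences, then names the best match."""

    def __init__(self) -> None:
        self.known: Dict[str, Tuple[str, ...]] = {}

    def learn(self, name: str, sequence: Sequence[str]) -> None:
        # "Learning a melody": remember the sequence under a stable name.
        self.known[name] = tuple(sequence)

    def recognize(self, observed: Sequence[str]) -> str:
        # "Recognizing a melody": count positions where the input agrees
        # with each stored sequence, and return the best-scoring name.
        def score(name: str) -> int:
            return sum(a == b for a, b in zip(self.known[name], observed))

        return max(self.known, key=score)


# Lower level: sequences of notes.
notes = ToySequenceNode()
notes.learn("rise", ["C", "D", "E"])
notes.learn("fall", ["E", "D", "C"])

# Higher level: sequences of the names produced by the lower level.
phrases = ToySequenceNode()
phrases.learn("arch", ["rise", "fall"])
phrases.learn("valley", ["fall", "rise"])


def recognize_hierarchy(chunks: List[List[str]]) -> str:
    """Name each chunk at the lower level, then name the sequence of names."""
    names = [notes.recognize(chunk) for chunk in chunks]
    return phrases.recognize(names)


print(recognize_hierarchy([["C", "D", "F"], ["E", "D", "C"]]))
```

The first chunk contains a wrong note, yet the lower node still reports "rise", so the upper node sees a clean sequence of names. That is the toy version of fast-changing input at the bottom turning into a stable output higher up.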
That is how HTMs turn rapidly changing sensory patterns at the bottom of the hierarchy into relatively stable thoughts and concepts at the top of it. Information can flow down the hierarchy, unfolding sequences of sequences. For example, when you give a speech, you start with a sequence of high-level concepts, each of which unfolds into a sequence of sentences, each of which unfolds into a sequence of words and then phonemes.
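The downward flow can be sketched the same way. The Python below is only an illustration under stated assumptions: the table `UNFOLDS`, the function `unfold`, and the phoneme-like symbols are made up, and a real HTM would not store its sequences in a lookup table.

```python
from typing import Dict, List

# Hypothetical table: each name unfolds into an ordered sequence of
# lower-level names. Names missing from the table are bottom-level items.
UNFOLDS: Dict[str, List[str]] = {
    "greet the audience": ["hello everyone"],   # concept -> sentences
    "hello everyone": ["hello", "everyone"],    # sentence -> words
    "hello": ["h", "eh", "l", "ow"],            # word -> made-up phoneme symbols
    "everyone": ["eh", "v", "r", "ee", "w", "uh", "n"],
}


def unfold(name: str) -> List[str]:
    """Recursively expand a name until only bottom-level items remain."""
    if name not in UNFOLDS:
        return [name]
    expanded: List[str] = []
    for part in UNFOLDS[name]:
        expanded.extend(unfold(part))  # each part unfolds into its own sequence
    return expanded


print(unfold("greet the audience"))
```

A single stable item at the top expands, level by level, into a longer and faster-changing sequence at the bottom, mirroring the speech example above.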
Jeff Hawkins – Creator of the Palm Pilot and Visor