
IBM's new chip architecture points to faster, more energy-efficient artificial intelligence

Published: 2023-11-08

A new chip prototype from IBM's research lab in California, long in the making, has the potential to disrupt how and where artificial intelligence can be effectively used.

We are in the midst of a Cambrian explosion of artificial intelligence. Over the past decade, AI has gone from theory and small tests to enterprise-scale use cases. But the hardware used to run AI systems, although increasingly powerful, was not designed with today's AI in mind. As AI systems scale up, costs skyrocket. And Moore's law, the observation that the number of transistors on a chip doubles roughly every two years, has slowed.

But new research out of IBM Research's Almaden lab in California, nearly two decades in the making, has the potential to drastically change how we can efficiently scale up powerful AI hardware systems.

Since the birth of the semiconductor industry, computer chips have largely followed the same basic structure, in which the processing units and the memory that stores the information to be processed are kept separate. While this structure has allowed for simpler designs that have scaled well over the decades, it has created what is known as the von Neumann bottleneck: it takes time and energy to continually shuttle data back and forth between memory, processing, and any other devices within a chip. Work by IBM Research's Dharmendra Modha and his colleagues aims to change this, taking inspiration from how the brain computes. "It forges a completely different path from the von Neumann architecture," Modha said.
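As a rough illustration of why the bottleneck matters, the following toy cost model (not a model of NorthPole or of any real chip) charges each layer a compute time plus a time for every operand fetched from memory. All parameter values and names are hypothetical, in arbitrary time units; the point is only that when fetches are expensive, data movement rather than arithmetic dominates the total.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyChip:
    """Hypothetical chip parameters in arbitrary time units (illustrative only)."""
    time_per_op: float      # time to perform one arithmetic operation
    time_per_fetch: float   # time to move one operand from memory to compute


def layer_time(chip: ToyChip, ops: int, fetches: int) -> float:
    """Total time for one layer, assuming compute and data movement do not overlap."""
    return ops * chip.time_per_op + fetches * chip.time_per_fetch


def movement_share(chip: ToyChip, ops: int, fetches: int) -> float:
    """Fraction of the layer's time spent moving data rather than computing."""
    return fetches * chip.time_per_fetch / layer_time(chip, ops, fetches)


# Hypothetical comparison: memory far from compute vs. memory next to compute.
far_memory = ToyChip(time_per_op=1.0, time_per_fetch=10.0)
near_memory = ToyChip(time_per_op=1.0, time_per_fetch=1.0)

for name, chip in [("far", far_memory), ("near", near_memory)]:
    print(name, layer_time(chip, ops=1000, fetches=1000),
          round(movement_share(chip, ops=1000, fetches=1000), 3))
```

With these made-up numbers, roughly 91% of the far-memory chip's time goes to moving data, versus 50% when memory sits next to compute.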

For the past eight years, Modha has been working on a new type of digital AI chip for neural inference, which he calls NorthPole. It is an extension of TrueNorth, the last brain-inspired chip Modha worked on, until 2014. In tests on the popular ResNet-50 image recognition and YOLOv4 object detection models, the new prototype demonstrated higher energy efficiency, higher space efficiency, and lower latency than any other chip currently on the market, and was roughly 4,000 times faster than TrueNorth.

The first set of promising results from the NorthPole chip was published today in the journal Science. According to Modha, NorthPole is a breakthrough in chip architecture that delivers massive improvements in energy, space, and time efficiency. Using the ResNet-50 model as a benchmark, NorthPole is considerably more efficient than common 12-nm GPUs and 14-nm CPUs. (NorthPole itself is built on 12-nm node process technology.) In both cases, NorthPole is 25 times more energy efficient, measured as the number of frames interpreted per joule of energy required. NorthPole also excelled in latency, and in the space required for computation, measured as frames interpreted per second per billion transistors. According to Modha, on ResNet-50, NorthPole outperforms all major prevalent architectures, even those built on more advanced process technologies, such as a GPU implemented in a 4-nm process.
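The two efficiency metrics mentioned above reduce to simple ratios. The sketch below computes them from a throughput, a power draw, and a transistor count. The function names and the sample throughput and power values are hypothetical placeholders, not figures from the paper; only the 22-billion-transistor count comes from this article. An "N times more energy efficient" claim is the ratio of two chips' frames-per-joule values.

```python
def frames_per_joule(frames_per_second: float, power_watts: float) -> float:
    """Energy efficiency: 1 W = 1 J/s, so (frames/s) / (J/s) = frames/J."""
    if power_watts <= 0:
        raise ValueError("power must be positive")
    return frames_per_second / power_watts


def frames_per_second_per_billion_transistors(frames_per_second: float,
                                              transistor_count: float) -> float:
    """Space efficiency: throughput normalized by the transistor budget."""
    if transistor_count <= 0:
        raise ValueError("transistor count must be positive")
    return frames_per_second / (transistor_count / 1e9)


# Hypothetical throughput and power, for illustration only.
fps, watts = 1000.0, 50.0
print(frames_per_joule(fps, watts))  # 20.0
print(frames_per_second_per_billion_transistors(fps, 22e9))
```

Normalizing by transistors rather than by die area lets chips built on different process nodes be compared on how much work they extract from a given transistor budget.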

How does it compute so much more efficiently than existing chips? One of the biggest differences with NorthPole is that all of the device's memory is on the chip itself, rather than connected separately. Without the von Neumann bottleneck, the chip can carry out AI inference faster than other chips already on the market. NorthPole is fabricated in a 12-nm node process and contains 22 billion transistors in 800 square millimeters. It has 256 cores and can perform 2,048 operations per core per cycle at 8-bit precision, with the potential to double and quadruple the number of operations at 4-bit and 2-bit precision, respectively. "It's an entire network on a chip," Modha said.
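The precision scaling described above is simple arithmetic: halving the bit width doubles the operation count. This sketch encodes it, taking 2,048 operations at 8-bit precision from the text. The function name and the `units` parameter are illustrative additions: if the 2,048 figure is read per core, passing `units=256` gives a chip-wide count per cycle. Real throughput also depends on clock rate and utilization, which the article does not give.

```python
# Scaling factors relative to 8-bit precision, as stated in the article:
# 4-bit doubles and 2-bit quadruples the number of operations.
PRECISION_SCALE = {8: 1, 4: 2, 2: 4}
BASE_OPS_8BIT = 2048  # operations per cycle at 8-bit precision (from the text)


def ops_per_cycle(precision_bits: int, base_ops_8bit: int = BASE_OPS_8BIT,
                  units: int = 1) -> int:
    """Operations per cycle at the given precision across `units` identical units."""
    if precision_bits not in PRECISION_SCALE:
        raise ValueError(f"unsupported precision: {precision_bits} bits")
    return base_ops_8bit * PRECISION_SCALE[precision_bits] * units


for bits in (8, 4, 2):
    print(bits, ops_per_cycle(bits), ops_per_cycle(bits, units=256))
```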

NorthPole chip on the PCIe card

"Architecturally, NorthPole blurs the boundary between compute and memory," Modha said. "At the level of individual cores, NorthPole appears as memory-near-compute, and from outside the chip, at the level of input and output, it appears as an active memory. This makes NorthPole easy to integrate into systems and significantly lowers the load on the host machine."

But NorthPole's biggest advantage is also a constraint: it can only efficiently draw on the memory it has on board. All of the speedups possible on the chip would be undercut if it had to access information from somewhere else. Through an approach called scale-out, NorthPole can actually support larger neural networks by breaking them down into smaller sub-networks that fit within NorthPole's model memory and connecting these sub-networks across multiple NorthPole chips. So while there is ample memory on a NorthPole (or collectively across a set of NorthPoles) for many of the models useful in particular applications, the chip is not meant to be a jack of all trades. "We can't run GPT-4 on this, but we could serve many of the models enterprises need," Modha said. "And, of course, NorthPole is only for inference."
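Scale-out can be pictured as a packing problem: split a network into consecutive pieces that each fit in one chip's memory. The following greedy sketch is a hypothetical illustration, not IBM's mapping procedure. It assumes each layer's memory footprint is known in a common unit (the sizes and capacity below are made up), that layers are assigned in order as a pipeline, and it ignores activations and chip-to-chip communication.

```python
from typing import List


def partition_layers(layer_sizes: List[int], chip_capacity: int) -> List[List[int]]:
    """Greedily group consecutive layer indices into sub-networks, one per chip,
    so that each group's total size fits within chip_capacity."""
    groups: List[List[int]] = []
    current: List[int] = []
    used = 0
    for i, size in enumerate(layer_sizes):
        if size > chip_capacity:
            # A single layer that does not fit would itself need to be split.
            raise ValueError(f"layer {i} (size {size}) exceeds chip capacity {chip_capacity}")
        if used + size > chip_capacity:
            # Close the current sub-network and start filling the next chip.
            groups.append(current)
            current, used = [], 0
        current.append(i)
        used += size
    if current:
        groups.append(current)
    return groups


# Hypothetical layer sizes and per-chip capacity in arbitrary memory units.
plan = partition_layers([30, 40, 20, 50, 10], chip_capacity=64)
print(plan)  # [[0], [1, 2], [3, 4]]
print(len(plan), "chips")
```

Filling each chip before moving to the next minimizes the chip count for an in-order split; a real mapper would also need to balance work across chips so that no single stage becomes the bottleneck.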

This efficiency means the device also doesn't need a bulky liquid-cooling system to run; fans and heat sinks are more than enough, which means it could be deployed in some rather small spaces.

Potential applications of NorthPole

While research on the NorthPole chip is still ongoing, its structure is suitable for emerging AI use cases, as well as more mature ones.

In testing, the NorthPole team focused primarily on computer vision-related uses, in part because funding for the project came from the U.S. Department of Defense. Some of the primary applications considered were detection, image segmentation, and video classification. But it was also tested in other areas, such as natural language processing (on the encoder-only BERT model) and speech recognition (on the DeepSpeech2 model). The team is currently exploring mapping decoder-only large language models to NorthPole scale-out systems.

When you think about these AI tasks, all sorts of futuristic use cases come to mind, from autonomous vehicles to robotics, digital assistants, or spatial computing. Many edge applications that require processing massive amounts of data in real time could be well suited to NorthPole. For example, it could be the kind of device needed to move self-driving cars from machines that require set maps and routes to operate on a small scale, to ones that can think and react to the rare edge-case situations that make navigating the real world so challenging even for skilled human drivers. These kinds of edge cases are the exact sweet spot for future NorthPole applications. NorthPole could enable satellites that monitor agriculture and manage wildlife populations, help monitor vehicle and freight traffic to make roads safer and less congested, operate robots safely, and detect cyber threats to keep businesses secure.

What's next

This is just the beginning of Modha's work on NorthPole. The current state of the art for CPUs is 3 nm, and IBM itself has been working on 2-nm nodes for years. This means that, on top of its fundamental architectural innovations, NorthPole could be implemented on several generations of newer chip process technology to keep improving its efficiency and performance.

But for Modha, this is just one major milestone in an arc of work that has occupied the last 19 years of his career. Throughout that time, he has been working on digital brain-inspired chips, knowing that the brain is the most energy-efficient processor we know of, and searching for ways to replicate it digitally. TrueNorth was fully inspired by the structure of neurons in the brain and had as many digital "synapses" as the brain of a bee. But sitting on a park bench in San Francisco in 2015, Modha said, he was thinking through his work to date. He came to believe there was value in marrying the best of traditional processing devices with the structure of processing in the brain, where memory and processing are interspersed throughout. The answer, according to Modha, was "brain-inspired computing with silicon speed."

Over the next eight years, Modha and his colleagues worked single-mindedly to make this vision a reality. The team toiled away at Almaden, giving no talks and publishing no papers about their work until this year. Each member brought different skills and perspectives, but everyone worked together, so the team as a whole contributed far more than the sum of its parts. Now the plan is to show what NorthPole can do, while exploring how to translate the design into smaller chip production processes and further exploring the architectural possibilities.

This work stems from a simple idea: how can we build computers that work like the brain? After years of fundamental research, it has come up with an answer. This kind of work is really only possible today at a place like IBM Research, where there is time and space to explore the big questions in computing and where they can take us. "NorthPole is a faint representation of the brain in the mirror of a silicon wafer," Modha said.