The tech behind the tech: How Intel’s NPUs accelerate AI

10 Jan 2025

Michael Langan, Intel. Image: © Joe Gavin/Frank Gavin Photography

Jenny Darmody sat down with Intel’s Michael Langan to discuss the work that happens within the neural processing unit team and how the architecture is changing.

As the AI bubble continues to grow, it’s hard to escape news about the technology. If it’s not about a new large language model (LLM) release, it’s a funding announcement for a start-up using AI technology.

The intense interest has even led to AI washing – the practice of exaggerating or misrepresenting AI capabilities to attract customers and investors – which in turn has led regulatory bodies such as the US Federal Trade Commission to crack down on deceptive AI schemes.

But while many are focused on building on, investing in or otherwise using the AI technology that's already out there, under the hood, hardware is needed to handle the computation that AI requires to function.

This is where neural processing units (NPUs) come into play. Also known as AI accelerators and designed to mimic the processing function of the human brain, these are specialised pieces of hardware that speed up the computations needed for AI models to work.
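To make that concrete, the bulk of the work an NPU accelerates is multiply-accumulate arithmetic – the weighted sums at the heart of every neural network layer. The sketch below is purely illustrative (not Intel code): a single artificial "neuron" computed in plain Python, the operation that NPU hardware performs millions of times in parallel.

```python
# Illustrative only: the multiply-accumulate (MAC) workload that NPUs
# are built to accelerate, shown as a single artificial "neuron".
# Real NPUs run vast numbers of these operations in parallel in silicon.

def neuron(inputs, weights, bias):
    # Weighted sum of inputs (the MAC loop), followed by a ReLU
    # activation -- the basic building block of neural networks.
    total = bias
    for x, w in zip(inputs, weights):
        total += x * w
    return max(0.0, total)  # ReLU: pass positives, clamp negatives to 0

result = neuron([1.0, 2.0, 3.0], [2.0, -1.0, 1.0], 1.0)
print(result)  # 1 + 2 - 2 + 3 = 4.0
```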

To better understand the work that goes into creating NPUs, I spoke to Michael Langan at the annual Midas conference in November 2024. Langan has been with Intel for 14 years and now runs the NPU IP team at Intel in Ireland. “That’s the central IP for all client devices, laptops, desktops. It’s a $30bn market for us every year on revenue and AI is obviously a key piece.”

Ireland’s NPU footprint

The worldwide NPU IP team at Intel numbers about 500 people, but Langan pointed out that the IP originated with Irish start-up Movidius, which Intel acquired in 2016.

He said that the whole wave around neural processing started around 2012 with convolutional neural networks, a type of deep learning neural network architecture commonly used in computer vision for image or object recognition.

Then, in 2017, Google released a paper called ‘Attention Is All You Need’ introducing a model architecture called the transformer, which Langan said “changed everything overnight”.

“That’s where your ChatGPT comes from, your LLMs, so everything you hear about that is based on that single architecture. So, the design we do is to accelerate workloads like that,” he said.
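The core operation of that architecture is scaled dot-product attention. Here is a minimal, illustrative sketch in plain Python (not Intel code, and simplified to a single attention head): each query scores every key by dot product, the scores are softmaxed into weights, and the values are blended accordingly. In production models this is expressed as large matrix multiplications – exactly the arithmetic NPUs are designed to accelerate.

```python
import math

def softmax(xs):
    # Exponentiate (shifted by the max for numerical stability)
    # and normalise so the weights sum to 1.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(queries, keys, values):
    # Scaled dot-product attention for one head: score each key against
    # the query, scale by sqrt(dimension), softmax, then take the
    # weighted sum of the value vectors.
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, values))
                    for i in range(len(values[0]))])
    return out

# One query attending over two key/value pairs: the query matches the
# first key more closely, so the output leans towards the first value.
result = attention([[1.0, 0.0]],
                   [[1.0, 0.0], [0.0, 1.0]],
                   [[10.0, 0.0], [0.0, 10.0]])
```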

Within Intel, Langan said they do “the whole lot” when it comes to the different functions. “The hardware team is Verilog RTL design, the traditional design, a big verification team, we have layouts on both TSMC and Intel process nodes, so the design can go anywhere in any application and then we’ve a very, very big software team and a big compiler team because that’s key technology for everything.

“There’s a real race for optimised AI compilers and we’ve a lot of that based in Ireland,” he said. “We have a big team [in Leixlip], 250-300, but we’re kind of small compared to the rest, it’s like, 5,000 folks across the site. So, we’re just small fry on that site, but we’ve got a big function for the greater Intel.”

Trends, talent and transformers

The biggest challenge for those working on NPUs is the pace of change, particularly recently. When companies like Microsoft, Dell or HP want new models, new features and new applications – as they often do – a backlog builds up for those essentially working on the tech behind the tech.

“It used to be a case where our customer-facing folks were going out to the market and saying, ‘hey, this new feature, you should try it. Trust me, it’s going to be amazing’,” said Langan. Now, he said, it’s the other way around, with customers coming to them with new applications and feature needs.

Another challenge is a talent shortage due to the specialised skills needed to work on NPUs. Langan said they’re always looking for people who can work on deep learning hardware and software, as well as AI compilers, to name a few areas.

To bridge this skills gap, Intel started an internship programme with universities more than a decade ago. “We built a great pipeline there and a great relationship with all the universities, and the candidates that we get from the universities are just top class,” he said.

“There’s great talent in Ireland and I think it’s recognised in the company around the world.”

Looking to the future, Langan said that while he tends to be tunnel-focused on AI models, hardware and software, there’s a big question around what the next architecture will be after transformers.

“There are new papers every week. People calling them the transformer killer. It hasn’t happened yet. There’s a new model architecture called Mamba. There’s another model called Hymba, which is modifying that and they’re all looking to accelerate the training, lower power, more performance. So, we’re watching that really, really closely so that we can put something in our hardware.”


Jenny Darmody is the editor of Silicon Republic

editorial@siliconrepublic.com