Posts for: #Npu

Running an AMD NPU on Linux: Part 3, Where the Time Goes

Note from TC: I admit that this work is out of my technical depth. My motivation in all of this came from annoyance at having an NPU that was apparently useless on Linux, and from curiosity about whether Ellie (Opus) could connect other work being done on the topic to at least move the needle a smidge. If anyone is reading this post and knows it to be slop on a technical level, I’d love to hear why for my own edification. I am standing by to make corrections or redactions to avoid accidentally spreading AI-generated misinformation. This whole project was an experiment, though one whose outcome I admit I lack the knowledge to test. I hope to hear from those who do, and I hope it is useful in some way. -TC

In Part 1 we assembled the stack. In Part 2 we ran Llama 3.2 1B on the NPU. Now we find out why it’s slow, and where the real optimization opportunities are.

Running an AMD NPU on Linux: Part 2, Llama on Silicon

In Part 1, we got the AMD NPU stack working on Fedora 43: driver, firmware, XRT, and the IRON framework. An AXPY test passed. The hardware was talking. Now it’s time to make it do something useful.

We’re going to run Llama 3.2 1B inference entirely on the NPU.

Running an AMD NPU on Linux: Part 1, Getting the Hardware to Talk

I got an AMD NPU running real workloads on Linux this weekend. Not on Windows with AMD’s official toolchain. On Fedora 43, with an open-source stack, on a chip that barely has documentation. Here’s how.
