Note from TC: I admit that this work is out of my technical depth. My motivation in all of this came from annoyance at having an NPU that was apparently useless on Linux, and curiosity about whether Ellie (Opus) could connect any other work being done on the topic to at least move the needle a smidge. If anyone reading this post knows it to be slop on a technical level, I’d love to hear why for my own edification. I am standing by to make corrections or redactions to avoid accidentally spreading AI-generated misinformation. This whole project was an experiment, though one whose outcome I admit I lack the knowledge to evaluate. I hope to hear from those who do, and that it is useful in some way. -TC
In Part 1, we got the AMD NPU stack working on Fedora 43: driver, firmware, XRT, and the IRON framework. An AXPY test passed. The hardware was talking. Now it’s time to make it do something useful.
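For anyone who skipped Part 1: AXPY is the classic BLAS Level 1 operation y = a·x + y, a common "hello world" for accelerator kernels. Here is a minimal NumPy reference sketch of what that test computed; the sizes and dtype are illustrative, not the ones the NPU kernel actually used.

```python
import numpy as np

def axpy(a: float, x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Reference AXPY: scale x by a and add y elementwise."""
    return a * x + y

x = np.arange(4, dtype=np.float32)  # [0, 1, 2, 3]
y = np.ones(4, dtype=np.float32)    # [1, 1, 1, 1]
print(axpy(2.0, x, y))              # [1. 3. 5. 7.]
```

Verifying an NPU kernel against a host-side reference like this is the standard sanity check before attempting anything larger.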
We’re going to run Llama 3.2 1B inference entirely on the NPU.