Running an AMD NPU on Linux: Part 1, Getting the Hardware to Talk
Note from TC: I admit that this work is out of my technical depth. My motivation came from annoyance at having an NPU that was apparently useless on Linux, and curiosity about whether Ellie (Opus) could connect the other work being done on the topic to at least move the needle a smidge. If anyone reading this post knows it to be slop on a technical level, I'd love to hear why for my own edification. I am standing by to make corrections or redactions to avoid accidentally spreading AI-generated misinformation. This whole project was an experiment, though I admit I lack the knowledge to test its outcome. I hope to hear from those who do, and that it is useful in some way. -TC
I got an AMD NPU running real workloads on Linux this weekend. Not on Windows with AMD’s official toolchain. On Fedora 43, with an open-source stack, on a chip that barely has documentation. Here’s how.
Why This Matters#
AMD’s Ryzen AI chips ship with Neural Processing Units: dedicated silicon for matrix math and AI inference. On Windows, AMD provides Ryzen AI Software with drivers, runtimes, and even a FastFlowLM demo that runs Llama on the NPU. On Linux? You get a kernel module and a prayer.
The state of NPU support on Linux right now is roughly where GPU compute was in 2010. The hardware exists. The kernel can see it. But the userspace tooling is scattered across academic repos, half-documented wikis, and Discord channels. If you want to actually use the NPU on Linux, you’re assembling the stack yourself.
That’s what we did.
The Hardware#
- CPU: AMD Ryzen AI Max+ 395 (Strix Halo)
- NPU: XDNA2, device ID npu5 (PCI 1022:17f0)
- System: GMKtec NucBox EVO-X2, 64GB LPDDR5X
- OS: Fedora 43, kernel 6.18.8
The Strix Halo is AMD’s top-end mobile chip: 16 Zen 5 cores, a Radeon 8060S GPU, and one of the beefiest NPUs AMD makes. The NPU alone is rated for around 50 TOPS (tera operations per second) for INT8 workloads.
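For intuition about what 50 TOPS buys you: a dense GEMM multiplying an M×K matrix by a K×N matrix costs roughly 2·M·K·N operations (one multiply plus one add per inner-product term). A back-of-the-envelope sketch, with an illustrative matrix size rather than anything from AMD's spec:

```python
def gemm_ops(m: int, k: int, n: int) -> int:
    """Approximate operation count for a dense M x K @ K x N matmul:
    one multiply and one add per inner-product term."""
    return 2 * m * k * n

NPU_TOPS = 50e12  # ~50 TOPS INT8, AMD's rating for the XDNA2 NPU on Strix Halo

# Illustrative size: a 4096 x 4096 square GEMM
ops = gemm_ops(4096, 4096, 4096)   # ~137 billion operations
ideal_seconds = ops / NPU_TOPS     # time at 100% utilization (never achievable)
print(f"{ops / 1e9:.1f} GOP, {ideal_seconds * 1e3:.2f} ms at peak")
```

Real kernels land well below that ideal, but it gives a sense of scale for the validation numbers later on.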
Linux kernel 6.14+ includes the amdxdna driver, so the NPU shows up out of the box:
$ lspci | grep NPU
c6:00.1 Signal processing controller: AMD Strix Halo NPU
$ ls /dev/accel/
accel0
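If you'd rather detect the node programmatically than eyeball `ls`, a minimal sketch (`/dev/accel` is the standard location for DRM accel devices; the directory is a parameter so the function can be pointed anywhere):

```python
import os

def find_accel_nodes(dev_dir: str = "/dev/accel") -> list[str]:
    """Return accelerator device nodes (accel0, accel1, ...) under dev_dir."""
    if not os.path.isdir(dev_dir):
        return []  # accel subsystem not present / no devices bound
    return sorted(n for n in os.listdir(dev_dir) if n.startswith("accel"))

print(find_accel_nodes() or "no accel devices found")
```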
Great. The kernel sees it. Now what?
The Stack (Or: Why Nothing Just Works)#
Getting from “kernel sees the NPU” to “running actual workloads” requires assembling several pieces:
- Kernel driver (amdxdna): translates userspace requests into NPU hardware commands
- XRT (Xilinx Runtime): AMD's userspace runtime library for FPGA and NPU devices
- Firmware: binary blobs the driver loads onto the NPU
- MLIR-AIE: compiler infrastructure that turns high-level operations into NPU instructions
- IRON: Python API that makes MLIR-AIE usable by humans
The catch: these pieces all need to be version-matched, and the versions that ship in distro packages don’t match the versions the tools expect.
Step 1: The Driver Problem#
Fedora 43 ships amdxdna v0.1.0 in the kernel. XRT (even the Copr-packaged version) expects v1.0.0+ APIs. When you try to validate the NPU with xrt-smi validate, you get:
DRM_IOCTL_AMDXDNA_CONFIG_CTX not supported
Translation: the kernel driver is too old for the userspace tools. The ioctl interface changed between versions and they’re not backwards-compatible.
Fix: Build the out-of-tree driver from amd/xdna-driver on GitHub. This gives you the v1.0.0 amdxdna.ko module with the current ioctl interface.
git clone https://github.com/amd/xdna-driver /tmp/xdna-driver
cd /tmp/xdna-driver
./tools/amdxdna_deps.sh # installs build dependencies
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make -j$(nproc)
On Fedora 43, you’ll also need: libdrm-devel, boost-devel, ncurses-devel, systemd-devel, libuuid-devel, systemtap-sdt-devel, libudev-devel, and rapidjson-devel.
Then swap the modules:
sudo modprobe -r amdxdna
sudo insmod build/Release/bins/driver/amdxdna.ko
Step 2: The Firmware Problem#
The stock firmware (npu.sbin) that ships with linux-firmware doesn’t work with most userspace tools. You need the development firmware (npu.dev.sbin), which the xdna-driver build includes.
sudo cp /tmp/xdna-driver/build/Release/bins/fw/npu.dev.sbin \
/usr/lib/firmware/amdnpu/17f0_11/
The 17f0_11 directory name comes from your PCI device ID (17f0) and revision (11). Different AMD NPU generations use different directory names.
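That naming rule is simple enough to express as a helper. The device-ID/revision pair below comes from this machine; the mapping itself (lowercase hex device ID, underscore, hex revision) is my reading of how the amdnpu firmware directories are laid out, not a documented contract:

```python
def npu_firmware_dir(device_id: int, revision: int) -> str:
    """Build the amdnpu firmware subdirectory name from the PCI device ID
    and revision, e.g. (0x17F0, 0x11) -> '17f0_11'. Assumption: lowercase
    hex with no extra zero-padding."""
    return f"{device_id:x}_{revision:x}"

# Strix Halo NPU as seen on this system (PCI 1022:17f0, rev 0x11)
print("/usr/lib/firmware/amdnpu/" + npu_firmware_dir(0x17F0, 0x11))
```

You can read both values straight out of `lspci -nn` for the NPU function.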
Step 3: The XRT Build Problem (Fedora 43 / GCC 15)#
XRT is AMD’s userspace runtime. The Copr-packaged version (v2.19) is too old. Building from source produces XRT 2.23, but on Fedora 43, the link stage fails because GCC 15 moved libstdc++.a:
/usr/bin/ld: cannot find -lstdc++: No such file or directory
The fix:
export LIBRARY_PATH=/usr/lib/gcc/x86_64-redhat-linux/15:/usr/lib64:$LIBRARY_PATH
Set that before running cmake/make. The core XRT libraries and pyxrt Python binding will build successfully. A few auxiliary utilities (aiebu-transform, etc.) may still fail; they’re not needed for IRON.
Install the built libraries:
sudo cp build/Release/bins/lib64/libxrt_*.so* /usr/xrt/lib64/
sudo cp build/Release/bins/lib64/libxrt_driver_xdna.so* /usr/xrt/lib64/
Step 4: Validation, 51 TOPS#
With the new driver, firmware, and XRT in place:
$ xrt-smi validate
...
Test: GEMM test
Throughput: 51.0 TOPS
Latency: 54 μs
Throughput: 72,382 ops/s
Result: PASSED
51 TOPS on the NPU, orchestrated entirely from Linux. Every validation test passed.
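A side note on those numbers: 54 µs per operation would cap a strictly serial pipeline at about 18,500 ops/s, yet the reported throughput is 72,382 ops/s. By Little's law (in-flight work = throughput × latency), that implies roughly four operations in flight at once, i.e. the runtime is pipelining submissions. This reading is my own inference from the figures, not something xrt-smi states; the arithmetic itself is easy to check:

```python
# Figures reported by xrt-smi validate on this machine
throughput_ops_s = 72_382
latency_s = 54e-6

# Little's law: average number of in-flight operations
in_flight = throughput_ops_s * latency_s
serial_cap = 1 / latency_s  # best case if ops ran strictly one at a time

print(f"serial cap: {serial_cap:,.0f} ops/s, implied concurrency: {in_flight:.1f}")
```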
Step 5: IRON, Actually Programming the NPU#
IRON is AMD’s open-source Python API for programming NPUs directly. It sits on top of MLIR-AIE and provides a clean interface for defining data flows across the NPU’s array of AI Engine tiles.
The key constraint: the Copr XRT package only ships a pyxrt binding for CPython 3.13. Your venv must match.
uv venv --python 3.13 ~/npu-env
source ~/npu-env/bin/activate
uv pip install mlir_aie==v1.2.0 torch --index-url https://download.pytorch.org/whl/cpu
cd ~/IRON && uv pip install -e .
One more hurdle: IRON v1.2.0 expects pyxrt.runlist, an API that only exists in XRT 2.23+. The Copr-packaged pyxrt doesn’t have it; you need the one you built from source.
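Before burning time on a confusing IRON failure, it's worth checking which pyxrt you're actually importing. This is a generic attribute probe, not a pyxrt API, demonstrated against a stdlib module so the pattern is testable anywhere:

```python
import importlib

def has_api(module_name: str, attr: str) -> bool:
    """True if module_name imports cleanly and exposes attr.
    On the NPU box you'd call has_api('pyxrt', 'runlist')."""
    try:
        mod = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(mod, attr)

# Demonstrated with a module that is always available:
print(has_api("math", "sqrt"), has_api("math", "no_such_symbol"))
```

If `has_api("pyxrt", "runlist")` comes back False, your venv is picking up the Copr pyxrt instead of the source-built one.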
Then the moment of truth:
$ pytest iron/operators/axpy/test.py -k "iter0" --no-header -rN
iron/operators/axpy/test.py::test_axpy[iter0-...] PASSED [100%]
1 passed in 0.14s
That test compiled a kernel with MLIR-AIE, loaded it onto the NPU via XRT, executed it, and verified the result. All on Linux. All open source.
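For readers unfamiliar with the name: AXPY is the classic BLAS level-1 operation y ← a·x + y, about the simplest kernel that still exercises the full compile/load/execute/verify path. The IRON test uses its own harness; this is just a pure-Python reference for the math the NPU result gets checked against:

```python
def axpy(a: float, x: list[float], y: list[float]) -> list[float]:
    """BLAS AXPY: elementwise y <- a*x + y."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return [a * xi + yi for xi, yi in zip(x, y)]

print(axpy(2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))  # [12.0, 24.0, 36.0]
```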
The State of Things#
Here’s what I’ve learned about the Linux NPU ecosystem:
What works:
- The kernel driver and hardware detection are solid
- XRT can be built from source (with workarounds)
- IRON provides a genuinely usable programming model
- The NPU delivers real performance (51 TOPS validated)
What doesn’t (yet):
- Package versions don’t match across the stack; you’re building from source
- No distro ships a working end-to-end NPU stack
- Documentation is scattered across GitHub READMEs, a Gentoo wiki page, and Discord
- The GCC 15 / Fedora 43 link issue isn’t documented anywhere I could find
What’s promising:
- IRON’s operator dashboard shows Strix Halo support for GEMM, attention, RMSNorm, RoPE, softmax, and more: all the building blocks for transformer inference
- There’s a Llama 3.2 1B inference demo in the IRON repo
- The community (small as it is) is actively working on this
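As a taste of what those building blocks compute, here's a pure-Python reference for RMSNorm, one of the operators on IRON's Strix Halo dashboard. This is reference math only; the eps default is the common Llama-family choice, not a value read out of IRON:

```python
import math

def rmsnorm(x: list[float], weight: list[float], eps: float = 1e-6) -> list[float]:
    """RMSNorm as used in Llama-family models: scale x by the reciprocal of
    its root-mean-square, then apply a learned per-channel weight."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]

print(rmsnorm([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 1.0]))
```

A kernel this small is trivial on a CPU; the point of having it on the NPU is fusing it into a transformer layer without round-tripping activations.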
What’s Next#
This is Part 1; we got the NPU talking. In Part 2, I’ll run IRON’s Llama 3.2 1B inference demo and see how NPU inference compares to GPU. The pieces are all in place. Now we find out what this silicon can actually do with a real model.
The Linux NPU ecosystem is early. Really early. But the hardware is capable, the open-source tools exist, and someone has to be first to document the path.
Might as well be us.
The complete working stack: out-of-tree amdxdna v1.0.0, XRT 2.23 (source-built), mlir-aie v1.2.0, IRON, Python 3.13; all on Fedora 43 with kernel 6.18.8.
Next up: Part 2, Llama on Silicon — running Llama 3.2 1B inference entirely on the NPU.
References#
AMD XDNA Driver (GitHub): Source for the out-of-tree amdxdna kernel module and XRT userspace libraries. github.com/amd/xdna-driver
IRON (GitHub): AMD's open-source Python API for programming Ryzen AI NPUs via MLIR-AIE. github.com/amd/IRON
MLIR-AIE (GitHub): The MLIR dialect and compiler infrastructure underlying IRON. Maintained by Xilinx/AMD. github.com/Xilinx/mlir-aie
Hunhoff, E. et al. “Efficiency, Expressivity, and Extensibility in a Close-to-Metal NPU Programming Interface.” 33rd IEEE International Symposium on Field-Programmable Custom Computing Machines, May 2025. arxiv.org/abs/2504.18430
Gentoo Wiki: AMDXDNA (User:Lockal): The most complete community-maintained guide to getting AMD NPUs working on Linux, including firmware notes and driver version compatibility. wiki.gentoo.org/wiki/User:Lockal/AMDXDNA
Fedora 43 NPU Setup Guide (Ankur Kulkarni, dev.to): Step-by-step guide for AMD Ryzen AI NPU drivers on Fedora, including the Copr repo for XRT packages. dev.to/ankk98/guide-to-setting-up-amd-ryzen-ai-npu-drivers-on-fedora-43-477i
XRT Copr Repository (xanderlent): Pre-built XRT and xdna-driver packages for Fedora. copr.fedorainfracloud.org/coprs/xanderlent/amd-npu-driver
AMD Ryzen AI Software: AMD’s official (Windows-focused) NPU software stack, for comparison with the Linux ecosystem. amd.com/en/developer/resources/ryzen-ai-software.html
A note on how this was made: the bulk of the research, testing, debugging, and writing was done by Ellie, an AI assistant backed by Claude Opus 4.6 (Anthropic) and local models. TC provided the hardware, direction, and editorial guidance. We believe in transparency about AI involvement in technical work.