This isn't really in the same category as the 4090/5090: it has a lot more memory but only a fraction of the bandwidth. 128GB at 256GB/sec vs the 5090's 32GB at nearly 2TB/sec.
Nvidia's actual counterpart would be their DIGITS mini-PC, which has a similar big-and-slow memory architecture.
AMD claims this APU delivers more than twice the tokens per second of an RTX 4090.
So it's better than the 4090, at least for that workload.
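The reason it's better with a less powerful GPU is context switching: a 70B model doesn't fit in the 4090's 24GB of VRAM, so part of it has to be shuffled in from system RAM.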
"AMD also claims its Strix Halo APUs can deliver 2.2x more tokens per second than the RTX 4090 when running the Llama 70B LLM (Large Language Model) at 1/6th the TDP (75W)."
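For a rough sense of how a slower chip could come out ahead on a 70B model, here is a back-of-the-envelope sketch. It assumes 4-bit quantized weights (the quote doesn't say which quantization AMD used), treats decoding as purely memory-bandwidth-bound, and ignores the KV cache and any memory reserved for the OS. The 4090 and 5090 figures are approximate public specs, and the helper names are made up for illustration.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def max_tokens_per_sec(bandwidth_gb_s: float, model_gb: float) -> float:
    """Upper bound on decode speed, assuming every generated token
    requires reading all of the weights from memory once."""
    return bandwidth_gb_s / model_gb


# (name, memory in GB, bandwidth in GB/s); GPU numbers are approximate.
DEVICES = [
    ("Strix Halo", 128, 256),
    ("RTX 4090", 24, 1008),
    ("RTX 5090", 32, 1792),
]

model_gb = weights_gb(70, 4)  # assumed: Llama 70B at 4 bits per weight

for name, mem_gb, bw in DEVICES:
    if model_gb <= mem_gb:
        ceiling = max_tokens_per_sec(bw, model_gb)
        print(f"{name}: fits, at most ~{ceiling:.1f} tokens/sec")
    else:
        print(f"{name}: {model_gb:.0f}GB of weights won't fit in {mem_gb}GB")
```

Under those assumptions the weights alone come to about 35GB, which fits in 128GB but not in 24GB or 32GB. Once a card has to pull part of the model over PCIe from system RAM, its own memory bandwidth stops being the limiting number, which is consistent with the comparison AMD is making.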