I hadn't looked at Triton before, so I took a quick look at it and at how it's being used in PyTorch 2. My read is that it really lowers the barrier to new hardware ports: I think a team of around five people at a chip vendor could maintain a high-quality port of PyTorch for a non-NVIDIA platform. That's less than it used to take, which is very cool. The approach would not be to use any of the PTX machinery, but to bolt on support for, say, the vendor's supported flavor of Vulkan.
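For anyone who hasn't seen it, the key point is that Triton kernels are written in hardware-neutral Python and only get lowered to PTX (or whatever else) in the backend. A minimal vector-add kernel, adapted from Triton's own introductory tutorial, looks roughly like this:

    import torch
    import triton
    import triton.language as tl

    @triton.jit
    def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
        # Each program instance handles one BLOCK_SIZE-wide slice of the vectors.
        pid = tl.program_id(axis=0)
        offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
        # Mask guards the tail when n_elements isn't a multiple of BLOCK_SIZE.
        mask = offsets < n_elements
        x = tl.load(x_ptr + offsets, mask=mask)
        y = tl.load(y_ptr + offsets, mask=mask)
        tl.store(out_ptr + offsets, x + y, mask=mask)

    def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        out = torch.empty_like(x)
        n = out.numel()
        # One program instance per block of 1024 elements.
        grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
        add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
        return out

Nothing in the kernel body is NVIDIA-specific; the PTX lowering happens entirely in the compiler backend, which is presumably why a vendor team could swap in, say, a Vulkan/SPIR-V lowering without users having to touch their kernels.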
It also looks like they added an MLIR backend to Triton, though I wonder if Mojo has an advantage here, since it was designed with MLIR in mind? https://github.com/openai/triton/pull/1004