You can write your code against a GPU array library and just differentiate it (which ends up being more flexible and composable).
- parses the Rust AST
- folds it into a CUDA C AST
- writes it to a temporary .cu file
- compiles it with nvcc
- links it into your Rust binary
and that allows you to launch your function as a CUDA kernel.
The Rust emu crate does this (more or less), but targets WebGPU instead of CUDA.
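To make the CUDA pipeline in the list above concrete, here is a minimal std-only sketch of the "fold, write, compile" steps. Everything named here (`Expr`, `Kernel`, `fold_kernel`, `write_cu`, `nvcc_command`, `saxpy`) is hypothetical and made up for illustration. A real macro would get a full Rust AST from a parser crate instead of this toy expression tree, and the final link step is left to the build tooling. This is not how emu or any particular crate does it.

```rust
use std::env;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::process::Command;

// Hypothetical toy AST standing in for a parsed Rust function body.
#[derive(Debug, Clone)]
enum Expr {
    Var(String),   // scalar parameter, e.g. `a`
    Index(String), // `name[i]`, where `i` is the thread index
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

// Hypothetical elementwise kernel: output[i] = body for every i < n.
struct Kernel {
    name: String,
    scalars: Vec<String>, // become `float` parameters
    inputs: Vec<String>,  // become `const float*` parameters
    output: String,       // becomes a `float*` parameter
    body: Expr,
}

// Fold an expression tree into CUDA C source text.
fn fold_expr(e: &Expr) -> String {
    match e {
        Expr::Var(v) => v.clone(),
        Expr::Index(a) => format!("{}[i]", a),
        Expr::Add(l, r) => format!("({} + {})", fold_expr(l), fold_expr(r)),
        Expr::Mul(l, r) => format!("({} * {})", fold_expr(l), fold_expr(r)),
    }
}

// Fold the whole kernel into a `__global__` function definition.
fn fold_kernel(k: &Kernel) -> String {
    let mut params = vec!["int n".to_string()];
    params.extend(k.scalars.iter().map(|s| format!("float {}", s)));
    params.extend(k.inputs.iter().map(|a| format!("const float* {}", a)));
    params.push(format!("float* {}", k.output));

    let mut src = String::new();
    src.push_str(&format!(
        "extern \"C\" __global__ void {}({}) {{\n",
        k.name,
        params.join(", ")
    ));
    src.push_str("    int i = blockIdx.x * blockDim.x + threadIdx.x;\n");
    src.push_str("    if (i < n) {\n");
    src.push_str(&format!("        {}[i] = {};\n", k.output, fold_expr(&k.body)));
    src.push_str("    }\n}\n");
    src
}

// Write the generated source to `<temp dir>/<kernel name>.cu`.
fn write_cu(k: &Kernel) -> io::Result<PathBuf> {
    let path = env::temp_dir().join(format!("{}.cu", k.name));
    fs::write(&path, fold_kernel(k))?;
    Ok(path)
}

// Build (but do not run) the nvcc invocation that compiles the .cu file
// to an object file. Linking that object into the Rust binary is left out.
fn nvcc_command(cu: &Path, obj: &Path) -> Command {
    let mut cmd = Command::new("nvcc");
    cmd.arg("-c").arg(cu).arg("-o").arg(obj);
    cmd
}

// Example kernel: out[i] = a * x[i] + y[i]
fn saxpy() -> Kernel {
    Kernel {
        name: "saxpy".to_string(),
        scalars: vec!["a".to_string()],
        inputs: vec!["x".to_string(), "y".to_string()],
        output: "out".to_string(),
        body: Expr::Add(
            Box::new(Expr::Mul(
                Box::new(Expr::Var("a".to_string())),
                Box::new(Expr::Index("x".to_string())),
            )),
            Box::new(Expr::Index("y".to_string())),
        ),
    }
}
```

Calling `write_cu(&saxpy())` drops a `saxpy.cu` in the temp directory, and `nvcc_command(...)` gives you a `Command` you can `.status()` if nvcc is actually installed. Returning the command instead of running it keeps the sketch usable on machines without the CUDA toolkit.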