It depends on the PCIe/DMA topology of the system, but in short: on an ideal system you can avoid the bottleneck of the CPU interconnect (e.g., AMD's Infinity Fabric) and reduce overall CPU load by loading/unloading data directly between your NVMe storage and your PCIe accelerator [0]. You can also combine this with RDMA/RoCE (provided everything in the chain supports it) to build a clustered network with NVMe-oF that serves data from high-speed NVMe flash arrays to clusters of GPUs, potentially reducing cost/space/power by cutting down the need for expensive, power-hungry CPUs. Prior to CXL's proliferation (which realistically we haven't reached yet), this is mostly limited to bespoke HPC systems; most consumer systems lack the PCIe lanes/topology to make practical use of it.
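For reference, the GPUDirect Storage path in [0] is exposed through NVIDIA's cuFile API. A minimal sketch of a direct NVMe-to-VRAM read looks roughly like this (hypothetical file path, error handling trimmed; assumes a GDS-capable driver/filesystem stack, otherwise cuFile quietly falls back to bouncing through system memory):

    // Sketch only: GPUDirect Storage read via cuFile. "/mnt/nvme/shard.bin"
    // is a made-up path; build against the CUDA toolkit and link -lcufile.
    #define _GNU_SOURCE          // for O_DIRECT
    #include <fcntl.h>
    #include <unistd.h>
    #include <stdio.h>
    #include <string.h>
    #include <cuda_runtime.h>
    #include <cufile.h>

    int main(void) {
        const size_t size = 64 << 20;                // 64 MiB chunk
        cuFileDriverOpen();                          // bring up the GDS driver

        int fd = open("/mnt/nvme/shard.bin", O_RDONLY | O_DIRECT);
        if (fd < 0) { perror("open"); return 1; }

        // Wrap the POSIX fd in a cuFile handle.
        CUfileDescr_t descr;
        memset(&descr, 0, sizeof(descr));
        descr.handle.fd = fd;
        descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;
        CUfileHandle_t handle;
        cuFileHandleRegister(&handle, &descr);

        // Allocate GPU memory and register it so the DMA engine can target it.
        void *devPtr = NULL;
        cudaMalloc(&devPtr, size);
        cuFileBufRegister(devPtr, size, 0);

        // DMA from the NVMe device into VRAM -- no staging buffer in host RAM
        // when the topology and driver stack allow a true peer-to-peer path.
        ssize_t n = cuFileRead(handle, devPtr, size, /*file_offset=*/0, /*devPtr_offset=*/0);
        printf("read %zd bytes into GPU memory\n", n);

        cuFileBufDeregister(devPtr);
        cudaFree(devPtr);
        cuFileHandleDeregister(handle);
        close(fd);
        cuFileDriverClose();
        return 0;
    }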
On the consumer side, you're right: using system RAM is probably the better approach, since most consumer motherboards route the NVMe storage up to the CPU interconnect and then back "down" to the GPU (or worse, through the "southbridge" chipset(s), like on X570), so you take that hit anyway.
However, if you have a PCIe switch on board that lets data flow directly from storage to GPU without a round trip across the CPU, then NVMe/CXL/SCM modules would theoretically be better than system RAM. It depends on the switch, retimers, muxing, topology, etc.
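If you want to sanity-check whether that direct path even exists on a given box, one rough approach is to compare where the GPU and the drive sit in the PCIe tree. A sketch under assumptions not in the parent post (Linux sysfs, a hypothetical block device name "nvme0n1"; ACS, IOMMU, and driver support still matter beyond this):

    // Sketch: print the sysfs PCIe paths of GPU 0 and an NVMe drive. A long
    // shared path prefix (more than just the root complex) means they hang
    // off a common switch, so peer-to-peer traffic can stay below it.
    #include <stdio.h>
    #include <stdlib.h>
    #include <ctype.h>
    #include <limits.h>
    #include <cuda_runtime.h>

    int main(void) {
        // GPU 0's PCIe address, e.g. "0000:41:00.0".
        char gpu_bus_id[32];
        if (cudaDeviceGetPCIBusId(gpu_bus_id, sizeof(gpu_bus_id), 0) != cudaSuccess) {
            fprintf(stderr, "no CUDA device found\n");
            return 1;
        }
        for (char *p = gpu_bus_id; *p; ++p)
            *p = (char)tolower((unsigned char)*p);   // sysfs uses lowercase hex

        // "/sys/block/nvme0n1/device" resolves to a path that walks every
        // bridge/switch between the root complex and the drive.
        char nvme_path[PATH_MAX];
        if (!realpath("/sys/block/nvme0n1/device", nvme_path)) {
            perror("realpath nvme");
            return 1;
        }

        char gpu_sysfs[PATH_MAX], gpu_path[PATH_MAX];
        snprintf(gpu_sysfs, sizeof(gpu_sysfs), "/sys/bus/pci/devices/%s", gpu_bus_id);
        if (!realpath(gpu_sysfs, gpu_path)) {
            perror("realpath gpu");
            return 1;
        }

        printf("GPU : %s\nNVMe: %s\n", gpu_path, nvme_path);
        return 0;
    }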
Regardless of what you're using for direct storage and how ideal your topology is, the MT/s and GB/s you get over PCIe are significantly lower than onboard VRAM (be it GDDR or especially HBM), and the link is bandwidth-limited to boot. That doesn't mean it's useless by any means, but it's important to point out that this doesn't turn a 20GB VRAM card into a 2.02TB VRAM card just because you DirectStorage'd a 2TB drive to it, no matter how ideal the setup is. However, as PCIe bandwidth increases and storage-class-memory devices (and storage tech in general) continue to improve, it's rapidly becoming more viable. On PCIe Gen 3 you're probably shooting yourself in the foot; on PCIe Gen 6 you can realistically see a very real benefit. But again, there's a lot of "it depends" here, and for now you're probably better off buying a bigger GPU (or multiple GPUs) if you're not on the cutting edge with the corporate credit line.
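For a sense of scale, a back-of-the-envelope comparison using assumed peak bandwidth figures (ballpark numbers I'm plugging in, not measurements; real-world throughput will be lower):

    // Rough, order-of-magnitude comparison of how long it takes to move a
    // fixed working set over different links. The point is the ratio between
    // the PCIe link feeding the card and the VRAM behind it.
    #include <stdio.h>

    int main(void) {
        struct { const char *name; double gbps; } links[] = {
            { "PCIe 3.0 x16 (~16 GB/s)",    16.0 },
            { "PCIe 5.0 x16 (~64 GB/s)",    64.0 },
            { "PCIe 6.0 x16 (~128 GB/s)",  128.0 },
            { "GDDR6X card (~1000 GB/s)", 1000.0 },
            { "HBM3 card (~3300 GB/s)",   3300.0 },
        };
        const double working_set_gb = 100.0;   // e.g. streaming a 100 GB shard
        for (size_t i = 0; i < sizeof(links) / sizeof(links[0]); ++i)
            printf("%-28s -> %6.1f s to move %.0f GB\n",
                   links[i].name, working_set_gb / links[i].gbps, working_set_gb);
        return 0;
    }

Even at Gen 6 the link is roughly an order of magnitude behind HBM, which is why this is a tiering/streaming story rather than a VRAM replacement.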
0: https://developer.nvidia.com/blog/gpudirect-storage/