I don’t think you really understand the current trends in computer architecture. Even CPUs are moving to on-package RAM for higher bandwidth. Everything is the opposite of what you said.
Higher bandwidth, but lower capacity. The real trend is different physical architectures for different compute loads. There is a place in AI for bulk, albeit slower, memory: think extremely large data sets that want to stay resident and run entirely on a discrete card without involving PCIe lanes.
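Rough numbers make the bandwidth-versus-capacity tradeoff concrete. A minimal back-of-envelope sketch in Python, using approximate, rounded figures I'm assuming for a few common memory tiers and link types (illustrative values, not anything quoted in this thread):

```python
# Back-of-envelope: bandwidth vs. capacity for a few memory tiers.
# All figures are assumed approximations, rounded for illustration only.
memory_tiers = {
    # tier                        (approx. bandwidth GB/s, typical capacity GB)
    "HBM3 (datacenter GPU)":      (3000, 80),
    "GDDR6X (consumer GPU)":      (1000, 24),
    "DDR5 dual-channel (CPU)":    (80,   192),
    "PCIe 4.0 x16 link":          (32,   None),  # a link, not a memory tier
    "PCIe 5.0 x16 link":          (64,   None),
}

for name, (bw, cap) in memory_tiers.items():
    cap_str = f"{cap} GB" if cap else "n/a"
    print(f"{name:28s} ~{bw:5d} GB/s   capacity: {cap_str}")
```

The shape of it: on-package memory is one to two orders of magnitude faster than the PCIe link but capped at tens of GB, while system RAM is big but comparatively slow, which is the tradeoff being argued about here.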
This is also not true. You can transfer from main memory to the card fast enough that it is not a bottleneck. Consumer GPUs don't even use PCIe 5.0 yet, which doubles the bandwidth of PCIe 4.0. Professional datacenter cards in SXM form factors don't even rely on PCIe for card-to-card traffic (they use NVLink-class interconnects), but they do put a huge amount of RAM on the package with the GPUs.
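For scale, here's a quick sketch of what a one-time host-to-card copy costs at each PCIe generation, assuming theoretical x16 link rates and a hypothetical 20 GB working set (both assumptions on my part, not figures from anyone above):

```python
# Approximate theoretical x16 link bandwidth per PCIe generation (GB/s).
# Real-world throughput is somewhat lower; these are assumed round numbers.
pcie_x16_gb_s = {"PCIe 3.0": 16, "PCIe 4.0": 32, "PCIe 5.0": 64}

working_set_gb = 20  # hypothetical dataset/model size to copy onto the card

for gen, bw in pcie_x16_gb_s.items():
    seconds = working_set_gb / bw
    print(f"{gen}: ~{bw} GB/s x16 -> ~{seconds:.2f} s to move {working_set_gb} GB")
```

Under those assumptions the copy takes on the order of a second or less per generation, which is the sense in which a one-time transfer over the bus isn't the bottleneck for work that then runs out of on-card memory.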