
Why does multi-tenant QoS require hardware-supported virtualisation?

An operating system doesn't require virtualisation to manage application usage of CPU time, system memory, disk storage, and so on – although the details differ from OS to OS, most operating systems have quota and/or prioritisation mechanisms for these resources – so why not for the GPU too?

There is no reason in principle why you can't do that for the GPU too. In fact, there has been a series of Linux cgroup patches going back several years now to add GPU quotas to cgroups, so you can set up per-app quotas on GPU time and GPU memory – https://lwn.net/ml/cgroups/20231024160727.282960-1-tvrtko.ur... is the most recent I could find (from 6-7 months back), but there were earlier iterations broader in scope, e.g. https://lwn.net/ml/cgroups/20210126214626.16260-1-brian.welt... (from 3+ years ago). For whatever reason none of these has yet been merged into the mainline Linux kernel, but I expect it will happen eventually (especially with all the current focus on GPUs for AI applications). Once you have cgroup support for GPUs, why couldn't a paravirtualised GPU driver on a Linux host use it to provide GPU resource management?
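To make the idea concrete, here's a rough sketch of what per-app quotas look like through the cgroup v2 filesystem. The cpu.max, memory.max and cgroup.procs files are real cgroup v2 interfaces; the gpu.weight and gpu.memory.max names are placeholders I've made up to stand in for whatever interface a merged GPU/DRM controller would actually expose:

    # Sketch: per-app quotas via the cgroup v2 filesystem.
    # cpu.max / memory.max / cgroup.procs are real interface files;
    # the gpu.* names are hypothetical placeholders for a future
    # GPU/DRM cgroup controller.
    import os

    CGROUP_ROOT = "/sys/fs/cgroup"

    def write_ctl(cgroup, ctl_file, value):
        """Write a value into a cgroup controller interface file."""
        with open(os.path.join(CGROUP_ROOT, cgroup, ctl_file), "w") as f:
            f.write(value)

    def setup_tenant(name, pid):
        os.makedirs(os.path.join(CGROUP_ROOT, name), exist_ok=True)

        # Existing cgroup v2 controls: 20 ms of CPU every 100 ms, 1 GiB of RAM.
        write_ctl(name, "cpu.max", "20000 100000")
        write_ctl(name, "memory.max", str(1024 * 1024 * 1024))

        # Hypothetical GPU analogues: a relative weight for GPU time
        # and a hard cap on GPU memory.
        write_ctl(name, "gpu.weight", "100")                         # hypothetical
        write_ctl(name, "gpu.memory.max", str(256 * 1024 * 1024))    # hypothetical

        # Move the tenant's process into the cgroup so the limits apply.
        write_ctl(name, "cgroup.procs", str(pid))

    if __name__ == "__main__":
        setup_tenant("tenant-a", pid=12345)  # placeholder PID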

And I don't see why it has to wait for GPU cgroups to be upstreamed in the Linux kernel – if all you care about is VMs and not any non-virtualised apps on the same hardware, why couldn't the hypervisor implement the same logic inside a paravirtualised GPU driver?
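For a sense of what "the same logic" could look like on the hypervisor side, here's a toy sketch of weighted GPU-time scheduling across per-VM command queues. It doesn't correspond to any real hypervisor's code – it's just the shape of the accounting a paravirtualised backend would need to do:

    # Toy sketch of quota logic a hypervisor's paravirtualised GPU backend
    # could apply: each VM gets a relative share of GPU time, and the next
    # submission comes from whichever VM has consumed the least relative to
    # its share. All names here are made up for illustration.
    from collections import deque

    class VMQueue:
        def __init__(self, name, share):
            self.name = name
            self.share = share        # relative GPU-time share (like a weight)
            self.used_us = 0          # GPU microseconds consumed so far
            self.pending = deque()    # queued command buffers

        def normalised_usage(self):
            return self.used_us / self.share

    def pick_next(vms):
        """Pick the VM furthest below its fair share that has work queued."""
        runnable = [vm for vm in vms if vm.pending]
        if not runnable:
            return None
        return min(runnable, key=VMQueue.normalised_usage)

    def submit_loop(vms, run_on_gpu):
        """run_on_gpu(cmd) executes a command buffer and returns its GPU time in us."""
        while any(vm.pending for vm in vms):
            vm = pick_next(vms)
            cmd = vm.pending.popleft()
            vm.used_us += run_on_gpu(cmd)

    # Example: VM "a" is entitled to twice the GPU time of VM "b".
    vms = [VMQueue("a", share=2), VMQueue("b", share=1)]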



