Self-loading code is a huge part of the value-add of Python libraries. Many of the popular libraries (e.g. NumPy and friends) trigger a bewildering chain of events to compile from source when they aren't installed from pre-built wheels. And if you do have wheels, you have opaque binary blobs. So pick your poison: compile-on-install with a possible backdoor, or a prebuilt .so/.dylib/.pyc with a possible backdoor.
The most obvious (but not necessarily easiest) approach is to phase out setup.py and move everything to the declarative pyproject.toml approach. This is not just better for metadata (setup scripts make it really hard to statically infer what deps a lib has), it also allows for better control over what installers/toolchains run on install.
Attackers still have quite a lot of latitude during the build phase (a build backend is still arbitrary code), but at least libraries have to declare their build backend and build requirements up front, which gives the installer, and presumably the user, a place to allow or forbid them.
Also eval/exec are terrible and I wish there were a mode to disable their usage, but I don't know if the Python runtime has some deep dependency on them. Maybe there's a way to restrict them so that only low-level frames can call eval/exec.
Would it be possible to build the wheels in a more trusted, hardened environment? Having a binary blob isn't as serious when it comes from a trusted source. Debian and most other Linux distributions already work this way: the package manager downloads prebuilt binaries.
The hardening could mitigate attacks at compile time.
Obviously, this leaves "compile in the backdoor and wait for the user to fall into it", but at least it isn't an issue of compiling on the user's computer and it isn't an issue of binary blobs. And possibly there's a greater chance of detection if the actual source code has to be available to compile.
>Also eval/exec are terrible and I wish there were a mode to disable their usage,
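You can use audit hooks (`sys.addaudithook`, Python 3.8+) to block eval/exec, process spawning, or even arbitrary imports or network requests, as long as the hook is installed before any untrusted code runs. The hook is called for each audit event and can raise an exception to abort the operation.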
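A rough sketch of what that can look like. The event names are real CPython audit events, but the choice of which to block, the `enforce_policy` helper, and the on/off switch are my own assumptions for the demo; a real lockdown would enforce unconditionally, since any code that can flip the switch can bypass it.

```python
import sys
from contextlib import contextmanager

# Audit event names from the CPython audit events table; which ones to
# block is just an example policy.
BLOCKED_EVENTS = {"exec", "os.system", "subprocess.Popen", "socket.connect"}

_enforcing = False  # demo-only switch; a real policy would always be on


def _policy_hook(event, args):
    # Called for every audit event in the process; raising aborts the operation.
    if _enforcing and event in BLOCKED_EVENTS:
        raise RuntimeError(f"blocked by audit policy: {event}")


# Hooks cannot be removed once added, so install this before untrusted imports.
sys.addaudithook(_policy_hook)


@contextmanager
def enforce_policy():
    """Turn enforcement on for the duration of the with-block (hypothetical helper)."""
    global _enforcing
    _enforcing = True
    try:
        yield
    finally:
        _enforcing = False


with enforce_policy():
    try:
        eval("1 + 1")
    except RuntimeError as err:
        print("eval refused:", err)
```

On the "deep dependency" question: as far as I know, the import system itself runs module code through exec, so a blanket block on the `exec` event also stops any pure-Python module that isn't already loaded from being imported. In practice you would import what you need first and then lock down, or inspect the code object passed in `args` and allow only trusted paths.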