I'm really glad this is being taken seriously. It's often been an uphill battle to convince people that Python supply chain security is a serious issue.
In the ML (now called AI) space, for example, it's not uncommon to download random binaries from the internet containing model weights, scripts, etc. Sometimes even at runtime (!!!). There are lots of bad practices across the industry there that wouldn't be tolerated in other contexts.
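For what it's worth, one cheap mitigation for the "download weights at runtime" pattern is to record a checksum when you first vet a file and refuse to load anything that doesn't match. A minimal sketch, assuming you already have a trusted SHA-256 for the artifact; the helper names `sha256_of` and `verify_artifact` are made up for illustration:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so multi-GB weight files don't fill RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: Path, expected_sha256: str) -> bool:
    """Return True only if the file matches the checksum recorded earlier."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

This only tells you the file is the same one you looked at before, not that it was safe in the first place.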
IME, half the instructions got tested just once prior to publication, on the developer's laptop or, if you're lucky, an AWS instance, and that's it. No pinned versions of anything. So if you come across something you want to try (or rather, if it gets pushed to your desk by someone higher-up accompanied by a "we need AI now!!!" note), you'll first have to spend hours upon hours trying to pin down Python package versions just to get the Python part to install. Then you gotta mess with CUDA versions, because obviously these haven't been documented either... Nasty shit.
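As a rough illustration of the "no pinned versions" problem, here is a small sketch that flags requirement lines lacking an exact `==` pin. It assumes a plain `requirements.txt`-style file; the function name `unpinned_requirements` is hypothetical, and it deliberately ignores line continuations, `--hash` options, and URL requirements:

```python
import re

# name, optional [extras], then an exact "==" version (no wildcards),
# optionally followed by an environment marker after ";".
PINNED = re.compile(
    r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?\s*==\s*[^\s;,*]+\s*(;.*)?$"
)


def unpinned_requirements(text: str) -> list[str]:
    """Return requirement lines that are not pinned to one exact version."""
    loose = []
    for raw in text.splitlines():
        line = raw.split(" #", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("#") or line.startswith("-"):
            continue  # blank lines, comments, and pip options like "-r"
        if not PINNED.match(line):
            loose.append(line)
    return loose
```

Calling it on the contents of a requirements file gives you the list of lines you'd have to pin down by hand. It says nothing about CUDA or other system-level versions.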
While we're on that topic, what is people's strategy for playing with Stable Diffusion safely?
I still haven't found any way to run it in a VM using consumer hardware (GPU continues to refuse to work). A second install of the OS on a second drive is so insanely clunky to switch between, I'd really like to not have to keep doing that.
I've not actually done it, but from what I understand using two GPUs is the way to go - you use one for your actual display etc., and the other is just passthrough to the VM.
(I was looking into it in the context of running Fusion 360 in a Windows VM though, not Stable Diffusion or any ML.)
Because it installs like 100,000 Python scripts of mystery origin that run with full privileges. Even if the maintainers are unlikely to be malicious on purpose, it only takes one person accidentally putting a typo in a dependencies file in one of the hundreds of packages it imports... many of which are not commonly used.
It's better than nothing but is it enough to run potentially malicious code?
I haven't checked recently, but a while ago most distros defaulted to letting anyone peep into other users' home dirs. Moreover, there have been so many exploits over the years letting a user gain root privileges that, for the purpose of security, separate Unix users are akin to a bathroom lock.
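If you want to check the home directory part on your own box, a quick sketch follows. It assumes a POSIX system and only looks at the classic "other" permission bits, so ACLs or anything else layered on top are not considered; the function name is made up:

```python
import stat
from pathlib import Path


def others_can_access(path: Path) -> bool:
    """True if the 'other' read or execute bit is set on the path."""
    mode = path.stat().st_mode
    return bool(mode & (stat.S_IROTH | stat.S_IXOTH))


# Example: others_can_access(Path.home())
```

Even if this comes back False, that does nothing about the privilege escalation point above.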
Supply chain security is super important in a lot of areas, but seemingly quite a hard problem to broach/solve. Hopefully awareness continues to be raised, more tooling gets built, etc.
I haven't listened to the interview but are they going to add namespaces? That's the only good solution I can see to the current unfixed dependency confusion issue.
By that I mean, you want to use a private pip repo in your company, you upload `yourcompany_secretproject` to it and tell people to install it. Now the only way to prevent yourself being hacked is to publicly register an empty package `yourcompany_secretproject` on pypi.org. Oh and also hope the admins don't notice it and remove it because it's empty (which they have said they will).
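To make that concrete, here is a rough sketch of an audit that lists which of your internal package names are still unclaimed on the public index, i.e. the ones someone else could register. It assumes pip is set up to consult both your private index and pypi.org, and it uses PyPI's JSON endpoint (`https://pypi.org/pypi/<name>/json`, which returns 404 for unknown projects). The function names are made up for illustration:

```python
import re
import urllib.error
import urllib.request
from typing import Callable, Iterable


def normalize(name: str) -> str:
    """PEP 503 name normalization: runs of -, _ and . compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def exists_on_pypi(name: str) -> bool:
    """Ask pypi.org whether a project with this name is registered."""
    url = f"https://pypi.org/pypi/{normalize(name)}/json"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise  # anything else is a real error, not "unclaimed"


def unclaimed_names(
    internal_names: Iterable[str],
    lookup: Callable[[str], bool] = exists_on_pypi,
) -> list[str]:
    """Internal names that nobody has registered publicly yet."""
    unique = {normalize(n) for n in internal_names}
    return sorted(n for n in unique if not lookup(n))
```

Note that a name showing up as "claimed" doesn't mean it's claimed by you, so you'd still want to check who owns it. The `lookup` parameter is only there so the check can be run against a fake in tests.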
This is a great interview! Mike (and Seth, who is tasked with addressing the non-PyPI security needs of the Python ecosystem) have been doing a great job both documenting and expanding the Python ecosystem’s security capabilities and outstanding needs.
PyPI’s security features have undergone a significant expansion since the backend rewrite back in 2017; I think it’s accurate to say that, since then, it has consistently been at the forefront (amongst its peer indices) in terms of adding scopeable API tokens, MFA, secret scanning, and most recently trusted publishing.
(FD: The company I work for helped add some of those features[1][2].)
Thank you so much for including a transcript!!! I hate when audio or video content doesn't. I personally prefer to read rather than listen, but there are plenty of users with disabilities who don't even have an option.