I switched our webapp deployment from using sdists from an internal PyPI instance (yay cheeseshop!) to using wheels built on the server, and the install time for things like lxml or numpy went from roughly 20 minutes to 20-30 seconds. Python wheels are fantastic.
How are you hosting the wheels internally? Are you still using an internal pypi instance but with wheels instead of sdists?
I've been looking at doing something similar in our environment, but there are so many options that I haven't figured out what the best and most straightforward way might be.
Shameless plug: I had the exact same question at my last gig, and wrote up a quick open-source tool to build wheels and dump them to an S3 bucket. [0]
Usage is as simple as
mkwheelhouse mybucket.mycorp.co scipy numpy
which will build wheels for your current architecture and upload them to mybucket.mycorp.co. It builds a pip-compatible index, too, so you can just tell pip to search your wheelhouse first before falling back to PyPI:
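(Exact flags depend on how mkwheelhouse lays out its index; something along these lines, where the index.html path is an assumption:)
pip install --extra-index-url https://mybucket.mycorp.co/ scipy
# or skip PyPI entirely and point pip straight at the generated index page:
pip install --no-index --find-links https://mybucket.mycorp.co/index.html scipy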
If you need to build for several OSes, you can run mkwheelhouse against the same bucket from each OS.
The downside of this approach is that you can't host private packages, because you need to enable public website hosting on the bucket. (Although VPC endpoints might have changed this!) But the simplicity of this approach, plus the massive speedup from not constantly recompiling scipy, made it totally worth it.
More plugging unashamedly: I wrote a guide for what is probably the quickest and simplest way to get started with a private PyPI instance - just Apache with directory autoindex, and wheels that are manually uploaded to the server:
Your easiest bet is probably devpi, but you can download the cheeseshop code that powers pypi as well.
I'm using cheeseshop, but some people swear by warehouse, which is supposedly the legacy-free codebase that will eventually power pypi.
If you don't care about the search API, you can also just enable a directory listing index page and use any web server. Pip will do the right thing when given the right incantation of arguments and a prayer to the pip gods.
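For example, on reasonably recent pip (hostname hypothetical; --trusted-host only needed if the server isn't HTTPS):
pip install --extra-index-url http://pypi.internal.example/simple/ --trusted-host pypi.internal.example somepackage
# or, for a bare directory of wheels/sdists behind an autoindex:
pip install --no-index --find-links http://pypi.internal.example/wheels/ somepackage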
Use PyPICloud or DevPI to host your built binaries. Wheels have another advantage: you don't need any dev tools (compilers and friends) in your production environments. The only annoying thing is that there isn't any way to distinguish between different Linux distributions, so hopefully all your servers are standardized.
As a (now casual) sysadmin, I think things have gotten a bit better. Virtualenv (now in the standard library as venv in Python 3) helps a lot. The trick, if running on Debian, is:
1) use system python -- if you can. Now with python3.4 and 2.7 in stable, that's the easy bit.
2) If you can't do 1), use backports and/or manually backport[1]
3) If you need python-packages that depend on c-libraries, use "apt-get build-dep python-foo".
4) use a virtualenv, or several virtualenvs, eg:
virtualenv -p python2.7 myvenv
./myvenv/bin/pip list -o #list outdated packages,
# typically that'll mean you first need to:
./myvenv/bin/pip install -U pip #possibly also setuptools
Then install the packages you need with pip -- see 3) for a tip on stuff like pyqt that needs some (fairly hairy) build dependencies.
This works for most everything -- except perhaps for pygame, which is exceptionally gnarly.
[1] Manually backport, the easy way:
sudo -s
echo "deb-src http://httpredir.debian.org/debian sid main contrib non-free" \
  >> /etc/apt/sources.list.d/sid-src.list
apt-get update
apt-get build-dep python3
exit # No need to build or fetch sources as root
cd /tmp
apt-get source python3 # this assumes sid has a version that'll work for you
cd python3*
dpkg-buildpackage -uc -us
cd ..
sudo dpkg -i python...deb
Note, this may or may not work with major packages like python -- but it often goes a long way. If you can't get enough build-deps satisfied, things are more complicated -- and you're probably better off running testing/unstable in a chroot, possibly with the help of the schroot package -- rather than fighting with conflicting packages.
The Python ecosystem is so good about portability, but flexibility always comes with a cost.
I'm assuming your systems engineer perspective is colored by RedHat (and others) packaging python-ldap at the system level and then leaving you to your own devices for everything else.
C bindings are really convenient, but then you have to worry about dynamic linking. Are you sure Python is linked to the same OpenLDAP library that NSS uses? Hence, system versions of some Python packages, but only where other system level dependencies are an issue.
"Absolute disaster" seems strong. The situation doesn't seem too bad to me. I can pip install things and it works. I keep hearing that Python's packaging is so bad, but I don't even know what the problems are.
Again as a casual user, I don't want to state anything too strongly, because some of it could be user error... but a few of the troubles I've had just trying to use and update Python in the last couple of years include: packages from PyPI being broken and refusing to install -- including pip itself, so pip couldn't update itself; packages with dependencies that won't install from PyPI (or whose installation fails on my machine), sending you looking for tarballs; no clear route to upgrade between semi-major versions of Python; and things that install but need config frobbing before they work -- I'm not afraid of doing that, but it's always a hassle.
I think some of the problems were self-inflicted by trying to use Python on a Mac with no package manager, and on Windows. Things will get easier on the Linux side with Python 3 in the repositories now. But a fairly common pattern for languages/programming environments is to install the language's package manager and upgrade packages separately from the OS package manager, and for whatever reason this seems like a bigger headache with Python than with others.
But seriously, mirror your production environment for local development and testing. This is all comically easy with VMs, you just need to alter your workflow.
Actually, I was very pleasantly surprised at how well pip+python3.4+visual studio (w/correct paths set up so that pip finds the compiler) worked.
Granted this was recently, under windows 8.1 (tried it mostly for fun, and so that I have a decent calculator (ipython) on hand when I boot up in windows).
Actually, distutils looks for Visual Studio first in the registry, then via environment variables, so it should find VS without setting up anything. You just need to have the correct version installed (the banner in the REPL will tell you which; for example, for ActiveState Python, 2.7 = VS2008 and 3.x = VS2010).
I was also pleasantly surprised: building Python C extensions under Windows is easy. Building some of the C or Fortran dependencies, on the other hand, might not be.
Right -- I remembered something about Python giving me very nice error messages (probably about a wrong version, or 32-bit vs 64-bit), along with pretty clear help on how to fix things. I just didn't remember exactly what small task (hit "y"/download+install, or set up env vars) I had to do to get everything to work. I do remember it was (for Windows and C compilers) very easy.
I confess, I've only dabbled with venvs on Windows (had to play along with the wonderful django tutorial created by/for djangogirls[1]). But my secret to working with pip on Windows has been invoking it from inside Python: install Python (optionally set up a venv); then:
import pip
pip.main("install ipython".split())
of course, python for windows drops pip in the Scripts folder -- but doing everything from inside python is magically cross-platform :)
Pip will drop ipython in the Scripts folder too -- so that's where one can grab a shortcut. Or just (after restarting python):
import IPython
IPython.start_ipython() # Ed: not .start()
The disaster part is not the tooling, but rather nobody understanding how "X" solves the problem, actually choosing an "X", and sticking with it.
There is a clear lack of _direct_ and _concise_ documentation about Python packaging. Something between the overly descriptive reference documentation that nobody reads and the "copy-paste this setup.py" snippets that teach nothing of substance.
Let me use the OP as an example:
"Wheels are the new standard of python distribution and are intended to replace eggs."
Ok, I've done my share of Python coding and have published stuff on PyPI, and this sentence means absolutely nothing to me. I'm not even sure I can define what an egg is if someone asks me. It probably means even less to someone trying to make their own package.
One important thing to remember is that packaging is a means to an end. Nobody gets excited about packaging, so nobody wants to know too much about it -- but you also don't want to know so little that the thought of doing it instills dread.
PS: This applies to other packaging scenarios too. I've seen people fall to their knees when faced with building Debian packages for deployment (ending up not doing it at all), RPMs, etc.
Python packaging seems pretty painful compared to what I've experienced with, say, Clojure (list dependencies in project.clj, done), but that seems like an unfair comparison. It's really common for Python projects to bind to C libraries for heavy lifting, while this is pretty uncommon in Java-land. Is there another language that makes heavy use of the C-lib-binding pattern which has a much more pleasant packaging experience than Python? Does Ruby, for example?
Ruby does not (IMNHO) have a more pleasant packaging experience, no (although, the inverse may be true - I think writing a proper gem -- and listing/uploading it -- is probably easier than a proper python package, and pushing that to pypi).
Python's virtualenv is a lot more stable than the various hoops one goes through with eg rvm. On the other hand, if you needed/wanted to use more than python2.7/python3.4 -- it might be a bit harder to juggle many python versions than it is to juggle many ruby versions with rvm.
Probably the main difference is that you rarely really need "many" versions of python -- and hopefully one won't really need "many" versions of ruby (as much) any more either.
So the problem bundler/venv solves becomes a solution to 95% of the entire problem, rather than just 50%.
We stopped developing web projects on Windows and switched to Linux (Ubuntu) Workstation. We are one with the Universe again. Seriously, run a VM on a reasonably powerful Windows machine and you can have both worlds and it all works very well. My favorite setup is a three monitor rig with Ubuntu on a VM running on my main monitor and other stuff running on the other monitors under the Windows host OS. There are a lot of advantages to doing this, including also running something like a Xen hypervisor and simulating an entire (small) data center infrastructure.
If by packaging you mean deployment, yeah, it's a mess -- but only in the sense that you can't just FTP your application to any old server like you do with PHP projects.
It'd be nice if Python for the web "just worked", but it really isn't that difficult to get things set up (Nginx, Gunicorn, PostgreSQL, etc.). Also, there are a lot of reasons not to want a canned PHP-type setup.
We have a policy to never bring up servers by hand. You capture your setup and configuration in Ansible scripts, document them well, and things just work. Then use Selenium to confirm that all is well with a basic test application. Life is good.
I agree to a certain extent. The JVM/Maven POM approach seems to work the most reliably across architectures, which is nice.
Python is a friggin' nightmare by comparison. Some trivial low-complexity operations can take insane amounts of elbow grease due to missing/outdated libs.
To this day I have not found a good way to distribute compiled JNI binaries. You can either consider them completely separate from your Java code and package them separately, or you can put them into your JAR and then do some horrible hacks of unpacking the JAR at runtime, detecting the architecture and loading the library from the temporary directory. This is really something that should be built into the standard package formats/tools.
I'm in the likely unique position of being a serious Python user that has almost no experience with virtualenv, pip, setuptools, or any python packaging. I have no control over the hosts I have to support, which means I can never count on packages to be installed. I almost always use pure python 2.5-2.7. I'm also a Windows user.
From that perspective, I can see the OP's point. For most of Windows history, if you needed a library installed, you'd install it. Every installation was accomplished through the familiar Windows installer. If you needed a newer version, you installed it. There was one process, and one target.
Now take modern Python. A new layer has been introduced, where you may run every project in a new target, or through the 'default' install. In addition, you have libraries and packages that may be installed through one of several different installers or processes, most of which are different than the OS's package management, and which aren't necessarily compatible and aren't tightly managed. This is on top of multiple python versions that may have to be installed.
I can see where he's coming from.
That being said I love Python and I respect the work that has been done to allow such a vibrant user base.
You're a serious Python user that has no experience with any of the last decade's tools for packaging, and you find things hard without using any of those tools, and THAT'S why you can see where he's coming from?
It is only a partial solution. You need the pip and virtualenv packages along with a private PyPI repository. Binary wheels make it so that you don't have to clutter any environments except for build servers with compilers and other build tools. There's still the problem that Linux wheels don't differentiate on distribution, so you will hit problems if you build on one distribution and try to deploy on a very different one.
How do I upgrade all packages with Pip? Why are there multiple ways to install packages, e.g. eggs, setup.py, pip, easy_install (just off the top of my head)? How does pip manage to break dependencies? Why does no standard way to manage packages come pre-installed? How do I pin packages at a version? Why is there a need for virtual-env to exist as a separate project?
Compare this to how languages like Go, Rust, Ruby, and Julia handle packages and dependencies and Python is an absolute disaster. Even if there are answers to the above questions, as a fairly advanced user I have no idea what they are, and I have done plenty of research.
I don't know how to upgrade all packages, but that's not something I want to do anyway, because I want to control which packages I upgrade. To upgrade a single package you can do
pip install --upgrade packagename
> Why are there multiple ways to install packages, e.g. eggs, setup.py, pip, easy_install (just off the top of my head)?
Egg is a package format. setup.py is a build script. pip and easy_install are package management tools. You use setup.py to build eggs (for easy_install) or sdists and wheels (for pip). You can also install directly with setup.py, but that's not something you'd generally do. pip is a better, more recent installation tool than easy_install.
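A rough sketch of how the pieces fit together (package name and version are hypothetical; bdist_wheel needs the 'wheel' package installed):
python setup.py sdist          # build a source distribution into dist/
python setup.py bdist_wheel    # build a wheel into dist/
pip install dist/mypackage-1.0-py2.py3-none-any.whl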
> How does pip manage to break dependencies?
I'm not sure what you mean here.
> Why does no standard way to manage packages come pre-installed?
I guess the answer is because no one had bothered solving this issue until recently. Starting with Python 3.4, pip is included by default. See https://docs.python.org/3/installing/index.html
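For example, on a 3.4+ installation you can bootstrap and run the bundled pip without installing anything else (the package name here is just an example):
python3 -m ensurepip --upgrade     # installs/refreshes the bundled pip
python3 -m pip install requests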
> How do I pin packages at a version?
You list your packages with their version numbers in a requirements file that you can pass as an argument to pip. You can use pip freeze to get a list of the currently installed packages with their pinned version numbers, which you can use as the basis of your requirements file.
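Something like this, for example (the pinned versions in the comment are purely illustrative):
pip freeze > requirements.txt      # e.g. Flask==0.10.1, requests==2.5.3, ...
pip install -r requirements.txt    # reinstall exactly those versions elsewhere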
> Why is there a need for virtual-env to exist as a separate project?
No need for that, it just hasn't been integrated in the standard distribution until fairly recently. Starting from Python 3.3, venv is included: https://docs.python.org/3/library/venv.html
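A minimal example with the built-in module (on 3.4+ this also bootstraps pip into the new environment; names are hypothetical):
python3 -m venv myenv
source myenv/bin/activate    # on Windows: myenv\Scripts\activate
pip install requests         # installs into myenv only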
> Compare this to how languages like Go, Rust, Ruby, and Julia handle packages and dependencies and Python is an absolute disaster.
Absolute disaster is a bit strong, but it's admittedly not as good as the other languages you mentioned. I think every Python developer who knows other languages will agree. That doesn't stop us from getting our job done though and the situation is improving.
I don't personally care about Python packaging (and virtualenv) anymore when I can use any relevant Python package with Nix, or easily make a Nix package myself. I just hope wheels don't make it harder to package stuff with Nix. With eggs I think there's something like an --old-and-unmanageable flag that makes setup.py still output a plain directory structure.
For deploying software to end-users (rather than developers), bundling all required Python code with appropriate wrapping to adjust PYTHONPATH works just fine.
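A minimal sketch of that kind of wrapper, assuming (hypothetically) that the bundled code ships in a lib/ directory next to the launcher:
#!/bin/sh
# launcher placed next to the bundled lib/ directory (all names hypothetical)
HERE="$(cd "$(dirname "$0")" && pwd)"
PYTHONPATH="$HERE/lib" exec python "$HERE/lib/myapp/main.py" "$@"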
I want to address several of the comments. I used to work for Continuum, but I no longer do. I didn't work on the conda team, but I was part of other teams at Continuum, and at my new gig I'm a heavy conda user.
1. Linux package management is sufficient - this isn't the case. Linux package management couples your library versions with your distro version. Meaning if I want to do something like try out my code with the latest pandas, or run 2 jobs with different versions of numpy, I'm out of luck.
2. conda went ahead and forked everything without regard for the community - this is completely untrue. We sat down with Guido for the very first PyData at Google's offices in 2012 to discuss many issues, packaging being one of them. Guido acknowledged that the Python packaging ecosystem isn't sufficient for the scientific community and that we should go out and solve the problem ourselves - so we did. Honestly, on this point you should just read Travis' words about the issue.
Scientific users need to be able to package non-Python libs along with Python libs (think zmq, or libhdf5, or R), so we need a package management solution that sits OUTSIDE of Python. You can think of conda as sitting somewhere in between virtualenv and Docker.
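For example, something along these lines gives you Python and non-Python libs in one isolated environment (package names depend on your channels):
conda create -n analysis python=2.7 numpy pandas hdf5 zeromq
source activate analysis    # newer conda versions use 'conda activate analysis'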
Wheels and Conda are almost completely orthogonal. Conda is a higher-level abstraction than pip; wheels operate at the lowest level of abstraction.
Multiple package management tools are relatively fine. Having multiple competing format standards (egg vs wheel vs lots of other options) would be a disaster.
(For reference, there's .deb versus .rpm, but also dpkg versus apt-get versus yum versus up2date versus...)
> Having been involved in the python world for so long, we are all aware of pip, easy_install, and virtualenv, but these tools did not meet all of our specific requirements. The main problem is that they are focused around Python, neglecting non-Python library dependencies, such as HDF5, MKL, LLVM, etc., which do not have a setup.py in their source code and also do not install files into Python’s site-packages directory.
I haven't used Conda, but I totally get their main point here. Python packages sometimes depend upon non-Python packages. Any Python-specific packaging solution that cannot express that e.g. a Python package depends upon a native library does not really solve this problem.
I don't regard packaging, dependency management, and isolation as programming-language-dependent problems.
Conda certainly sounds ambitious -- but I'm doubtful how useful it really is. As I see it, it mainly fills two needs: the lack of a package manager on OS X and the lack of a package manager on Windows.
Windows is getting a package manager now, and OS X arguably have the App Store.
Everywhere else (read: Linux/BSD) -- most of what's packaged by Conda is already packaged by the vendor.
In short: I don't think it's the best way forward (but it may very well be the best way for many usecases/users right now and in the near future).
A big part of Conda's usefulness is not system-wide package management a la apt or yum or whatever, but being able to have reproducible, isolated environments for scientific computing or other times when reproducibility is very important, and/or when you need multiple versions of packages for compatibility with other packages.
I believe it uses virtualenv for this, but I may be wrong. Regardless, having this feature (as well as no interference with the system Python) is one of the killer features for me, and others I'm sure.
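For instance, an environment can be snapshotted and reproduced roughly like this (environment and file names hypothetical):
conda list --export > spec.txt          # pin the exact package versions/builds
conda create -n clone --file spec.txt   # recreate that environment elsewhere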
This is a great point. Still, whole-system snapshots in the form of a VM disk image, Vagrant, etc. might be a better way to serve that use case. Perhaps some form of appliance leveraging something like Guix or CoreOS?
There's probably also some element of "it isn't the best general solution, but it's super convenient for some particular group of people, e.g. python users".
An additional benefit of Conda is that building or installing packages that require a C/Fortran compiler can be done without using sudo. This is really nice when you're working on a cluster or other shared computer resources, and don't want to have to bother the sysadmin.
+1 that conda solves "most of our packaging problems" in the science/data science world (I've lived within it for 1.5 years, it really simplifies things). I use conda on Linux, Mac and Windows for r&d and deployment for tools including numpy, sklearn, matplotlib along with flask, pymongo etc.
I'm sorry if things aren't working well for you. The Python community already did pick a standard, which is PyPI, pip and wheels. All this is covered in PEPs. Unfortunately, the creators of Conda present their own tool to compete with the standard with a complete fork of all of Python's packaging, right down to virtualenv, which really only works for the packages that they wanted to support. That isn't the fault of the broader Python community. There is nothing anyone else can do to prevent that.
There are at least two major classes of people using Python.
On the one hand, there are people who write, distribute and deploy Unix-y-type software -- system utilities, web applications, service daemons, etc. These users write for users who are like themselves, or who are sysadmins of Unix-y systems. Their concern is with automatable, repeatable, command-line-friendly packaging, distribution and deployment. They benefit from wheels because it saves them a potentially-time-consuming build step during installation, and will generally prefer wheels or tarballs as a distribution format.
On the other hand, there are people who use Python as a scientific computing and number-crunching stack. Many of them do not write and distribute their own applications; they distribute the results of their code. Many of them run on desktop-type machines using Windows. What they primarily want is a single .exe they can download and run which will provide Python plus all the numeric/scientific libraries in one install, plus (increasingly) IPython Notebook as a way to share their work.
There is no single standard which works well for both of these groups.
> There is no single standard which works well for both of these groups.
Sure there is. A single exe that bootstraps Python and leverages wheels to install what the scientific user needs.
In fact, with so many new tools coming to Windows lately (as in the past decade) -- that exe could probably be a PowerShell script. And an equivalent shell script could probably do the same for more unix-y platforms.
Or one might make a script that builds an installer that bundles up things downloaded via the wheels framework.
The distributions that I've used, unpack themselves to look just like a mainstream installation, including pip. So, if you have to install a missing package, you still do so in the same way. The distributions just give you a head start.
Specifically, there aren't conda packages for every Python package. And when there are, they often lag behind their pip counterparts, so sometimes you need to fall back to pip. However, it is impossible to install some packages into conda environments via pip. So basically the whole solution falls apart because the Python libs you install via pip can't find the C libraries installed via conda.
The packaged distributions are definitely a blessing, when introducing beginners to Python. Even when package installation goes swimmingly, it still takes effort. And it's handy to keep just one big installer on a flash drive for putting Python on non-networked computers. Since I use Python for lab work, I often have it running on several computers at once.
Why not both (at least on OSX and Windows)? Build packages where the users are. Even in the scientific space, many of your users build packages using pip.
This means the Python community can work towards standard tools and interfaces for working with Python packages rather than trying to support many different ways of installing / distributing packages.
My experience with wheels has been that it continues to be an absolute nightmare to deal with Linux distros and we end up building from the tarballed source as usual.
Then again, we tend to struggle a bit with Python package management in general (who doesn't, I guess).
> Avoids arbitrary code execution for installation. (Avoids setup.py)
This isn't much of an advantage as you're trusting code compiled on an untrusted machine. I suppose it saves you some CPU cycles, but don't be fooled into thinking it's safer.
There's a somewhat compelling use case for `pip install foo` not trusting the packages, but the actual Python code that runs "import foo" trusting it. In particular, if I'm a sysadmin of a shared machine and a user asks me to install something systemwide, I don't want arbitrary code running as root, but if the installed code is malicious, that's up to the unprivileged user who chooses to use it to keep their own data safe.
It's not terribly compelling given stuff like virtualenv, though, since users can just `pip install` stuff on their own and the experience is strictly nicer. And it's also not compelling if there isn't an explicit promise that this is treated as a security boundary.
> In particular, if I'm a sysadmin of a shared machine and a user asks me to install something systemwide, I don't want arbitrary code running as root, but if the installed code is malicious, that's up to the unprivileged user who chooses to use it to keep their own data safe.
Ah, that makes a lot of sense, especially for applications (eg meld) and command line utilities (eg httpie) which are nice to install systemwide as root, but only ever run in an unprivileged user's context.
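For that use case, recent pip versions can also be told to refuse anything that isn't a prebuilt wheel, which sidesteps setup.py entirely (a sketch, reusing httpie from the example above):
sudo pip install --only-binary :all: httpie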
It is also a problem for packages that do use setup.py. My apsw extension has several options, including fetching some other prerequisites and choosing which extensions to include, but you can't give pip options to pass along.
Assuming you trust the package source already, I think the advantage lies more in not having the setup process accidentally erase your home directory for whatever reason.
Wheels solved one major issue that few people talk about: packages that needed compiling on 64-bit Windows (which was a mess, bordering the impossible).
I don't get all the hate towards Python packaging. For any package that isn't ancient (or extremely complex like NumPy or SciPy) it just works to "pip install". And nowadays pip will even cache builds locally as a wheel so installing something a second time takes next to no time.
Also, learn to use:
pip install --user <package_name>
Disclaimer: I use Python every day, mostly on OS X and RHEL.
I have often seen it touted that OS X is "real Unix." To me, that would imply the easy availability of working compilers and compile environments. Is that not true?
There's easy availability, but it requires installing the Apple Developer Tools, which can either be done at install time (but not by default) or by downloading them from Apple. So even though Python is installed by default, a C compiler might not be, and a beginner trying to use Python for the first time might be daunted by having to download several gigabytes of developer tooling to install one Python package.
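For what it's worth, on recent OS X versions the command-line tools (including the C compiler) can be pulled in on their own, without the multi-gigabyte Xcode download:
xcode-select --install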
Re: Windows, AppVeyor provides a CI service for Windows. It has been used to test and build Python packages and produce wheel archives for Windows; e.g. sklearn uses this. I have used it myself to set up a CI build for a simple Python utility and found it straightforward to get working.
The biggest problem I've had with Python packages is that you can't use multiple versions of the same package in a Python project without a lot of complex tricks. This becomes more and more of a problem in large code bases over time. One dependency might need requests 1.x and another 2.x in incompatible ways.
Unfortunately wheels don't solve this problem, as it seems inherent to how Python imports modules.
I've seen people get around this by vendoring their dependencies, and I've done some tricks in the past where you provide a vendor proxy package that you can point at a different site-packages directory, but this is brittle if your dependency has additional dependencies or uses absolute imports. Now you're maintaining a fork.
I'd love to hear if anyone has had more success in this realm. I love Python, but using Node and Rust, with their isolated dependency trees, has made me sad about the state of Python packages.
To balance this, you might also solicit feedback about the problems that are caused by Node's use of multiple versions of the same package in a project.
Anyone else have issues with pillow/PIL and wheels? I have to force that package to not install from cached wheels, because the first build is fine, but subsequent installs from the cached wheel seem to break.
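Not a fix for the underlying breakage, but a workaround along these lines should bypass the cached wheel (both flags exist in recent pip versions):
pip install --no-cache-dir --force-reinstall pillow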
http://lucumr.pocoo.org/2014/1/27/python-on-wheels/