Hacker News

Why do you highly recommend this? You can configure git (or another VCS) to ignore .pyc files so they're never committed. They're not large files; I don't see the downside.



Difficulty with git basics is the least of your worries. For example, pycs are fundamentally racy: it is quite possible to end up with a .py whose content is newer than its .pyc, depending on how unlucky you were with an in-progress deployment, or on some tool that never updated the .py timestamp. Python continues to execute the pyc even though the code changed, because the minimal benefit of the pyc would be rendered significantly inert should Python use any kind of strong check (instead of just comparing second-granularity stat() output) to ensure the cached bytecode matches the source. In this way, without your permission, Python silently plays code execution roulette with your computer every time it starts, for as long as you have the feature enabled.
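For the curious, the timestamp check being criticized is easy to reproduce by hand. This is a sketch (the function name is mine) of roughly what CPython does, against the 3.7+ pyc header layout:

```python
import importlib.util
import os
import struct

def pyc_is_stale(py_path):
    """Sketch of CPython's timestamp check: compare the 32-bit mtime and
    size recorded in the pyc header (3.7+ layout: magic, flags, mtime,
    size) against a fresh stat() of the source, at second granularity."""
    pyc_path = importlib.util.cache_from_source(py_path)
    with open(pyc_path, "rb") as f:
        magic, flags, mtime, size = struct.unpack("<4sIII", f.read(16))
    if flags & 0b1:
        return None  # hash-based pyc (PEP 552): validated differently
    st = os.stat(py_path)
    return (int(st.st_mtime) & 0xFFFFFFFF != mtime
            or st.st_size & 0xFFFFFFFF != size)
```

An edit that lands in the same second as the recorded mtime, or a tool that preserves timestamps, passes this check, and the stale bytecode runs.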

I have lost count of the number of times I've seen someone lose an hour to it, and I can recall many instances of QA environments becoming inexplicably bricked by it. The correct fix requires opening the .py and hashing its content, at least doubling the amount of IO required to start a program. pycs were a great feature when parsing small files was noticeably slow, but that hasn't been true for almost 20 years.

It's therefore worth turning the question around: why do you think pyc files are useful?


Python 3.7 added support for hash-based cache files as an alternative to time-based.

https://docs.python.org/3/reference/import.html#pyc-invalida...

Verifying a hash is a bit slower than checking the timestamp but far faster than parsing and byte compiling the source file, so I don't think this option is "significantly inert".
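For reference, you can opt into the PEP 552 behaviour per file via py_compile. A minimal sketch (the throwaway module is invented for illustration) that compiles in CHECKED_HASH mode and inspects the flags field in the pyc header:

```python
import os
import py_compile
import tempfile

# Compile a throwaway module with a CHECKED_HASH pyc (PEP 552): the header
# then embeds a hash of the source rather than its mtime, so import
# validity depends on content, not timestamps.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "mod.py")
with open(src, "w") as f:
    f.write("ANSWER = 42\n")

pyc_path = py_compile.compile(
    src,
    invalidation_mode=py_compile.PycInvalidationMode.CHECKED_HASH,
)

# Header layout (3.7+): magic(4) | flags(4) | hash-or-mtime(4) | size(4).
with open(pyc_path, "rb") as f:
    f.read(4)                                   # skip the magic number
    flags = int.from_bytes(f.read(4), "little")
print(flags & 0b11)  # → 3 (bit 0: hash-based; bit 1: checked at import)
```

UNCHECKED_HASH sets only bit 0: the hash is recorded for reproducibility but never verified at import.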


Managed to miss this, thanks. I'd be interested to hunt out the BPO ticket at some stage to see if they benchmarked on NFS or spinning rust.


Huh. I hadn't bothered to read the PEP, which is https://www.python.org/dev/peps/pep-0552/ on "Deterministic pycs."

> The current Python pyc format is the marshaled code object of the module prefixed by a magic number [7], the source timestamp, and the source file size. The presence of a source timestamp means that a pyc is not a deterministic function of the input file’s contents—it also depends on volatile metadata, the mtime of the source. Thus, pycs are a barrier to proper reproducibility.

That is, they were made for a quite different use case than you or I were talking about.

I looked at the PEP to see if it gave timing numbers. No luck - would be a good blog post if I were still blogging. It does say:

> The hash-based pyc format can impose the cost of reading and hashing every source file, which is more expensive than simply checking timestamps. Thus, for now, we expect it to be used mainly by distributors and power use cases.


In the last 12 years of writing Python, I have only had issues with .pyc files a handful of times, and always with Python < 2.7. Anecdotally, this experience is shared by everyone I have worked with.

If you’re seeing this regularly, it suggests there may be something unique or uncommon in your set-up. You may wish to isolate and change whatever that is.


Now that you mention it, I just realized I never have problems related to .pyc files anymore, ever since I switched to Python 3 a few years ago. I remember I used to have problems when deleting database migration files, because Python would load the .pyc files of deleted migration scripts unless I also deleted the .pyc files (which I often forgot to do).


The Django development server's asynchronous auto reloader is neither unique nor uncommon.


When doing dev, there can be other problems with pyc files. One of them is deleting the original .py file but still being able to import the module, because Python will import any .pyc file if it exists.
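That failure mode can be sketched like this (the module name "ghost" is hypothetical; note that in Python 3 the hazard needs the pyc to sit directly on sys.path, since a pyc left in __pycache__ is ignored once the source is gone):

```python
import os
import py_compile
import shutil
import subprocess
import sys
import tempfile

# Create a hypothetical module "ghost", compile it, then delete the source
# while leaving a bare .pyc in the import path.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "ghost.py")
with open(src, "w") as f:
    f.write("VALUE = 42\n")

cached = py_compile.compile(src)                      # lands in __pycache__/
shutil.move(cached, os.path.join(workdir, "ghost.pyc"))
os.remove(src)                                        # the .py is gone...

out = subprocess.run(
    [sys.executable, "-c", "import ghost; print(ghost.VALUE)"],
    cwd=workdir, capture_output=True, text=True,
)
print(out.stdout.strip())  # → 42: the deleted module still imports
```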

So if performance allows, disabling them when you dev is not a bad idea, as long as you keep them enabled when you run CI.

IMO, when you dev, you should set PYTHONHASHSEED (makes hash randomization predictable), PYTHONDEVMODE (verbose warnings + extra tooling + sys.flags.dev_mode=True) and PYTHONDONTWRITEBYTECODE (removes the need to clean them up).

In CI and prod, you should make sure those are NOT set, and use PYTHONOPTIMIZE=2 (removes asserts, __debug__ blocks and docstrings) if you trust your dependencies to be well written (which I would check by running tests in a pre-push hook).
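A quick way to see what PYTHONOPTIMIZE=2 strips, using its command-line equivalent -OO on an illustrative snippet:

```python
import subprocess
import sys

# Illustrative snippet: a function with a docstring and a failing assert.
snippet = (
    "def f():\n"
    "    'a docstring'\n"
    "    assert False, 'boom'\n"
    "f()\n"
    "print(f.__doc__)\n"
)

# Default mode: the assert fires and the program dies.
plain = subprocess.run([sys.executable, "-c", snippet],
                       capture_output=True, text=True)

# -OO (same as PYTHONOPTIMIZE=2): asserts stripped, docstrings dropped.
optimized = subprocess.run([sys.executable, "-OO", "-c", snippet],
                           capture_output=True, text=True)

print(plain.returncode != 0)     # → True: AssertionError in default mode
print(optimized.stdout.strip())  # → None: the docstring is gone too
```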





