Balm in GILead: Fast string construction for CPython extensions (vito.nyc)
50 points by nickelpro on Dec 17, 2023 | 11 comments



An explanation of the title, for those not familiar with it: Gilead was a region situated in modern-day Jordan, and was apparently renowned for its medicinal balm, mentioned a few times in the Bible. Further reading: https://en.wikipedia.org/wiki/Balm_of_Gilead.


Most familiar to me from Edgar Allan Poe's The Raven, where the deranged narrator asks this of the bird:

        “Prophet!” said I, “thing of evil!—prophet still, if bird or devil!—
    Whether Tempter sent, or whether tempest tossed thee here ashore,
        Desolate yet all undaunted, on this desert land enchanted—
        On this home by Horror haunted—tell me truly, I implore—
    Is there—*is* there balm in Gilead?—tell me—tell me, I implore!”
            Quoth the Raven “Nevermore.”


And most familiar to me as the capital of the Barony of New Canaan in Stephen King's The Dark Tower. Albeit no balm in this story (that I recall). https://darktower.fandom.com/wiki/Gilead


Also Handmaid's Tale.


This looks exceptionally interesting and potentially useful to some of our projects. Anyone else using similar methods? Would love to know if there are other caveats not covered by the article before I start refactoring 10K LOC :)


When we first started playing with this we produced memory and reference leaks left and right, mostly through undisciplined access to reference counts or failing to reference count at all. The basic problems the GIL solves don't go away; you still need a principled approach to managing reference counts for both the Balm'd objects and the underlying data.

A good rule of thumb we came up with is "reference counts are only allowed to be manipulated when holding the GIL." So if you create 500 string views that all reference some underlying data, the reference count for that underlying data should only be incremented by 500 once you acquire the GIL again to introduce the data to Python.
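
A rough sketch of that rule in C (the view struct and function here are illustrative, not the article's actual API; Py_SET_REFCNT/Py_REFCNT are the real CPython macros, and a loop of Py_INCREF would be the more conventional equivalent):

    #include <Python.h>

    typedef struct {              /* illustrative string-view record */
        const char *ptr;
        Py_ssize_t  len;
    } view_t;

    /* Called with the GIL held. The views are built with the GIL released
       and no refcount traffic; the debt owed to `backing` is settled in
       one shot after the GIL is re-acquired. */
    void
    build_views(PyObject *backing, const char *data, Py_ssize_t datalen,
                view_t *out, Py_ssize_t n_views)
    {
        Py_ssize_t owed = 0;

        Py_BEGIN_ALLOW_THREADS
        for (Py_ssize_t i = 0; i < n_views; i++) {
            out[i].ptr = data;    /* every view borrows backing's data */
            out[i].len = datalen;
            owed++;
        }
        Py_END_ALLOW_THREADS

        /* GIL held again: apply the whole increment at once. */
        Py_SET_REFCNT(backing, Py_REFCNT(backing) + owed);
    }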

If you try to do granular per-refcount locking, you'll run into the problem shown in the multi-threading benchmark: heavy lock contention on certain workloads, and even in the best case a little overhead for zero gain. Making the reference counts atomic was a catastrophic performance hit.

Related: it's common to choose to free the underlying resources of the balm'd object even though the Python interpreter might have stashed a reference to it somewhere. For example, in a webserver, freeing or re-using the associated header buffers after the request has been served.

We call these "degenerate" applications, because it is not reasonable to hold onto such a reference. Sometimes the best thing is to document that you don't support that use-case, in the name of performance. Sometimes you might want to be nice and throw a descriptive Python exception.
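
For the exception route, the guard is just a flag check in the accessor; something like this (the type and field names are made up for illustration):

    #include <Python.h>

    typedef struct {
        PyObject_HEAD
        const char *ptr;
        Py_ssize_t  len;
        int         backing_valid;  /* cleared when the request buffers are recycled */
    } HeaderView;

    /* Accessor that refuses to read freed memory and says why. */
    static PyObject *
    headerview_tobytes(HeaderView *self, PyObject *Py_UNUSED(ignored))
    {
        if (!self->backing_valid) {
            PyErr_SetString(PyExc_RuntimeError,
                            "header buffer was released after the request was "
                            "served; copy the value before the handler returns");
            return NULL;
        }
        return PyBytes_FromStringAndSize(self->ptr, self->len);
    }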


I think this technique would not work with the limited API: https://docs.python.org/3/c-api/stable.html#limited-c-api

The Limited API has a nice benefit of letting you build artifacts that are compatible with a wide range of Python versions. The technique described in this article tightly couples the extension code to the interpreter internals.


It doesn't work with any version of the public API, Limited, Stable, or Unstable, because this is not a part of the API. It's more of an application of Hyrum's Law[1].

That said, assuming the structures themselves exist on the versions of Python you're targeting in a format compatible with whatever hacking you're doing on them, it's very easy to compile for lots of Python versions using cibuildwheel[2] and the rest of the PyPA ecosystem.

I don't think the Limited API is very useful; as a practical matter, for the common distribution methods you need the wheel to be built against the target Python version.

[1]: https://www.hyrumslaw.com/

[2]: https://github.com/pypa/cibuildwheel


> I don't think the Limited API is very useful; as a practical matter, for the common distribution methods you need the wheel to be built against the target Python version.

The limited API provides ABI stability, which allows building a single artifact that will work across Python versions.

We use this for the Protobuf PyPI package. A single artifact like protobuf-4.25.1-cp37-abi3-manylinux2014_x86_64.whl can be used with any Python version >= 3.7.
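
On the extension side the whole trick is one define; a stub like this (module name made up) compiles into an abi3 artifact that any CPython >= 3.7 can load:

    #define Py_LIMITED_API 0x03070000  /* stable ABI, Python >= 3.7 */
    #include <Python.h>

    static struct PyModuleDef demo_module = {
        PyModuleDef_HEAD_INIT, "demo", NULL, -1, NULL
    };

    PyMODINIT_FUNC
    PyInit_demo(void)
    {
        return PyModule_Create(&demo_module);
    }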


Ya, but so what? It's zero engineering effort to build the wheels for all the compatible versions. You don't win anything with the limited API.


You suggested it wasn't possible (or practical), so it seemed worth clearing up that it is indeed possible.

It is nice to cut the number of distributed artifacts by a factor of four. Also, depending on third-party repositories full of build infrastructure increases your risk of supply chain attacks.



