Lessons in Managing Haskell Memory

kccqzy · on April 9, 2020

> But by pretending to need the region inside using touch#, after converting the address back, we ensure that the compact region remains live from the GCs point of view.

Wow, but this seems incredibly fragile to me. touch# is a primop that is very magical and, in the face of simplification, prone to being eliminated by the compiler. I remember some bug in which the compiler eliminated the touch# after deducing that the code is dead. Chaos ensues.

If it were me, I'd just keep a random object inside the compact region, and then refer to that object in the regular way (i.e. not via the custom hash table of raw addresses) in order to keep the compact region alive.

In fact, it seems that the main purpose of that complicated ST-like RankNTypes dance is to ensure liveness of the compact region, when objects in the compact region are being referred to using raw addresses. Then why not just keep some regular Haskell value inside the compact region and use that to keep track of regions instead of phantom types? That is, every `Entry` now contains some token that is stored inside the compact region, as well as the address of the data it actually points to. Then, whenever any Entry is reachable in GC, the entire compact region is reachable in GC.

Am I missing something?

Kutta · on April 9, 2020

I agree that storing a reference besides an Entry is probably safer than #touch. It's also similar to arrays slices, where we store reference to the array itself + offsets.

The ST-like region typing can be still useful though, because we get a static bound on the lifetime of a compact region. We'd prefer to avoid leaking space via large compact regions which survive beyond their intended life.

radekchannable · on April 9, 2020

That would remove the need for touch#, but the point of Entry is that it does not contain references to values that the GC has to follow. If we would add a reference to the region, then the GC will have to look into the Entry, which is what we want to avoid. :)

Kutta · on April 9, 2020

I thought the cost of GC following a pointer into a compact region is basically constant time, or is that incorrect?

kccqzy · on April 9, 2020

I don't have all the context you have in your app, but you are saying that during a GC the O(n) action of looking at each live Entry and finding it has a pointer into the compact region is too slow? I find that incredible.

rwmj · on April 9, 2020

I wonder if Haskell has a way to (optionally of course) put objects into manually allocated memory regions? I implemented this in 2006 for OCaml[1]. We used it for (what we would nowadays call) "big data" processing where we had a huge amount of static data for analysis and we didn't want the GC to be constantly traversing it. We could move this data into the manually allocated region we called the "ancient heap". It still appeared as regular OCaml objects, but when the GC saw a pointer that crossed into the ancient heap it would stop following it.

Of course the obvious down side is you have to remember to manually deallocate it :-)

[1] https://opam.ocaml.org/packages/ancient/

arianvanp · on April 9, 2020

This is exactly what the article is describing no? This sounds like the "compact regions" feature the entire article is about.

rwmj · on April 9, 2020

Oh I see - on first reading I thought that compact regions were separately GC'd regions in the old generation. Skimming the paper (http://ezyang.com/papers/ezyang15-cnf.pdf) it does seem as if they independently rediscovered ocaml-ancient.

logicchains · on April 9, 2020

Is it possible to allocate an object directly into a compact region, rather than allocating outside and then copying it in?

rwmj · on April 9, 2020

If you explore ways it might work, I think you'll see that's not easy. For example, let's say you have an "alloc_compact" function. To put a tuple "directly" into the compact heap you might write:

    alloc_compact (1,2)

But wait! You've actually allocated (1,2) and then copied it.

kccqzy · on April 9, 2020

No. Putting an object in a compact region is more than just copying. It's also evaluating the data and making it immutable. Remember Haskell values are lazy, and therefore by default mutable. (Laziness is mutability, and evaluation is replacing the thunk by the normal form of data.)

AnimalMuppet · on April 9, 2020

You're almost trying to do C++ memory management, but do it in Haskell. This seems almost like masochism.

kccqzy · on April 9, 2020

I think that, for many applications that need to deal with a large heap written in a GC'd language, sooner or later people realize the GC just isn't a good fit and reinvent manual memory management. This is the same regardless of whether you choose Haskell or Java or Go. Even in Java world you can find plenty of evidence that people find GC + large heap unsatisfactory and resort to alternatives like memory-mapped files or JNI or just moving the data out of the process. Same thing with Go. I wouldn't call it masochism.

AnimalMuppet · on April 9, 2020

"Any sufficiently complicated program in a garbage-collected language contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of malloc and free"?

zozbot234 · on April 9, 2020

True C++/Rusty memory management would require being able to add+remove GC roots from within GC-compact regions as well. Then any object within a GC-compact region could be made to point safely to any object outside of it by simply promoting that object to a GC root, and undoing that promotion as the reference is dropped. In essence this would completely decouple the two styles of memory management while still allowing them to interact appropriately.