Packed Data Support in Haskell

nine_k · 2025-04-28T22:47:47 1745880467

> Introducing the ‘packed’ data format, a binary format that allows using data as it is, without the need for a deserialisation step. A notable perk of this format is that traversals on packed trees are proven to be faster than on ‘unpacked’ trees: as the fields of data structures are inlined, there are no pointer jumps, thus making the most of the L1 cache.

That is, a "memory dump -> zero-copy memory read" of a subgraph of Haskell objects, which allows such trees/subgraphs to be passed directly over a network. Slightly reminiscent of Cap'n Proto.
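To make the quoted idea concrete, here is a minimal Haskell sketch. It is not the library's API: `Tree`, `packList`, `pack`, `sumPacked`, and the tag values are hypothetical names chosen for illustration, and a `UArray Int Int` stands in for a real byte buffer. The tree is laid out in preorder with each node's children placed inline after its tag, and the traversal sums the leaves by walking that buffer with an index instead of following pointers.

```haskell
import Data.Array.Unboxed (UArray, listArray, (!))

-- An ordinary pointer-based binary tree.
data Tree = Leaf Int | Node Tree Tree

-- Hypothetical tag values for this sketch only.
leafTag, nodeTag :: Int
leafTag = 0
nodeTag = 1

-- Preorder layout: a Leaf is its tag followed by its payload;
-- a Node is its tag followed by both children, inline.
packList :: Tree -> [Int]
packList t = go t []
  where
    go (Leaf n)   rest = leafTag : n : rest
    go (Node l r) rest = nodeTag : go l (go r rest)

-- Copy the layout into a flat, unboxed buffer.
pack :: Tree -> UArray Int Int
pack t = listArray (0, length xs - 1) xs
  where
    xs = packList t

-- Sum the leaves by reading the buffer directly; no Tree is rebuilt.
-- 'go i' returns the subtree's sum and the index just past that subtree.
sumPacked :: UArray Int Int -> Int
sumPacked buf = fst (go 0)
  where
    go i
      | buf ! i == leafTag = (buf ! (i + 1), i + 2)
      | otherwise =
          let (sl, j) = go (i + 1)
              (sr, k) = go j
          in (sl + sr, k)

-- Pointer-chasing version, for comparison.
sumTree :: Tree -> Int
sumTree (Leaf n)   = n
sumTree (Node l r) = sumTree l + sumTree r
```

For `Node (Leaf 1) (Node (Leaf 2) (Leaf 3))` the buffer is `[1,0,1,1,0,2,0,3]` and `sumPacked` returns 6. A real implementation would also have to deal with variable-width fields, alignment, and a byte-level encoding, all of which this sketch ignores.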

Zolomon · 2025-04-29T03:45:19 1745898319

They mention this in the article.

spockz · 2025-04-29T04:34:06 1745901246

It reminds me more of FlatBuffers, though. Does protobuf also have zero allocation (beyond initial ingestion) and no pointer jumps?

cstrahan · 2025-04-30T17:22:17 1746033737

No; variable-sized integers are one example of why.

See https://protobuf.dev/programming-guides/encoding/
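The varint point can be illustrated with a small Haskell sketch of protobuf's base-128 encoding as described in that guide: 7 payload bits per byte, least-significant group first, with the high bit set on every byte except the last. The function names are hypothetical. Because each value's width depends on its magnitude, a reader cannot compute the offset of a later field without scanning the bytes before it, which is the opposite of a fixed-layout, zero-copy format.

```haskell
import Data.Bits (shiftL, shiftR, testBit, (.&.), (.|.))
import Data.Word (Word64, Word8)

-- Base-128 varint: emit the low 7 bits first, setting the high
-- (continuation) bit on every byte except the final one.
encodeVarint :: Word64 -> [Word8]
encodeVarint n
  | n < 0x80  = [fromIntegral n]
  | otherwise = (fromIntegral (n .&. 0x7f) .|. 0x80) : encodeVarint (n `shiftR` 7)

-- Decoding has to inspect byte after byte to find where the value ends,
-- so the position of whatever follows is only known after this step.
-- Returns the value and the remaining input, or Nothing on truncated input.
decodeVarint :: [Word8] -> Maybe (Word64, [Word8])
decodeVarint = go 0 0
  where
    go _ _ [] = Nothing
    go shift acc (b : bs)
      | testBit b 7 = go (shift + 7) acc' bs
      | otherwise   = Just (acc', bs)
      where
        acc' = acc .|. (fromIntegral (b .&. 0x7f) `shiftL` shift)
```

`encodeVarint 150` gives `[0x96, 0x01]`, matching the example in the protobuf encoding guide, while `encodeVarint 1` is a single byte.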

carterschonwald · 2025-04-29T13:15:37 1745932537

One thing that sometimes gets tricky with these things is handling subterm sharing. I wonder how they implemented it.

90s_dev · 2025-04-29T02:19:31 1745893171

We are always reinventing wheels. If we didn't, they'd all still be made of wood.

tlb · 2025-04-29T07:42:58 1745912578

> the serialised version of the data is usually bigger than its in-memory representation

I don’t think this is common. Perhaps for arrays of floats serialized as JSON or something. But I can’t think of a case where binary serialization is bigger. Data types like maps are necessarily larger in memory to support fast lookup and mutability.
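A quick way to check the floats-as-JSON case: compare the length of a `Double`'s shortest round-tripping decimal text, which Haskell's `show` produces and which is roughly what a JSON encoder would emit, against its 8-byte binary64 representation. The names `textSize`, `binarySize`, and `samples` are just for this sketch, and separators and brackets are ignored.

```haskell
import Foreign.Storable (sizeOf)

-- Bytes of decimal text for a Double, as rendered by 'show'
-- (approximately what a JSON number would take, ignoring separators).
textSize :: Double -> Int
textSize = length . show

-- Bytes of the raw IEEE 754 binary64 value.
binarySize :: Double -> Int
binarySize = sizeOf

-- A few values whose decimal forms need many digits.
samples :: [Double]
samples = [0.1 + 0.2, pi, 1 / 3]
```

`0.1 + 0.2` shows as `0.30000000000000004`, 19 bytes of text against 8 bytes of binary.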

IsTom · 2025-04-29T11:48:11 1745927291

If you use a lot of sharing in immutable data, it can grow a lot when serialized. A simple pathological example is a tree in which every node's left subtree is the same object as its right subtree. It takes O(height) space in memory, but O(2^height) when serialized.
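The pathological case can be sketched in Haskell. `shared` builds a tree whose left and right children are the same heap object (the `let`-bound subtree is reused), so a tree of height h needs only h + 1 distinct nodes in memory. The hypothetical `serialise` writes one tag per node with no back-references, as a naive serialiser would, and therefore emits 2^(h+1) - 1 tags.

```haskell
data Tree = Leaf | Node Tree Tree

-- Height-h tree where both children of each Node are the same object:
-- 'shared (h - 1)' is evaluated once per level and referenced twice.
shared :: Int -> Tree
shared 0 = Leaf
shared h = let t = shared (h - 1) in Node t t

-- Naive preorder serialiser: one tag per node, no back-references,
-- so every path through the shared structure is written out separately.
serialise :: Tree -> [Bool]
serialise t = go t []
  where
    go Leaf       rest = False : rest
    go (Node l r) rest = True : go l (go r rest)
```

`length (serialise (shared 10))` is 2047, from 11 distinct nodes in memory. Keeping the output small would require the serialiser to detect repeated subterms and emit references to them instead.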

nine_k · 2025-04-29T08:06:58 1745914018

I suppose all self-describing formats, like protobuf, Thrift, or, well, JSON, are bigger than the efficient machine representation, because they carry the schema in every message, one way or another.
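In protobuf's case, the "one way or another" is a key in front of every field: the field number shifted left by three bits, OR-ed with a wire type, per the protobuf encoding guide linked earlier in the thread. Field names and declared types are not on the wire, only these numbers. A small Haskell sketch, with `WireType`, `wireTypeId`, and `fieldKey` as names chosen for illustration (the deprecated group wire types are omitted):

```haskell
import Data.Bits (shiftL, (.|.))

-- Non-deprecated protobuf wire types.
data WireType = Varint | I64 | Len | I32

-- Numeric wire-type ids as used in the field key.
wireTypeId :: WireType -> Int
wireTypeId Varint = 0
wireTypeId I64    = 1
wireTypeId Len    = 2
wireTypeId I32    = 5

-- Key written before every field: (field_number << 3) | wire_type.
-- On the wire the key is itself varint-encoded.
fieldKey :: Int -> WireType -> Int
fieldKey field wt = (field `shiftL` 3) .|. wireTypeId wt
```

`fieldKey 1 Varint` is 8 (0x08), and for field numbers up to 15 the key stays below 128, so its varint encoding fits in a single byte.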

lordleft · 2025-04-29T12:43:01 1745930581

This was very well written. Excellent article!

NetOpWibby · 2025-04-29T14:42:09 1745937729

Is this like MessagePack for Haskell?

gitroom · 2025-04-29T09:13:45 1745918025

honestly i wish more stuff worked this way - fewer hops in memory always makes me happy