A Theory of Composing Protocols (2023)

haskellandchill · on April 6, 2024

I’m interested in how this compares to the coalgebra approach like in “Session Coalgebras: A Coalgebraic View on Session Types and Communication Protocols” (https://arxiv.org/pdf/2011.05712.pdf).

rdtsc · on April 6, 2024

> We demonstrate our approach in the practical setting of Erlang, with a tool implementing protocol composition that both generates Erlang code from a protocol and generates a protocol from Erlang code.

That's great. Erlang has gen_statem [1] as a built-in behavior, so it seems appropriate to use it. Also, Joe Armstrong had been talking about protocols for the longest time. He would say something like "Tell me what's on the wire! Don't give me some C++ API".

[1] https://www.erlang.org/doc/design_principles/statem

convolvatron · on April 6, 2024

I struggle to put it into words, but I think this modern notion that we exchange language structs between processes, and not arbitrary messages, has lost something valuable.

edit - I guess true interoperability? agency?

naasking · on April 6, 2024

What would it even mean to accept an arbitrary message? All you can do in that case is record and/or forward because the contents are unknown.

convolvatron · on April 6, 2024

sorry, I mean a message defined as a format, like a TCP header that's written down and agreed upon, not a message defined in some other software system like protobuf, with some mapping to and from language structures.
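To make the distinction concrete, here is a minimal sketch of a message "defined as a format": a hypothetical 8-byte header (version, message type, payload length, sequence number) in network byte order, packed by hand with Python's standard `struct` module. The field layout is invented for illustration; it is not the TCP header or any real protocol.

```python
import struct

# Hypothetical wire format, written down field by field ("!" = network byte order, no padding):
#   version  : u8
#   msg_type : u8
#   length   : u16  (payload length in bytes, so payloads are limited to 65535 bytes)
#   sequence : u32
HEADER = struct.Struct("!BBHI")

def encode(version: int, msg_type: int, sequence: int, payload: bytes) -> bytes:
    """Prefix the payload with the fixed 8-byte header."""
    return HEADER.pack(version, msg_type, len(payload), sequence) + payload

def decode(frame: bytes) -> tuple[int, int, int, bytes]:
    """Split a frame back into header fields and payload, checking the length field."""
    version, msg_type, length, sequence = HEADER.unpack_from(frame, 0)
    payload = frame[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated frame")
    return version, msg_type, sequence, payload

frame = encode(1, 7, 42, b"hello")
print(frame.hex())    # 010700050000002a68656c6c6f
print(decode(frame))  # (1, 7, 42, b'hello')
```

Every byte on the wire is accounted for in the comment above `HEADER`, which is the property being contrasted with generating the layout from a schema.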

discreteevent · on April 6, 2024

Protobufs are just structured messages. They don't have anything to do with language structures per se. They are semantically the same as the TCP header that's written down and agreed upon.

convolvatron · on April 6, 2024

yes. but as a matter of culture, one doesn't write binary formats any longer. that's just... silly? and it's not as if I don't think people should use abstractions to format messages.

and as a matter of definition, yes, protobufs are just bit strings and a schema definition in a limited type language (just like XDR and ASN.1 before them).

but again, as a matter of usage, we're only really supposed to use them to map to and from structs.

and it's not as if this doesn't work; clearly it does. as someone who used to do protocol implementation before all this, I find it unsettling. and I don't know if I'm just being prejudiced, or if we are somehow losing something by adopting this model as essentially mandatory.

I do find it frustrating that when I do want certain binary properties, or want to encode a cyclic structure in a particular way, or use a richer schema, I have to fight upstream to not use protobufs, even if they bring nothing except a useless wrapper around a byte string I am defining anyway.

aleksiy123 · on April 6, 2024

I'm not sure I understand this.

Isn't protobuf just one such protocol definition, with a focus on being general?

If you don't like the trade-offs it makes, then you can use your own specific protocol with your own trade-offs.

But yes, people will push back on you if you can't justify why that decision is better than the general solution, especially if they have to use it.

IMO, starting with an out-of-the-box solution as a baseline and then moving to a more specialized solution when the need arises is a pretty solid decision.

convolvatron · on April 7, 2024

I think you answered my question. I’d like to say ‘what’s the most effective way to serialize this’ and everyone else wants to say ‘use protobuf unless you can come up with a really good reason not to’

aleksiy123 · on April 7, 2024

Yes, makes sense. I think it's also about the context: what your goal is and what your engineering skill set is.

If you are a low-level C shop, that may make sense. But if you are trying to build an RPC service for use between multiple teams and languages, using protobufs or something similar is going to give you more mileage.

sly010 · on April 7, 2024

I feel you. Protobuf is the organizational equivalent of adding more lanes to congested roads or mandating more parking spaces. It leads to more cars.

convolvatron · on April 7, 2024

reading this didn't make sense to me... but I think you mean 'we have so much machinery here and it's hard to get a handle on it. I know! let's fix it by adding more machinery'

xg15 · on April 6, 2024

This sounds pretty cool. I think there is some kind of gap in formal modeling for languages that are interleaved with each other. Like, it's more or less straightforward to describe, say, JavaScript and HTML(5) as ASTs, but things get hairy when you embed one language in the other, e.g. HTML attributes that contain JavaScript, or JavaScript strings that contain HTML. Maybe this could help?

w10-1 · on April 7, 2024

For some historical antecedents...

Kent Beck and Erich Gamma, discussing JUnit in the early 2000s, proposed "pattern density" as a measure of good object-oriented design.

But pattern density was criticized as embrittling: because the same component participated in multiple patterns, it became impossible to change.

For perhaps these reasons, they removed the Composite and Command patterns from JUnit 4.

hyperthesis · on April 7, 2024

What exactly is a "protocol"?

I skimmed the pdf for a definition, but it seems to only give examples, then jump directly to defining compositions.

nathanrf · on April 7, 2024

The paper defines them as programs in a process calculus (which is fairly standard in the theory of protocols):

  Definition 1 (Asserted protocols) Asserted protocols, or just protocols for short, are
  ranged over by S and are defined as the following syntax rules:
  S ::= p.S                   action prefix
      | +{ l_i : S_i }_{i∈I}  branching
      | µt.S                  fixed-point
      | t                     recursive variable
      | end                   end
      | assert(n).S           assert (produce)
      | require(n).S          require
      | consume(n).S          consume

Process calculi are "fundamental" descriptions of computation, analogous to the lambda calculus but oriented around communication instead of function calls. (As for paper structure, I find that the important "basic" definitions in programming language research papers are usually in Section 2, since Section 1 serves as a high-level overview.)

Basically, a protocol consists of a sequence of sends/receives on a particular channel, mixed with some explicit logic and loops/branches, until you reach the end. There are some examples in Section 2.1 which are too complicated to reproduce here.
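To make the grammar a bit more tangible, here is a minimal Python sketch that encodes Definition 1 as plain data types, plus a printer back into the prefix notation and a check for unbound recursion variables. It is not the paper's tool (which targets Erlang); the class names, and the assumption that an action p is a send `!l` or a receive `?l`, are hypothetical.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Union

# Hypothetical encoding of Definition 1. Assumption: an action is "!label" (send) or "?label" (receive).

@dataclass
class Act:                       # p.S         action prefix
    action: str
    cont: Proto

@dataclass
class Branch:                    # +{ l_i : S_i }_{i∈I}   branching
    choices: dict[str, Proto]

@dataclass
class Rec:                       # µt.S        fixed-point
    var: str
    body: Proto

@dataclass
class Var:                       # t           recursive variable
    name: str

@dataclass
class End:                       # end
    pass

@dataclass
class Assert:                    # assert(n).S, require(n).S, consume(n).S
    kind: str                    # "assert", "require" or "consume"
    name: str
    cont: Proto

Proto = Union[Act, Branch, Rec, Var, End, Assert]

def show(s: Proto) -> str:
    """Render a protocol in the paper-style prefix notation."""
    if isinstance(s, Act):
        return f"{s.action}.{show(s.cont)}"
    if isinstance(s, Branch):
        arms = ", ".join(f"{label} : {show(c)}" for label, c in s.choices.items())
        return "+{ " + arms + " }"
    if isinstance(s, Rec):
        return f"µ{s.var}.{show(s.body)}"
    if isinstance(s, Var):
        return s.name
    if isinstance(s, End):
        return "end"
    if isinstance(s, Assert):
        return f"{s.kind}({s.name}).{show(s.cont)}"
    raise TypeError(f"not a protocol: {s!r}")

def free_vars(s: Proto) -> set[str]:
    """Recursion variables used without an enclosing µ binder."""
    if isinstance(s, (Act, Assert)):
        return free_vars(s.cont)
    if isinstance(s, Branch):
        return set().union(*(free_vars(c) for c in s.choices.values()))
    if isinstance(s, Rec):
        return free_vars(s.body) - {s.var}
    if isinstance(s, Var):
        return {s.name}
    return set()

example = Rec("t", Act("!ping", Assert("assert", "n", Act("?pong",
          Assert("consume", "n", Branch({"again": Var("t"), "stop": End()}))))))
print(show(example))  # µt.!ping.assert(n).?pong.consume(n).+{ again : t, stop : end }
```

Representing each construct's continuation as a field (`cont`, `body`) mirrors the way every rule in the grammar except `t` and `end` carries "the rest of the protocol" with it.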

As a general note on reading protocols: for (good, but industry-programmer-unfriendly) technical reasons, they're defined and written as "action1.action2.action3.rest_of_program", but mentally you can just rewrite this into

  {
    action1();
    action2();
    action3();
    ... rest_of_program ...
  }

(in particular, making "the rest of the program" part of each statement makes specifying scope much easier and clearer, which is why they don't just use semicolons in the first place)
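A minimal sketch of that mental rewriting, assuming a toy string syntax like `a.b.end` (hypothetical, not the paper's concrete syntax): parsing nests each action around the rest of the program, so each action's scope is exactly its `rest`, and flattening recovers the brace-and-semicolon form.

```python
from typing import Optional, Tuple

# "a.b.c.end" as nested pairs (action, rest); None stands for `end`.
Term = Optional[Tuple[str, "Term"]]

def parse(text: str) -> Term:
    """Build the term right to left, so each action wraps the rest of the program."""
    *actions, last = text.split(".")
    if last != "end":
        raise ValueError("expected the term to finish with 'end'")
    term: Term = None
    for action in reversed(actions):
        term = (action, term)
    return term

def to_block(term: Term) -> str:
    """Unroll the nesting into the brace-and-semicolon form."""
    lines = ["{"]
    while term is not None:
        action, term = term
        lines.append(f"  {action}();")
    lines.append("}")
    return "\n".join(lines)

term = parse("action1.action2.action3.end")
print(term)  # ('action1', ('action2', ('action3', None)))
print(to_block(term))
```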

hyperthesis · on April 7, 2024

Thanks for your guidance! I now see they're (now obviously!) things like TCP and HTTP. I had missed their informal definition:

> Here we use the term protocol to denote a specification of the interaction patterns between different system components. [...] To give a more concrete intuition, an informal specification of a protocol for an e-banking system may be as follows: The banking server repeatedly offers a menu with three options: (1) request a banking statement, which is sent back by the server, (2) request a payment, after which the client will send payment data, or (3) terminate the session.
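As a hedged illustration, that informal e-banking description can be written roughly as µt.+{ statement : !statement_data.t, payment : ?payment_data.t, quit : end } from the server's point of view, and checked as a small state machine over message traces. The labels, the `!` (send) / `?` (receive) convention, and the `check` function are assumptions for illustration, not taken from the paper.

```python
# Hypothetical server-side view of the e-banking protocol:
#   µt.+{ statement : !statement_data.t, payment : ?payment_data.t, quit : end }
# Plain labels are the client's menu choice; "!" = server sends, "?" = server receives.
TRANSITIONS = {
    ("menu", "statement"): "send_statement",
    ("send_statement", "!statement_data"): "menu",   # server sends the statement back
    ("menu", "payment"): "await_payment",
    ("await_payment", "?payment_data"): "menu",      # client sends payment data
    ("menu", "quit"): "end",
}

def check(trace: list[str], state: str = "menu") -> bool:
    """Return True if the trace follows the protocol and terminates the session."""
    for event in trace:
        nxt = TRANSITIONS.get((state, event))
        if nxt is None:
            return False  # this message is not allowed in the current state
        state = nxt
    return state == "end"

print(check(["statement", "!statement_data", "payment", "?payment_data", "quit"]))  # True
print(check(["payment", "quit"]))  # False: payment data was never sent
```

A table of (state, event) transitions is the same shape as the state-machine view that Erlang's gen_statem, mentioned upthread, is built around; a real implementation would also carry the message payloads.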

I would say it's like how you use an API.