Distributed Systems and the End of the API

ChuckMcM · on May 13, 2014

Fun stuff, amusing that the definition of a distributed system used; "Where a computer that you never heard of can bring your system down." is actually one of Leslie Lamport's more famous quotes.

When I joined Sun in '86 I thought it was the pinnacle of technological excellence to be a kernel programmer, and I joined the Systems Group, the notional center of the Sun universe, in 1987. However I discovered that the primary reason you had to be picky about kernel programmers what that their bogus pointer references crashed the machine (as they occurred in kernel mode with full privileges) but discovered that network programmers could crash the whole world with their bugs. So clearly they must be in a pantheon above kernel programmers. :-)

The author has come to discover that in the network world things can die anywhere, and this makes reasoning about such systems very complicated. Having been a part of the RPC and CORBA evolution I keenly felt the challenges of making APIs that "looked" like function calls to a programmer but took place across a network fabric and thus introduced error conditions that couldn't exist in locally called routines. (like the inability to return from the function due to a network partition for a simple example).

Lamport's work in this space is brilliant and inspired. Network systems can be analysed and reasoned about as physical systems when they exhibit discontinuities when considered as simple algorithms. The value here is to realize that a large number of physical systems tolerate a tremendous amount of randomness and continue to work as intended (windmills for example) while many algorithms only work consistently given a set of key invariants.

I gave a talk that was inspired by Dr. Lamports work titled 'Java as Newtonian Physics' which was a call to action to create a set of invariants, in the spirit of physical laws, that would govern the behavior and capabilities of distributed systems. It was way early for its time (AOL dialup connections were still a thing) but much of the same inspiration (presumably from Lamport) made it into the Google Spanner project.

As with many things, at a surface level many people learn an API which does something under the covers across the network but having come up through their education thinking of everything as an API they don't fundamentally grasp the notion of distributed computation. Then at some point in their experience there will be that 'ah ha' moment when suddenly everything they know is wrong, which really means they suddenly see a bigger picture of things. It makes distributed systems questions in interviews an excellent litmus test for understanding where people are in their journey.

jacquesm · on May 13, 2014

I've never seen an RPC system that I really liked. The closest to a model of distributed computing that gets me from 'a' to 'b' without going terminally insane is anything based on message passing. Even though there is significant overhead I figure that by the time you go distributed and your target of the RPC call or message lives on the other side of a barrier with unknown latency that overhead is probably low compared to the penalties that you'll be hit with anyway.

So then the trick becomes to make sure that a message contains a payload that is 'worth it'.

Making the assumption that any message may not make it to its destination and that confirmations may be lost (akin to your return example) is still challenging but I find it easier to reason about than in the RPC analogy.

I love that Lamport quote :)

A nasty side effect of all this network business is that what looks like a function call can activate an immense cascade of work behind the scenes, gethostbyname (ok, getaddrinfo) is a nice example of such a function. On the surface it's a pretty easily understood affair but by the time you're done and you get your results back you've likely triggered millions of cycles on 'machines that you've never heard of'.

arethuza · on May 13, 2014

"I've never seen an RPC system that I really liked."

I must admit I've never seen a message passing system that I really liked either :-) Mind you that's possibly because of times making stuff work in environments where someone made the decision "you shall use message passing for all inter-system communication" even when it wasn't always the best option.

These days my practical test for a remote API is whether I can stand using it through cURL - if I can happily do stuff from the command line then the chances are that code to do stuff won't be too insane.

jacquesm · on May 13, 2014

I liked QnX, currently playing around with Erlang. (Erlang has tons of warts but it gets enough of the moving parts just right that I find it interesting).

fenollp · on May 13, 2014

One does not often hear about the warts of Erlang. What do you name those?

gritzko · on May 13, 2014

Recently I was talking with a guy doing CRDT research. His past background was something CPU design related. I always considered a CPU a Newton/Turing ideal machine. I was surprised to know that it feels more like a distributed system. Due to high frequencies, events that happen in one part of CPU are unknown to other parts for quite a while, i.e. so many ticks later that they have to act semi-independently.

cemerick · on May 13, 2014

Hi, author here. :-) Thank you for the fun anecdote and kind words.

Hopefully we can collectively get better at addressing these problems.

ChuckMcM · on May 13, 2014

Absolutely there is more fun to be had. I clearly remember that sort of "ah ha" moment when I figured out that data structures could be computation. That took me from a loop that could not operate fast enough on the data, to one where the data set had some precomputation done on it and the loop only had to 'finish' it for various conditions and was plenty fast. Suddenly large vistas of "wow" open up. The posting from Julia's blog about how computers are really fast, same sort of experience for her. Suddenly a new understanding, the world shifts, and now you have a whole bunch of new insight to throw at problems. We can't help but get better at addressing problems.

I believe it was Leslie but it might have been Butler Lampson who mentioned you could stomp on a bunch of ants and the colony still worked fine. Ants are a great example of a durable distributed system that is robust in the face of massive amounts of damage. When you start thinking about computers like that it makes you realize you can build 100% uptime systems after all. The implementation of that property (individual machines are junk, collectively they are unstoppable) was done really well inside Google's infrastructure. They got to watch it in action when a colo facility they had clusters in caught fire.

programminggeek · on May 12, 2014

At least the author is wise enough to see that REST/RPC are not so different from each other.

I actually find it interesting that as I was learning about earlier networked "objects" type systems, programmers ran into problems where they were treating the networked objects as if they were local and that the network always works. Now, when we build REST api's they always ship with client libraries that feel like local objects and completely abstract away most notions of network failure, etc.

I'm not saying we've made an unreasonable tradeoff, it's just interesting that we seem to be making more refined versions of the same solutions with the same fundamental problems.

I guess the author was making a similar point.

steveklabnik · on May 12, 2014

"Layman's REST" is very much RPC, yes.

Fielding's REST is very much not.

mantrax5 · on May 12, 2014

Fielding's REST is pretty much CRUD in HTTP disguise.

Don't get me wrong, this can be great for "hypermedia applications" as Fielding's paper argues. But "hypermedia applications" just doesn't fit what many distributed services do these days.

Services are naturally centered arounds verbs (commands and queries) and not nouns (resources), so like with any other CRUD system, at some point a REST API that shoehorns everything into the four standard verbs HTTP commonly gives us, no longer adequately describes the business requirements of your app. You can definitely force things to be RESTful, but it's typically not the natural way to build an API. Feels akin to the ORM kind of impedance mismatch in some ways.

steveklabnik · on May 13, 2014

I agree that many services are simply CRUD wrappers. That doesn't have much to do with the nature of the architecture Fielding proposes.

I would be interested in some citations from Fielding which demonstrate that RPC is its organizational principle. I don't think they're there, though.

beamatronic · on May 13, 2014

I'm surprised someone hasn't embraced the idea and built the ultimate generic CRUD wrapper

steveklabnik · on May 13, 2014

They often fail. See ActiveResource, for example.

mantrax6 · on May 13, 2014

Fieldman's work doesn't exist in vacuum. He talks about HTTP, and HTTP has the verbs it has. It's hard enough to find consistent behavior in HTTP servers and proxies with "PUT" and "DELETE" let alone anything else. But even ignoring that, the verbs the spec talks about are limited. And that's a big problem.

As for RPC, essentially any communication between machines is a RPC. It's message passing ("call") and if the message arrives on the other end it's processed by a message handler ("procedure").

No server can just reach into another server's RAM and get or modify a resource directly. The interaction happens entirely by the will of the message receiver, and in exactly the way the receiver wants (not the sender).

So if we'll be building everything on an appropriate abstraction, it better match what really happens (messages, message handlers) and not some wishful thinking abstraction layered on top (resources, resource modification).

RPC is not REST's problem. The problem is the limited commands (PUT, POST, DELETE) and the single possible query type (GET) that we need to work with.

When a payment gateway has to represent a simple "process payment" command with a series of bogus abstractions like a "POST /payment/transaction/new", you know REST is the wrong tool for the job.

steveklabnik · on May 13, 2014

> He talks about HTTP, and HTTP has the verbs it has.

Sure, but REST is not HTTP. Yes, it was created to describe the architecture of HTTP, but you can do REST over other protocols, and even HTTP is not truly RESTful. Indeed, Fielding's thesis devotes an entire section to this: http://www.ics.uci.edu/~fielding/pubs/dissertation/evaluatio...

> and proxies with "PUT" and "DELETE"

This is due to the HTML spec, not anything else. In fact, this was almost changed in HTML5, but the problems with adding arbitrary verbs to HTML.FORM weren't adequately addressed. You could make that case, if you felt you'd solved the problems.

> As for RPC, essentially any communication between machines is a RPC

This characterization of RPC means that _everything_ is RPC, which means it's not a very useful way to compare architectures.

> The problem is the limited commands (PUT, POST, DELETE) and the single possible query type (GET) that we need to work with.

HTTP != REST. That said, something very similar to this is called the 'uniform interface constraint,' and is definitional to REST.

By the way, HTTP can have extension verbs too, so you're not actually limited to that.

Anyway, you still haven't shown me how Fielding's REST is RPC, or at least, in the way that software architects talk about RPC. http://www.ics.uci.edu/~fielding/pubs/dissertation/evaluatio... may be instructive.

mantrax5 · on May 13, 2014

Regarding RPC, you're right, I guess I'm not quite following the classical understanding of it. I tend to work with actor systems, where everything (in-process or intra-process) is message passing, so I tend to see the message layer first, and anything else merely a protocol implemented on top of it. Hence my response.

As for HTTP, if HTTP itself isn't truly RESTful, the only protocol I've ever seen REST implemented on, I'll have to resign from this debate, because it becomes completely abstract.

I hope the right protocol comes along to show us what REST might really be.

I'm spotting a bit of a pattern in REST supporters, where if some understanding of REST is found to have faults, it's called "not truly RESTful". Reminds me of how Agile failures tended to be explained as "not truly Agile"... but that's another story.

I judge a system by the implementations, not by the theory. If REST is this beautiful unattainable ideal that almost no one can truly implement, I tend to think the fault doesn't lie with the implementers.

It's quite likely REST simply has very few applications itself, but the applications have the potential to become all-encompassing (like the web itself). But in this case the issue remains REST is not suitable for 99% of the services people expose on-line, as each of those services has a specific scope and application, where REST's "network effects" of uniform resource interface and so on don't apply.

nandemo · on May 13, 2014

See CoAP (Constrained Application Protocol).

https://datatracker.ietf.org/doc/draft-ietf-core-coap/

ademarre · on May 13, 2014

The problem with REST is that many (I dare say most) who think they are applying it really aren't.

I think this is quite relevant: http://roy.gbiv.com/untangled/2008/rest-apis-must-be-hyperte...

What sets REST apart from RPC is the Hypermedia-as-the-Engine-of-Application-State principle (HATEOAS). With HATEOAS, interaction semantics are removed from URIs and defined in terms of link relations. This decouples clients from URIs; very valuable.

ADD: To equate REST with CRUD overlooks HATEOAS completely. Simple CRUD solutions work on resources that have already been identified, but HATEOAS adds resource identification/addressing and discovery in a very maintainable way.

steveklabnik · on May 13, 2014

While I agree with you that that's certainly an interesting part of the split between RPC and REST, you might be curious to know that Fielding doesn't think so.

> What makes HTTP significantly different from RPC is that the requests are directed to resources using a generic interface with standard semantics that can be interpreted by intermediaries almost as well as by the machines that originate services.

http://www.ics.uci.edu/~fielding/pubs/dissertation/evaluatio...

ademarre · on May 13, 2014

Interesting, thank you; but in that quote isn't Fielding actually comparing RPC and HTTP, not REST?

steveklabnik · on May 13, 2014

Yes, you're right, I was being a bit sloppy there. The Uniform Interface (and Layered System) is one of the ways in which HTTP does follow RESTful principles, though, so the thrust is still the same.

ademarre · on May 13, 2014

Agreed. But I think it's a stretch to conclude that Fielding doesn't think the hypermedia principle is an important distinction between RPC and REST:

From his blog post I linked to earlier, titled REST APIs must be hypertext-driven [0]:

> "I am getting frustrated by the number of people calling any HTTP-based interface a REST API. Today’s example is the SocialSite REST API. That is RPC. It screams RPC. There is so much coupling on display that it should be given an X rating.

"What needs to be done to make the REST architectural style clear on the notion that hypertext is a constraint?"

[0] http://roy.gbiv.com/untangled/2008/rest-apis-must-be-hyperte...

superdude264 · on May 13, 2014

I don't understand how it's possible to program to a true REST service given his final bullet point and one of his comments:

> A REST API should be entered with no prior knowledge beyond the initial URI (bookmark) and set of standardized media types that are appropriate for the intended audience (i.e., expected to be understood by any client that might use the API). From that point on, all application state transitions must be driven by client selection of server-provided choices that are present in the received representations or implied by the user’s manipulation of those representations. The transitions may be determined (or limited by) the client’s knowledge of media types and resource communication mechanisms, both of which may be improved on-the-fly (e.g., code-on-demand). [Failure here implies that out-of-band information is driving interaction instead of hypertext.]

> When I say hypertext, I mean the simultaneous presentation of information and controls such that the information becomes the affordance through which the user (or automaton) obtains choices and selects actions. [...] Hypertext does not need to be HTML on a browser. Machines can follow links when they understand the data format and relationship types.

If you're only supposed to start with an initial URI and there is no out-of-band communication, how is an automaton supposed to know what links to follow to reach the desired information? A person can do this by reading, reasoning about what they read, then following the appropriate link. Writing an automaton that can reason about anything that can fall under the initial URI seems equivalent to writing some kind of artificial intelligence.

prawks · on May 13, 2014

The key is:

> [a] set of standardized media types that are appropriate for the intended audience (i.e., expected to be understood by any client that might use the API)

There is out-of-band "communication" in that the client and server agree on these media types and how to use them.

The following article does the best job I've seen at illustrating a truly RESTful service:

http://www.infoq.com/articles/webber-rest-workflow

The <next> node in the response to the order POST is one such example.

superdude264 · on May 14, 2014

Great article. If I'm understanding it correctly, it seems like the advantage comes from mapping your ___domain actions to the standard HTTP actions on each ___domain object. Once this is done, your API is simply the media types and the interaction with your service can be done through well-understood HTTP actions rather than learning a bunch of method calls as in an RPC approach. Correct?

pfraze · on May 13, 2014

There are common standards for links. I think you can do reasonably well by supporting 2 or 3, though I'd prefer the world moved to the Link header. The others are html (<link>, <a>) json-hal, jsonld, etc. Regarding the process:

Links include metadata and types (the "rel" attribute). You can think of the full set of given links as an index. To search this index, you scan for reltype first (since that carries semantics and guarantees behavior) then check the other metadata attrs as needed (eg type=text/css). The only knowledge you embed then is published under the reltype specs.

bodhi · on May 13, 2014

So, honest question. I've been musing over REST and RPC for a while, and was trying to come up with some domains that were verb-oriented instead of noun-oriented. The only thing I could come up with was message passing, a-la XMPP or streaming content.

What are some other problem domains that are better represented in verb-oriented terminology?

programminggeek · on May 13, 2014

I think when an action affects multiple nouns at the same time, it starts to look more like a verb oriented approach. For example, say you have some kind of fundraising event donation system where you can have a donor donate to people or teams of people participating at the event as well as the event itself or the organization as a whole. So, something like a 5k run to save the whales.

When you create a donation that impacts potentially an auth record and a user account (if they create one), a donation, as well a it could end up touching the event, participant, team, or organization itself. Now, do you make all those data changes as a series of nouns with actions that you must do in some particular workflow and order? Or, do you have a higher level verb that you throw input at and it returns a result, doing whatever it needs to do in the process? Or do you have a noun with a single verb that starts to feel a lot like RPC?

In some ways they are equivalent, you have input, state change, and output. In other ways they are different. I would expect in REST you might have to hit multiple nouns to change all the state appropriately and I'm not sure there is a good mechanism to enforce that it happens correctly according to the desired workflow.

In the end, for things beyond CRUD and reporting, I think complex actions that touch multiple verbs is often better represented as a verb. However, I would love to see concrete examples of how you model complex workflow based state changes as a "transaction" or workflow in REST or if REST api's inherently opt out of such behavior.

dragonwriter · on May 13, 2014

> When you create a donation that impacts potentially an auth record and a user account (if they create one), a donation, as well a it could end up touching the event, participant, team, or organization itself. Now, do you make all those data changes as a series of nouns with actions that you must do in some particular workflow and order? Or, do you have a higher level verb that you throw input at and it returns a result, doing whatever it needs to do in the process? Or do you have a noun with a single verb that starts to feel a lot like RPC?

Well, you have a noun "donation". Whether it has multiple or single verbs depends on what operations make sense against that noun. Though it seems like that it would support multiple verbs.

There is no guarantee in REST that an operation on one resource has no effect on other resources. In fact, its quite common for operations to effect other resources.

> I would expect in REST you might have to hit multiple nouns to change all the state appropriately and I'm not sure there is a good mechanism to enforce that it happens correctly according to the desired workflow.

The method is that in this case, the "donation" resources would support appropriate verbs, and the other nouns which could not be changed independently which might be changed as a result of actions on the donation resource would not need verbs to support those changes, but would change as a result of the course of implementing the actions on the donation resource.

People seem to think of REST in terms of mapping to RDBMSs with resources as analogous records in base tables in a normalized schema against which you conduct CRUD operations. But, if you have to think of REST in RDBMS terms, its a better analogy to think of the resources exposed by an API as records in a set of potentially updatable, potentially denormalized, and potentially overlapping views rather than base tables.

steveklabnik · on May 13, 2014

> I would expect in REST you might have to hit multiple nouns to change all the state appropriately

In layman's REST, yes. In Fielding's REST, no. It's one of the serious ways in which layman's REST is deficient.

icedchai · on May 13, 2014

Exactly this. I've seen people doing crazy, non-nonsensical stuff to make their APIs "RESTful". So RESTful that they make no sense.

dreamfactory2 · on May 13, 2014

> Services are naturally centered arounds verbs (commands and queries) and not nouns (resources)

Can you qualify that? Queries are presumably queries on resources, and which commands did you have in mind apart from creating or updating resources?

cfallin · on May 13, 2014

Not OP, but a thought anyway: these worldviews are sort of duals of each other, IMHO. You can consider a "noun" to be one instance of a verb's application, i.e., an action. For example, "send an email" or "purchase an item", both verbs, translate to creating new nouns representing (respectively) an email in my Outbox or a transaction. The advantage of the noun worldview is that the nouns can describe the history of verbs and the resulting application state, e.g., I can look at and manage the whole ledger of transactions, whereas verbs are just ad-hoc manipulations of that state.

grey-area · on May 13, 2014

I think the reason people limit verbs (or nouns) usually comes down to attempting to limit the gestalt which others coming new to the system have to hold in their head - if I know that you serve n resources, each of which has 4 well understood verbs, it's much easier to reason about than if I must know the verbs which go with each object and what they do to that particular object in your world.

Of course the real world and real systems will never conform to this sort of system, and you have to break out of it occasionally, but sometimes it's a good starting point as long as you're don't let it limit the horizons of your world, such that only 4 verbs should be enough for anyone, or verbs become completely subsidiary to nouns and must be escorted by them at all times. Zealotry based on this perfectly reasonable idea (limiting complexity to promote understanding) often leads to a Kingdom of Nouns situation:

http://steve-yegge.blogspot.co.uk/2006/03/execution-in-kingd...

scotth · on May 13, 2014

Aside: Why do you have so many accounts mantrax? At least one of them is dead, and I'm sure you can imagine why.

scotth · on May 13, 2014

HN hellbans, so you might not realize you're not being heard.

I'm seeing [dead] markers from matrax6 and mantrax7 (although oddly enough, not consistently).

Can't comment on the quality of the code. Your problems sound atypical.

mantrax6 · on May 13, 2014

None of them are dead. As for why, long story short, because HackerNews' code is really poorly written. Gateway errors, horrendous response times, bogus limits and false positives. My usage patterns are likely not typical, but this is hardly an excuse for such a poor UX.

For example, I couldn't even respond right now through mantrax5...

Next time I'll just take the alternative and instead of making more accounts, I'll just stop visiting the site completely.

ademarre · on May 13, 2014

Have you reported these HN bugs?

jwingy · on May 12, 2014

CALM and CRDTs are interesting stuff.

That being said, I feel like the author is confusing a bit the specific implementations of modern APIs vs the concept of an API which I see as simply some (somewhat standardized) interface to a system which you don't own. Those seem like two different problem domains to me, but perhaps I'm arguing over a different definition of APIs than from what the author is talking about....

cemerick · on May 13, 2014

Hi, author here.

(Somewhat standardized) interfaces are _fine_. My contention is that you can have an interface shared by disparate actors without the problematical bits of "APIs" (both in spirit and in their particular current best materializations), which provide no useful data model constraints, do not acknowledge the realities of the network, and inherently couple client and server.

The point is that you can have a shared "interface" over _data_, in exactly the same way as producers and consumers share shapes/types of messages routed via queues — except that there are ways (CRDTs being one) to extend that dynamic so that data can be replicated along any topology, and shared and reacted to by N actors, not just a consumer downstream of your producer.

I hope that clarifies. :-)

mantrax5 · on May 12, 2014

I think you got it exactly right. It feels like the author got a little too excited about CRDT-s and forgot all the other principles of good system design (it's about clear stable interface, low coupling, single responsibility and so on).

cmeiklejohn · on May 12, 2014

I also covered a variety of similar issues when discussing that offline rich-web applications are perfect for CRDTs, because you are effectively building a distributed system, in my EmberConf 2014 talk [1][2] called Convergent/Divergent.

[1] http://confreaks.com/videos/3311-emberconf2014-convergent-di...

[2] https://speakerdeck.com/cmeiklejohn/divergent

* edited to reformat list.

cmeiklejohn · on May 12, 2014

Also, very relevant:

"A Note on Distributed Computing" 1994, Sun Microsystems Technical Report

* http://lambda-the-ultimate.org/node/1450

* http://dl.acm.org/citation.cfm?id=974938

iadapter · on May 12, 2014

Of course APIs that serve as synchronous endpoints to distributed systems are a leaky abstraction. But its not the only one of its kind, there's also Guaranteed Message Delivery [1].

I find the philosophy behind Akka in this context a better fit - embrace that networks are unreliable and build your app around this limitation accordingly [2]. The cost is that it results in more work for the developer just like with the usage of CRDTs.

[1] http://www.infoq.com/articles/no-reliable-messaging

[2] http://doc.akka.io/docs/akka/2.1.0/general/message-delivery-...

baldeagle · on May 12, 2014

TL;DR: APIs have issues with concurrency and latency, amongst others. Use Consistency As Logical Monotonicity (CALM) or Conflict-free Replicated Data Types (CRDTs) instead. Here is a little about how CRDTs work. btw: speech in NY on the 15th of May.

lstamour · on May 12, 2014

Thanks for the summary. Via Google, ended up at http://www.slideshare.net/jboner/the-road-to-akka-cluster-an... which I've bookmarked to watch at a future date, since I'm new to these concepts.

dgreensp · on May 13, 2014

As the author of EtherPad I'm familiar with CRDT, which is a cousin of OT. They don't really replace APIs, unless you are using an API to synchronize data, which is only one of many things you might be trying to do.

In other words, if you're building EtherPad or Wave, use a fancy data structure for the collaborative document. Otherwise, don't. Meteor's DDP provides a nice model, where the results of RPCs stream in asynchronously.

cemerick · on May 13, 2014

Hi, author here. I'm not sure you read the whole piece. :-) (Modern) APIs are a very limited mechanism of state transfer that happens to be paired with often side-effecting operations. Thus, a "synchronization" (I don't think that word is particularly useful because reasons) mechanism paired with reactive computational services _does_ replace APIs, and offers the ability to do much, much more.

OTs (operational transforms) _are_ a related precursor to CRDTs only in that they are both ways to reconcile concurrent changes, but that's really the limit of the connection. Unfortunately, the substrate for OTs (text, integer-indexed sequences of characters) is fundamentally not amenable to commutative operations. This makes implementing OTs _very_ difficult and error-prone, and certain combinations of concurrent operations are completely unreconcilable (a result that came out of a Korean group's study, can't find the cite for it right now).

jorangreef · on May 13, 2014

I think the paper you are referencing might be [1]?

It's one of my favorite papers on CRDTs and provides practical pseudocode for learning how to implement CRDTs yourself.

The structures they present are simple to understand and have good performance characteristics compared to similar CRDTs [2].

A key insight from the second paper is to write CRDTs that optimize for applying remote operations over applying local operations, as the ratio of remote operations to local operations will be greater. i.e. 100 clients making 1 change to a CRDT will require all 100 clients to each apply 99 remote operations and 1 local operation.

[1] Replicated abstract data types: Building blocks for collaborative applications - http://dl.acm.org/citation.cfm?id=1931272

[2] Evaluating CRDTs for Real-time Document Editing - http://hal.archives-ouvertes.fr/docs/00/62/95/03/PDF/doce63-...

cemerick · on May 13, 2014

The cite I'm missing at the moment is a multi-year study that catalogued all known operational transforms over text (there were many more than I imagined prior), along with proofs showing that certain combinations of concurrent operations simply could not be reconciled consistently.

Thanks for the other pointers, though!

dgreensp · on May 13, 2014

There's actually an interesting deeper connection between OT and CRDT, in which OT comes across as a special case of CRDT.

Suppose your state is a text document or array of characters (we could also examine other kinds of state like an unordered set of objects with properties, but it's less interesting). CRDT assigns a semi-permanent name to each unique data element (character), which is typically a string that indexes into a tree. It's permanent unless the names get too long, in which case you rebalance the tree. The papers I've read treat the rebalancing as an offline operation, to be done one day at 3am when no one is using the system, but in principle you could do it online, as long as you save enough information to rewrite the names in any operations you receive that were meant for the old tree to apply to the new tree. OT is equivalent to rebalancing the tree after every operation. You don't actually need a tree, then, and the names are just numbers (in the case of an array). Names are scoped to a revision, and operations are always rewritten to use the appropriate names before applying them.

Another maintenance operation you might do on a CRDT tree is to remove "garbage" (deleted elements, which you keep around so that you can perform insertion operations relative to them). OT always delete garbage immediately, and operations that refer to a deleted element are rewritten (when they are transformed against the operation that deleted the element).

I'm not saying one is better than the other. People seem to have an easier time wrapping their heads around CRDT, but maybe just because OT hasn't been explained well. The CRDT tree and name strings sounds like kind of a pain to implement versus OT's arrays, but I've only implemented OT and not CRDT.

Saying that APIs are a "mechanism of state transfer" is as overbroad as saying function calls are a mechanism of state transfer. The article at first seems to provide itself an out, by saying that only a certain class of APIs is being considered, but then it defines API as a "set of names." Similarly, you say that any application touching more than one computer is a distributed system, and then you preemptively defend against exceptions by saying, "If this doesn't apply to you, maybe you don't have a distributed system."

More concretely, APIs do a lot of stuff. They send and receive text messages and emails; they transcode video; they turn on your coffee maker; they post to your Facebook wall. Often there is little or no shared representation, except perhaps the status of the operation, which can typically be communicated in a simple way.

Don't get me wrong, I think more APIs could work by synchronizing state. Basically, use something equivalent to a git repo under the hood. Gmail could work this way. Maybe mail servers could even work this way.

Posting to a Facebook wall doesn't work this way. The way to make posting to a Facebook wall use CRDT would be to replace API calls like addPost and deletePost (say) with a single API call "updateWall" which performs arbitrary operations on a user's wall. Thanks to CRDT, this operation never fails (though the client may still want to know when it has completed). In casual conversation at Meteor, we call it the "Lotus Notes" model when all operations go through the data layer, which synchronizes over the network. Asana's internal framework also uses this model, so a couple Meteor devs who worked at Asana have experience with it. The main drawback is that it is difficult to perform validation and security checks. If the Facebook API only has "updateWall," Facebook must determine whether the diff it receives constitutes a valid operation or series of operations for user A to perform on user B's wall (for example, you can add any number of posts to anyone's wall, but only delete posts off your own). This is much more complicated than having addPost and deletePost, each with the appropriate security checks, and knowing that no other operations are permitted.

To abolish The API completely like you say, you'd have to not just have updateWall but basically one, unnamed API call for all of Facebook, and then you could say there's no API.

cemerick · on May 14, 2014

A lot of different distributed storage and computation architectures are special cases of CRDTs, just with different sets of commutative operations and/or convergent types of state. (One of the aspects of CRDTs that I most appreciate, as it provides a framework within which one can compare different technologies in a thoroughgoing way.) Ones I like to cite as common examples that people have often touched before are datastores like Riak, CouchDB, and S3.

The document model treatment you describe is talked about some in the Shapiro et al. paper as a "continuous sequence", and is roughly what was used by Logoot and Treedoc. The latter is explored more thoroughly here: http://arxiv.org/abs/0907.0929.

I was only talking about network APIs in the original piece. The "set of names" bit was there to establish the lineage between "classic" programming language/library APIs and those that touch the network.

APIs themselves do exactly nothing. It is the computational service on the other side of an API that does something. This conflation is exactly the sort of thing that is allowed and encouraged by the construction of APIs as "just another function you call in your runtime".

I find the Facebook examples you offer very curious. APIs have no inherent model for authentication and authorization, and the same goes for CRDTs. So, why do you think that verifying authorization over a set of operations or set of modifications to some state is any different than verifying authorization on N operations attempted via N API endpoints? I'll certainly grant that the latter comes with a body of current programming practice and infrastructure, but that hardly an endorsement of its relative quality or suitability for the job-to-be-done.

My preferred characterization is that the Facebook API would be replaced with a data model. The original piece already hints at a number of advantages to such an architecture, and omits many others that I'll talk about at a later date.

dgreensp · on May 20, 2014

I'll certainly grant that the latter comes with a body of current programming practice and infrastructure, but that hardly an endorsement of its relative quality or suitability for the job-to-be-done.

You don't see "comes with a body of current programming practice and infrastructure" as an endorsement for "suitability for the job-to-be-done"? :) It doesn't bear directly on the main discussion we're having, but for people who are trying to get things done, I would say it's quite a strong endorsement of a particular practice to say that we understand how to apply it successfully and that there are tools, known patterns, and infrastructure around it.

Verifying authorization on a set of operations is hard or easy depending on what the operations are. High-level operations that correspond to the actions that users are presented with in the user interface tend to be easy to secure and validate. Low-level database operations tend to be hard to secure and validate as your data model becomes complicated, because you basically have to reverse-engineer the high-level operation that licensed the low-level operations (which could be many and spread across different tables).

I think the best way to expose a Facebook wall (say) via CRDT is to define a Wall datatype whose operations are the permitted high-level actions you can take on a wall. Then they are easy to validate, as you say. This is sort of how Google Docs is implemented -- the core of the application consumes high-level operations (insert column, sort rows, etc.) from different users and updates the document state, and then this core is replicated in different data centers. Most discussions I've seen seem to assume that CRDT and OT operations are simple, generic operations, but I think the real magic is in treating it as a paradigm like OO and defining datatypes within it.

I still have a hard time conceptualizing an API call as a timeless, commutative modification to a model. I love CRDT and OT, but I've just seen too many network APIs to put them in a box. Meteor actually has more shared data model between client and server than any other framework I know of. The general case of APIs, for us, is you tell the server to go do something, and it tells you when it's done it, and you also get streaming updates for the parts of the data model you care about (and a marker telling at what point the changes you caused landed).

Basically my main point is that you still need high-level verbs that carry intent, whether they are API calls or operations. Otherwise you have data changing and no accounting for it. It's like how banking is mainly about transactions, not about telling each other how much money is in account X or Y.

gritzko · on May 13, 2014

I'm the author (the leading one) of Yandex Live Letters, which is a CRDT-based EtherPad-like thing. Some flavours of CRDT are indeed related to OT. My favorite technique (pure op-based CRDT variant) is very much operation-centric, but instead of transformations (like in OT), it employs per-operation Lamport identifiers.

Based on our new project named Swarm [1] I may say that CRDT and "async RPC" fits rather nicely together.

OT indeed behaves poorly in a highly asynchronous environment. I suspect, that is the reason why Google Docs doesn't have decent offline mode yet. CRDT (any flavor) is async-friendly.

[1] http://slideshare.net/gritzko/swarm-34428560

jorangreef · on May 13, 2014

I think operational transformation is more of a predecessor to CRDTs than a cousin, and OT simply does not work offline, whereas CRDTs do.

ddp · on May 12, 2014

I believe it was Leslie Lamport who said, "A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable."

cmeiklejohn · on May 12, 2014

The reference is here:

http://research.microsoft.com/en-us/um/people/lamport/pubs/d...

cemerick · on May 13, 2014

Dammit, thank you. I've even read that exact message before. :-/ Post updated!

josephschmoe · on May 12, 2014

Many API users are not knowledgeable to the intricacies of network programming. This could definitely use an executive summary at the top with the following info: "What will these new libraries that replace APIs offer to single point API users?"

cemerick · on May 13, 2014

(Author here.) As a start, hopefully removing the blinders that makes statements like "many API users are not knowledgeable to the intricacies of network programming" oh so true. That that statement can be made without popular incredulity only reinforces the point that modern network API technologies have largely been built to sustain the illusion that there is no network, and you're just making a method call somewhere. Insert this-plt-life GIF here. :-P

Building systems with things like CRDTs and tools and languages that support CALM will allow people using point-to-point APIs to continue to do the things they do now, but remove much of the incidental complexity from the equation. An example would be that, when you are relying upon N replication mechanisms to move CRDT state or operations from _here_ to _there_, you don't need complex timeout, retry, and backoff mechanisms to compensate for the realities of what's connecting the two parties. The message will arrive when it arrives...exactly the only guarantee that you can make in the general case of someone talking to external services.

anuraj · on May 13, 2014

This is a network programmers view. System, Network programmers concern themselves about systems and topologies. For an application programmer, both needs to be abstracted and only the business logic is important. Money is in the top of the pyramid now - hence the proliferation of APIs.

cemerick · on May 13, 2014

The point of the piece, in large part, was to emphasize that we're all network programmers. If you're whacking away with APIs and couldn't care less about the broader system its topology, please stay away from me. ;-)

APIs are proliferating because their coupling around client/server process and data representation makes for high switching costs and thus sweet vendor lock-in.

logn · on May 13, 2014

Rest is less platform dependent than SOAP/RPC. I see that as the main benefit. JSON is easier to work with than XML. The whole idea of service oriented architectures is that users don't need to care about the tech stack details of your service. Rest and JSON do a better job of realizing that vision than SOAP/XML. I don't think anyone's claimed that Rest is a design pattern to end all woes. Maybe we haven't given due thought to what new design patterns (or data structures, architectures, etc) are emerging these days, and in that light, the article presents a lot of interesting pointers.

virmundi · on May 13, 2014

Actually JSON makes it damn near impossible to uniformally implement REST. You need HATEOS. This means that there has to be a semantic of following links in resources. JSON lacks this ability. ATOM or RSS, both XML, have linking. Heck, XML at a language level supports document linking. HTTP + JSON != REST.

logn · on May 13, 2014

Interesting. I looked into HATEOAS more and found this link to a PayPal API:

https://developer.paypal.com/docs/integration/direct/paypal-...

I think I'll try to follow Rest better. Anyhow, I still prefer JSON. To me it's a more concise way to explain data, with nice syntax for arrays and maps. I'd rather jump through a few hoops to implement true Rest than try to craft verbose XML schemas for simple things like arrays and maps. ATOM and RSS I think are good for their main use cases but aren't as generic of syntax as JSON.

icebraining · on May 13, 2014

That's what JSON-LD and other formats are for. As a mediatype, application/json is certainly useless for REST, but there's nothing wrong with using it as an encoding for more semantically relevant formats.

jalfresi · on May 13, 2014

Whilst I agree with you, there is nothing stopping someone defining link structures in JSON documents, coining a new media type e.g. JSON+Link and boom, problem solved.

virmundi · on May 13, 2014

Well, I'm responding to you and to your siblings. You're right. You can create a new MIME type and the problem is solved. Fortunately, as the sibling comments pointed out, there is an extension. What I want to see happen is that the JSON + Links becomes a standard. Ideally a W3C. Otherwise we're into the old XKCD comic https://xkcd.com/927/

deathtrader666 · on May 13, 2014

If you want document linking, doesn't JSON-LD [1] work ?

1 - http://json-ld.org/

kylebrown · on May 12, 2014

Distributed API's are a big part of Ethereum. I think the Merkle tree of the bitcoin blockchain (and the Patricia tree of the Ethereum blockchain) might even qualify as a semilattice.

In fact, its by the physics of information theory that a cryptographic blockchain solves the consensus problem. Specifically, information theory emerges from the laws of thermodynamics: Maxwell's demon is essentially what secures one's private keys from brute-force cracking attempts.

I'd like to see a comparison of how the blockchain solves the CAP problem, alongside CRDT's. Are they not both solutions to the same problem?

marktangotango · on May 13, 2014

FYI information theory entropy and physics entropy really aren't the same thing:

http://physics.ucsd.edu/do-the-math/2013/05/elusive-entropy/

kylebrown · on May 13, 2014

Well that seemed excessively pedantic IMHO. It actually didn't touch much on information theory, and where it did, many of the comments disagree. I'll cite the Landauer limit[1] as what (yes, arguably) connects the entropy of information theory to the entropy of physics.[2]

Also, I only mentioned physics because the article did, quoting Lamport "Most people view concurrency as a programming problem or a language problem. I regard it as a physics problem."

Unfortunately the article didn't elaborate any more on the precise type of physics problem in question (Maybe Lamport does elsewhere), whether the physics of computational complexity or the physics of information theory, or something else. But even those two sub-fields have many connections and similarities (as does pretty much everything in physics and math. such connections are the bread-and-butter of theoreticians).

1. http://en.wikipedia.org/wiki/Von_Neumann-Landauer_limit

2. http://en.wikipedia.org/wiki/Entropy_in_thermodynamics_and_i...

kylebrown · on May 13, 2014

Here's an article which discusses Lamport's view: "The physics of distributed information systems"[1]. The first sentence: "This paper aims to present distributed systems as a new (interesting) area of applications of statistical physics, and to make the the case that statistical physics can be quite useful in understanding such systems."

It has several mentions of statistical physics, but (curiously) no mentions of entropy. It does however discuss the Byzantine Generals problem, which of course is the problem the bitcoin blockchain solves.

1. http://iopscience.iop.org/1742-6596/473/1/012017/pdf/1742-65...

kylebrown · on May 13, 2014

Okay, last one. Most concise counter-argument: http://en.wikipedia.org/wiki/Brute-force_attack#Theoretical_...

Derived from the Landauer limit.

sagargv · on May 13, 2014

Joel Spolsky had written along similar lines and argued that it's important to know what is happening beneath abstractions.

http://www.joelonsoftware.com/articles/LeakyAbstractions.htm...

rooted · on May 13, 2014

I think a distributed system is better defined as a system where timing becomes an issue to the coordination of components in the system.

richm44 · on May 12, 2014

I stopped at the point where he claimed that APIs were always synchronous, this wasn't even true in the 80s. For example XLib is a rather well used API and is asynchronous (there are many others).

koide · on May 12, 2014

two paragraphs later he addresses that point, calling that support limited in current api designs

richm44 · on May 12, 2014

Not really, he talks about HTTP which wasn't really designed for that purpose. There are plenty of protocols that were. Does this actually have anything to add that isn't covered in http://en.wikipedia.org/wiki/Fallacies_of_Distributed_Comput... , if so then I'll read further.

pjscott · on May 12, 2014

It has some very interesting stuff about CRDTs, which is definitely worth a look.

dllthomas · on May 12, 2014

My recollection was that Xlib is an unfortunately synchronous library written for the the wonderfully asynchronous X protocol.

bitwize · on May 13, 2014

My understanding was that the Xlib protocol's asynchrony was hardly wonderful, and made syncing with vblank and pixel-perfect frames difficult, which motivates its abandonment in favor of the synchronous, local-host-only Wayland protocol.

dllthomas · on May 13, 2014

Certainly, the X protocol's asynchronous nature isn't without some downsides (though I think you could address the vblank sync without discarding it). However, I maintain that Xlib itself was a synchronous interface in front of an asynchronous protocol - which gives us the worst of both worlds and motivated Xcb.

mantrax5 · on May 12, 2014

So he favors exposing a standard set of distributed data models instead of having APIs.

What a horrible idea.

Exposing implementations is bad because implementations change.

Exposing implementations is bad because as you expose the intricacies of your data model to your client (which he claims is a benefit) you in turn obscure and hide the intricacies of your business ___domain, which will surely not allow you to patch a service's distributed data tree in an arbitrary fashion.

It's in essence like having SQL as your underlying data model, and replacing your API with an open read/write/delete access to your SQL server to the entire world, and hoping everyone will run the right queries and all will be all right.

It won't be all right.

APIs will become more asynchronous and eventually all APIs will be seen as protocols, that don't necessarily follow a simple request/response pattern.

But they'll remain in the form of abstract commands and queries modeled after the business ___domain of the application, and not the underlying data model of it.

derefr · on May 12, 2014

> It's in essence like having SQL as your underlying data model, and replacing your API with an open read/write/delete access to your SQL server to the entire world, and hoping everyone will run the right queries and all will be all right.

I find it kind of amusing that this was the original purpose of having an "SQL server": letting people (e.g. auditors) submit arbitrary queries, so you won't have to anticipate what exactly they'll want to do with your data. (Write-access was intended to be segregated to particular database users writing to particular tables, though--basically parallel to using WebDAV with HTTP Basic Auth.)

mantrax5 · on May 13, 2014

It was, yes, and to this day read-only SQL access to certain tables is not that bad of a practice to allow for report-generating apps within a company.

However the idea of exposing SQL databases publicly as an approach never took hold for many reasons we're today aware of. And the idea of public write access is ridiculous right from its premise.

The anti-API rant of this author shows us that those who don't know their history are doomed to repeat it.

cemerick · on May 13, 2014

Hi, author here.

APIs already necessitate the use of "standard sets of data models", except such "standardization" takes place over and over for each provider of a particular type of service. Further, APIs themselves have incompatible changes that flow from their underlying transport mechanisms (changing URLs, etc).

Right now, you're sharing data with "clients" that end up depending upon the particular details of that data and its (probably impoverished) representation. IMO, might as well own up to it and address that instead of thinking that you're building anything other than siloed services that demand a high degree of client-server coupling.

Changing data models is a fact of life. I'd much rather have a data medium that accounted for that from the start than a set of folklore about which services accept which data, and in what formats. "Patching" of extant data is not necessary (though certainly possible, depending on all sorts of factors); things like views are hardly new, and can be leveraged at every level of the system to match old (or new!) shapes of data with services that expect new (or old!) data.

You say that "APIs will become" something. Their defining feature is their manifestation in our programming languages and libraries, not their semantics with regard to the network. Network APIs have been kicking around for 30-40 years, web APIs for 20 years now. I don't think we should expect much new at this point. I'd rather look towards approaches that have something substantial to say about the fundamental problems in question.

Shorel · on May 13, 2014

I believe it is not having SQL as you data model, it is having a GIT repository as your data model.

Git has the lattice eventual consistence the article talks about.

About opening the data to the entire world, just thin about GitHub public repositories.

The missing issue is how to deal with merge conflicts.

Shorel · on May 13, 2014

Errata: thin should be think

dreamfactory2 · on May 13, 2014

Hmm the article seems based on some false assumptions. I'd argue that the whole point of REST as an architectural style is to be stateless and async. Of course you would use an ESB of some kind rather than point-to-point if you want to protect yourself from failure of a solution component - REST lends itself well to that or to building error-handling in the client. And isn't 'turning operations into data' what we are doing by switching from a verb-based model to a noun-based one?

cemerick · on May 13, 2014

Hi, author here. REST has its set of semantics, but (a) I don't think they're particularly useful for building computational services with, and (b) it's for all practical purposes predicated on HTTP, which carries a lot of baggage. Each _request_ is stateless (barring things like sessions^H^H^H^H^H hack workarounds), but clients and servers certainly are not; and, how one maintains that state and orchestrates further REST interactions based on an intermediate response is entirely on the implementer, _every single time_ a service or client is built/used.

I'd personally much prefer communication and computational primitives that can just as easily be used for a point-to-point interaction as they can be used to _build_ an ESB (enterprise service bus, I believe you mean?) if that's what I want.

I don't think nouns vs. verbs are a useful distinction. Turning operations into data is a first step, but all data is not equivalent. Some representations lend themselves to composition such that you can represent essentially arbitrary structures (sets, graphs, trees, multimaps, etc), but most (including the common ones of JSON and XML) do not. Likewise, some data representations allow for commutative operations so as to reconcile concurrent actors' activity, but most (again including JSON and XML) do not.

ryanobjc · on May 13, 2014

CRDTs are absolutely fascinating, but sometimes I really wonder. It seems like you throw words like 'semi-lattice' around ...

Also there is one particular element to the eventual consistency that bothers me, it's that all these eventually consistent algorithms aren't how high powered neural nets will work. Our brain is highly eventually consistent, but it computes without ever needing these algorithms.

bm1362 · on May 13, 2014

I think you're taking the neural net _model_ of the brain too strictly.