Thrift vs. Protocol Buffers (floatingsun.net)
108 points by peterb on May 5, 2011 | 24 comments



For me the most important point in this article is that Thrift includes an RPC implementation while Protocol Buffers does not.

This was very helpful while writing an iPhone application that records audio and sends it to a server for voice recognition processing. Thrift allowed me to set up the iPhone client and the Windows/C# server in only a few lines of code. Protocol Buffers required that I establish a socket connection, send the audio data across, and then reassemble the data on the server side. Not the world's most difficult problem, but being new to Objective-C at the time it was a bit tricky. I wish I had known about Thrift when I was building my initial implementation based on Protocol Buffers.
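
For illustration, the whole client setup is roughly this (a Python sketch; the VoiceService name and its IDL are hypothetical, and the Objective-C/C# versions follow the same pattern):

  # Assumes Thrift-generated code from an IDL like:
  #   service VoiceService { string recognize(1: binary audio) }
  from thrift.transport import TSocket, TTransport
  from thrift.protocol import TBinaryProtocol
  from voice import VoiceService  # hypothetical generated module

  transport = TTransport.TBufferedTransport(TSocket.TSocket('localhost', 9090))
  client = VoiceService.Client(TBinaryProtocol.TBinaryProtocol(transport))
  transport.open()
  text = client.recognize(audio_bytes)  # one RPC call; no manual framing
  transport.close()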


Unfortunately, Thrift RPC is a pain in the butt for Python, especially if you want to code with Tornado (http://www.tornadoweb.org/). The Thrift-generated code necessitates either using Twisted (http://twistedmatrix.com/trac/) or writing in a blocking style. For Tornado users, the existing generated code is unusable, and the compiler doesn't facilitate generating client-side function stubs that support callbacks. Therefore, it is impossible to write a properly-working RPC mechanism with the tools that the Thrift compiler provides. The only way to get around this is to modify the compiler itself.

On the other hand, generated code from Protobuf allows you to slide in an asynchronous callback function (although yes, you have to write like 50 lines of code to build an RPC implementation).
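
Roughly, the shape of it (a sketch against protobuf's generated service stubs; the service and channel names are hypothetical, and the RpcChannel body is the ~50 lines you write yourself):

  from google.protobuf import service

  class MyChannel(service.RpcChannel):
      def CallMethod(self, method, controller, request, response_class, done):
          # Serialize the request, send it over your own transport, and
          # invoke done(response) when the reply arrives -- e.g. from a
          # Tornado IOLoop callback, so nothing blocks.
          ...

  stub = search_pb2.SearchService_Stub(MyChannel())  # hypothetical .proto
  stub.Search(controller, request, callback)  # async: returns immediately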

I can't speak for other languages, but given Thrift's inflexibility with Python, I suspect the situation is at least as bad for less common languages.


I'm (honestly) curious why a custom serialization format and RPC was a better fit than HTTP for this problem.

What was the payload like that made this a better fit?


Thrift's RPC is (optionally) built on top of HTTP, but provides support for custom, (kind of) type-safe data structures and the conveniences of RPC. If you're sending data that's more complex than a single homogeneous type, then delivering a structured message over RPC is a necessary abstraction.


I don't think network overhead is the problem here; it's about ease of development. Thrift generates working server code and you just have to implement the RPC functions. With PB, you need a stack to handle connections, parse messages, dispatch, etc.
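
To make that concrete, the Thrift server side is roughly this (a Python sketch; VoiceService and run_recognizer are hypothetical):

  from thrift.server import TServer
  from thrift.transport import TSocket, TTransport
  from thrift.protocol import TBinaryProtocol
  from voice import VoiceService  # hypothetical generated module

  class Handler(object):
      def recognize(self, audio):       # just fill in the RPC body
          return run_recognizer(audio)  # hypothetical application code

  # Sockets, framing, and dispatch are handled by the generated stack.
  server = TServer.TSimpleServer(VoiceService.Processor(Handler()),
                                 TSocket.TServerSocket(port=9090),
                                 TTransport.TBufferedTransportFactory(),
                                 TBinaryProtocol.TBinaryProtocolFactory())
  server.serve()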


Note that Thrift supports HTTP as a transport.
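
In Python, for example, that's just a different transport class; the generated client is unchanged (sketch; the endpoint and service are hypothetical):

  from thrift.transport import THttpClient
  from thrift.protocol import TBinaryProtocol
  from voice import VoiceService  # hypothetical generated module

  transport = THttpClient.THttpClient('http://example.com/thrift/voice')
  client = VoiceService.Client(TBinaryProtocol.TBinaryProtocol(transport))
  transport.open()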


When I first read about protocol buffers, I was surprised at the similarity to ASN.1/BER: http://en.wikipedia.org/wiki/Basic_Encoding_Rules

Basically, they're both nested type/length/value data formats with primitives for numerics, strings, etc., with a human-readable description language and toolsets to auto-generate language types + (de)serialisers etc.
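
The resemblance is easy to see on the wire. A protobuf field is a varint key (field_number << 3 | wire_type) followed by the value, and wire type 2 is length-delimited, i.e. type/length/value just like BER's tag/length/value octets. A sketch (my own illustration, not library code):

  def encode_varint(n):
      out = bytearray()
      while True:
          b, n = n & 0x7F, n >> 7
          out.append(b | 0x80 if n else b)
          if not n:
              return bytes(out)

  def encode_string_field(field_number, value):
      key = encode_varint(field_number << 3 | 2)  # wire type 2 = length-delimited
      data = value.encode('utf-8')
      return key + encode_varint(len(data)) + data

  print(encode_string_field(1, "hi").hex())  # 0a026869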

Given that the ASN.1 toolset exists (even if a little dusty; SNMP and X.509 keep it alive), I don't see why Google bothered to re-implement.

The FAQ (http://code.google.com/apis/protocolbuffers/docs/faq.html) mentions ASN.1, but its main argument (being tied to a particular form of RPC) doesn't apply to ASN.1.


Indeed, it has all been done before with ASN.1. ASN.1 was invented for the exact same reason: data-efficient, fast communication. Currently it is mainly in use by telecom.

I've also wondered why there have been so many reinventions of what is basically ASN.1, and did some research:

The main reason I found was that, according to developers, ASN.1 was too complex to reimplement correctly (it carries a lot of legacy), and the existing toolsets either didn't have the right license, didn't support the right languages, etc.

Also, they didn't like the ASN.1 description syntax.


You can't be serious, the absurd complexity of ASN.1 is legendary.


Although not mentioned in the article, Go now has support for Protocol Buffers:

http://code.google.com/p/goprotobuf/


Go also has gobs, which according to Rob Pike improve on protocol buffers in several ways: http://blog.golang.org/2011/03/gobs-of-data.html

And while gobs are (by design) Go-centric, there are already implementations in other languages, for example in C: http://code.google.com/p/libgob/

And Go also has the rpc package, which uses gobs (but can also use json or other encodings): http://golang.org/pkg/rpc/


The serialization/deserialization times are dramatically different for Python. Thrift has an accelerated binary serializer written in C with Python bindings, while Protobuf's is pure Python. Third-party C++ wrappers for Protobuf in Python exist as well, but they are buggy (segfaults).
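
Using the accelerated path is a small change in Python Thrift (a sketch; MyStruct stands in for any generated Thrift struct, and it assumes the C extension was built):

  from thrift.TSerialization import serialize, deserialize
  from thrift.protocol.TBinaryProtocol import TBinaryProtocolAcceleratedFactory

  factory = TBinaryProtocolAcceleratedFactory()
  blob = serialize(my_struct, protocol_factory=factory)
  obj = deserialize(MyStruct(), blob, protocol_factory=factory)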


  export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp
  python setup.py build_ext
It's "EXPERIMENTAL", but it seems to work well.


If anyone is interested in solving this problem, please check out CyPB https://github.com/connexio/cypb

It is not ready for production use yet. Please help us by testing and submitting bug reports & patches!


This is ultimately the reason we chose Thrift over PB when developing our app (Zite). Thrift was over 40 times faster at serialization for our Python code.


I personally really like msgpack and msgpack-rpc. There are a ton of well-supported implementations for various languages, and there are some speed and other advantages over Thrift and Protocol Buffers. The core implementation is written in C++.

http://msgpack.org
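
The API is about as small as it gets; a minimal round-trip in Python (msgpack-python package; the defaults shown are those of recent releases):

  import msgpack

  packed = msgpack.packb([1, "foo", {"bar": 2.0}])
  assert msgpack.unpackb(packed) == [1, "foo", {"bar": 2.0}]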


The Python RPC libraries seem to rely on Twisted. Does msgpack-rpc support generic code generation so you can fill in your own RPC implementation?


I am very wary of Thrift's custom network stack.

We have a Java backend service that gets thousands of requests per second per node and where latency is of utmost importance. We tested Thrift for communication between the backend service and the front-end web code; however, we saw an increase in failed requests and latency compared to a server written using Netty.

For us, using Netty and Protocol Buffers works much better, but maybe we were using Thrift wrongly.


Some of the protobuf implementations are a bit more official than others. We are using the Go protobuf plugin at Tinkercad, and it's maintained by Google for internal use. Given the importance of protobufs for communication across Google, it's pretty safe to assume that the implementation is solid (disclaimer: I used to work at Google and know the folks maintaining the Go plugin).

That said, we are starting to miss a JavaScript protobuf implementation. There is a lot of binary data to serialize across the client/server boundary, and not all of it requires a custom format. It would be nice to just drop in server-side protobufs and have them work seamlessly on the client.

I do understand the criticism about the missing RPC library, but I've always found that you need to write your own anyway.


I like the look of msgpack (http://msgpack.org/) more than any of the aforementioned. Perhaps I'm missing something, but Thrift and protobufs both seem very lacking in comparison.


Protocol Buffers are versatile, allowing nesting and includes, but the performance we got with a Java server and PHP/iOS clients was pretty poor, and the PHP libraries do not support the whole specification.

So we switched to Thrift and the whole FB stack, with HipHop and Scribe, and we're thrilled. Documentation is only a problem at the beginning, when setting up the stack. Everything after that is self-explanatory.


> But thrift and protobuf are by far the most popular

[citation needed] (seriously, I'm interested)

XML seems far more popular (in the sense of market-share/adoption, not in the sense of being liked).


That's a good point, but I think by "most popular" the author was referring to popularity in the hacker/startup community. One could make a similar argument about operating systems (http://en.wikipedia.org/wiki/File:Operating_system_usage_sha...) or web browsers (http://en.wikipedia.org/wiki/Usage_share_of_web_browsers), but I don't think anyone on HN would call Windows XP or Internet Explorer "popular".


That makes sense, since most startups are technology users rather than technology sellers. (For a startup selling tools/middleware, technology market share would be customer market share.)

Thinking further, startups might be early adopters of new technology that will eventually become mainstream. But it doesn't seem to be a reliable predictor, since many (most) new technologies don't reach critical mass before being replaced by the next new thing, e.g. ASN.1's binary serialization format. http://en.wikipedia.org/wiki/Abstract_Syntax_Notation_One



