I abandoned Protocol Buffers because the Python implementation was too slow. The problem is that Google hasn't written a C extension yet because it wouldn't be compatible with AppEngine. It's a known problem that has gone undocumented.
Also, I changed the test script from using time.time() to time.clock(), which according to the Python docs should be used for performance testing on Unix systems.
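For anyone rerunning this on a current Python: time.clock() was deprecated in 3.3 and removed in 3.8, and time.perf_counter() is the usual replacement for this kind of measurement. A rough sketch of a timing loop follows. The time_call helper and the sample record are placeholders I made up, not the original test script.

```python
import json
import time

def time_call(func, *args, repeat=5, number=1000):
    """Return the best average seconds-per-call seen over `repeat` rounds."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        for _ in range(number):
            func(*args)
        elapsed = time.perf_counter() - start
        # Keep the fastest round; slower rounds are mostly scheduler noise.
        best = min(best, elapsed / number)
    return best

# Placeholder record, not the one used in the original benchmark.
record = {"name": "example.com", "addresses": ["192.0.2.1", "192.0.2.2"]}
seconds = time_call(json.dumps, record)
print("json.dumps: %.2f us per call" % (seconds * 1e6))
```

Taking the minimum over several rounds is the same idea the standard timeit module recommends, so timeit is probably the better choice if you don't need a custom loop.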
For handling protocol buffers in Python, it is much faster to generate the C++ protocol buffer wrappers and then wrap them with SWIG. It is a nuisance to regenerate them every time you change the proto definition, though.
I think the schema is causing this. Lists of variable-length strings aren't going to yield good results, as any serialization framework now has to do extra work to find the boundaries of each string. In this case you could store IP addresses and all the other DNS fields as ints and you should see a massive speedup. This would probably be closer to the actual workload Google or Facebook sees. Why would they be serializing huge records of data that's already been encoded into a human-readable string format?
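To make the "store them as ints" suggestion concrete, here is a small sketch using only the standard library. The ip_to_int / int_to_ip helpers and the sample addresses are my own, and it only illustrates the size difference for IPv4; in a .proto you would presumably declare the field as fixed32 or uint32 instead of string.

```python
import ipaddress
import struct

def ip_to_int(addr):
    """Convert dotted-quad IPv4 text to a 32-bit integer."""
    return int(ipaddress.IPv4Address(addr))

def int_to_ip(value):
    """Convert a 32-bit integer back to dotted-quad text."""
    return str(ipaddress.IPv4Address(value))

# Sample addresses chosen for illustration only.
addrs = ["192.168.0.1", "10.0.0.1", "8.8.8.8"]
as_ints = [ip_to_int(a) for a in addrs]

# Fixed-width binary form: 4 bytes per address, network byte order.
packed = struct.pack("!%dI" % len(as_ints), *as_ints)

# Text form for comparison: the same addresses joined with commas.
as_text = ",".join(addrs).encode("ascii")

print(len(packed), len(as_text))
```

With these three addresses the packed form is 12 bytes against 28 bytes of text, and the decoder no longer has to parse digits and dots.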
Protocol buffers won hands down as far as space used...
The speed issues are still there, but I'm sure that over time things will improve. If the C extension for simplejson can speed up serialization by an order of magnitude, I have no doubt that similar improvements can be made to protocol buffers and thrift.
I thought this is exactly how protocol buffers worked with non-fixed length fields. Doesn't it start the record with the length of the string? I'm not sure how thrift works, but probably the same way.
(Not speaking from experience, just from what I remember of the format when I read the specs).
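For protocol buffers at least, that matches the documented wire format: a string field is written as a varint key (field number shifted left by 3, OR'ed with wire type 2), then a varint byte length, then the UTF-8 bytes. Here is a minimal hand-rolled sketch of just that part. The function names are mine, and this is not the protobuf library's API.

```python
def encode_varint(n):
    """Encode a non-negative int as a base-128 varint, low 7 bits first."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def decode_varint(buf, pos):
    """Decode a varint starting at buf[pos]; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

def encode_string_field(field_number, text):
    """Key (wire type 2 = length-delimited), then length, then the bytes."""
    payload = text.encode("utf-8")
    key = (field_number << 3) | 2
    return encode_varint(key) + encode_varint(len(payload)) + payload

def decode_string_field(buf, pos=0):
    """Return (field_number, text, next_pos) for one string field."""
    key, pos = decode_varint(buf, pos)
    length, pos = decode_varint(buf, pos)
    end = pos + length
    return key >> 3, buf[pos:end].decode("utf-8"), end

encoded = encode_string_field(2, "testing")
print(encoded.hex())
print(decode_string_field(encoded))
```

So the reader never scans for a delimiter: it reads the length and slices. The per-string cost that remains is decoding the varints, copying the bytes, and UTF-8 decoding.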
Yeah, but the whole point is you don't have a fixed length record. And in the example given you shouldn't be using strings at all - integers will suffice - that's the real problem with this benchmark.
You might as well be testing how quickly thrift / pb / json could serialize / deserialize pickled blobs. You're not giving thrift or pb the data they're designed to perform well with, so the fact that they fail isn't surprising.
I'm sure a Tesla would suck compared to an ancient pickup at helping me pick up a couch from Craigslist, but I wouldn't say that the pickup is a better car because of that.
http://groups.google.com/group/protobuf/browse_thread/thread...