Riak 1.1 Released

rb2k_ · on Feb 22, 2012

I love to go back to Riak every now and then and throw one of the bigger datasets I have sitting around at a single node using Ruby. It is one of the most interesting databases out there and it's a breeze to set up (especially when compared to Cassandra).

It is kind of a shame though. Riak has awesome features, but some of them (e.g. secondary indexes) are hard to use and documentation is somewhat missing.

If you look at the official clients page (http://wiki.basho.com/Client-Libraries.html#Ruby), you can see ripple (https://github.com/seancribbs/ripple) but the page doesn't mention riak-ruby-client (https://github.com/basho/riak-ruby-client) which was split from ripple some time ago.

In theory, both of them support secondary indexes as far as I can tell, but you won't find that feature in either readme. There are some specs available that more or less describe parts of it though. I still don't see clearly how I could query secondary indexes from either library.

The same goes for things like data vs raw_data and serializers (small discussion: https://github.com/basho/riak-ruby-client/pull/19). While there is a very informative screencast on Sean's blog, there is no mention of it in the readme file.
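
For reference, the secondary-index operations mentioned above can at least be exercised over Riak's HTTP interface without going through either Ruby library. The following is a minimal sketch in Python, not a description of how ripple or riak-ruby-client do it: it only builds the two requests and sends nothing. The host, port 8098, bucket, key, and index name are illustrative assumptions, as are the helper names `store_with_index` and `index_query_url`. It assumes the `/buckets/...` URL scheme and the `x-riak-index-<name>_bin` header of the Riak 1.x HTTP API, and a node whose storage backend supports secondary indexes.

```python
import json
from urllib.parse import quote
from urllib.request import Request

# Assumption: a single local node on Riak's default HTTP port.
BASE = "http://127.0.0.1:8098"


def store_with_index(bucket, key, value, index, index_value, base=BASE):
    """Build (but do not send) a PUT that stores a JSON value and
    tags it with one binary secondary index entry."""
    url = f"{base}/buckets/{quote(bucket, safe='')}/keys/{quote(key, safe='')}"
    headers = {
        "Content-Type": "application/json",
        # Binary (string) indexes use the _bin suffix.
        f"x-riak-index-{index}_bin": index_value,
    }
    body = json.dumps(value).encode("utf-8")
    return Request(url, data=body, headers=headers, method="PUT")


def index_query_url(bucket, index, value, base=BASE):
    """Build the URL for an exact-match lookup on a binary index."""
    return (
        f"{base}/buckets/{quote(bucket, safe='')}"
        f"/index/{quote(index, safe='')}_bin/{quote(value, safe='')}"
    )


if __name__ == "__main__":
    # Hypothetical data: one object in bucket "laundry" indexed by color.
    req = store_with_index("laundry", "shirts", {"red": 2, "blue": 4}, "color", "red")
    print(req.get_method(), req.full_url)
    print(index_query_url("laundry", "color", "red"))
```

Sending the first request with `urllib.request.urlopen` and then issuing a GET on the second URL should return a JSON object with a `keys` list naming the matching objects.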

jbellis · on Feb 22, 2012

Curious what you found so hard about setting up Cassandra. Counterpoint: http://www.screenr.com/5G6

tolitius · on Feb 22, 2012

I agree. Setting up Cassandra is simple. It is later when you get to things like:

    mutator.setColumnOrSuperColumn( ... )
    mutationsMap = new HashMap<ByteBuffer, Map<String, List<Mutation>>>()..

    compound column..
    super compound family of super columns...

where it feels more like a senior project than something as simple as:

    curl -X POST -H 'Content-Type: application/json' -d '{"red":2, "blue":4}' http://127.0.0.1:8091/riak/laundry/shirts

Although CQL fixes some of the problems with Cassandra complexity [Thrift does not even sound good in all 3 languages I know], I still believe it will be far more "desirable" (as the real http://en.wikipedia.org/wiki/Cassandra was) if more forces are applied to simplicity and polish.

rb2k_ · on Feb 22, 2012

It annoys the hell out of me that I can't launch the app by default because it points at directories that aren't user-writeable. Riak just has the etc and var directories inside the tarball. Cassandra just assumes that it is able to write to /var/...

Also: setting up the schema has changed quite a bit since 0.7 and that was always kinda annoying. Nothing that is really that big of a problem, but it feels like I have had to change the code with every minor version since 0.7. (Note that they didn't set up a schema in the screencast, and the cluster that got started is pretty useless, isn't it?)

nirvana · on Feb 22, 2012

Wait, can't you fix the wiki? You're right that it's an oversight, but one you can fix yourself.

rb2k_ · on Feb 22, 2012

Sure, if I actually knew the correct solution...

Seeing as Basho is backing Riak, I would assume that they would invest a little bit of time documenting what they actually did :)

davidcollantes · on Feb 21, 2012

For those, like me, who do not know what Riak is:

"Riak is an open source, highly scalable, fault-tolerant distributed database."

Source: http://basho.com/products/riak-overview/

ismarc · on Feb 21, 2012

It also makes different trade-offs in usability of the software. It is (in my opinion) the most sysadmin-friendly datastore: without hiding complexity, the tools are just pleasant to use. However, it has a steeper learning curve for developers. We did a shootout to find a solution that would work for our needs and it came down to Mongo vs. Riak. We went with Mongo and are feeling the pain of getting everything automated and set up, but consider it a one-time cost rather than an "every time we get a new developer" cost.

wfarr · on Feb 21, 2012

Having worked with some quite large Mongo setups, I think you'll find the administration of Mongo ends up being significantly more than a single upfront cost.

Dealing with sharding + replica sets is an administrative nightmare with Mongo.

mrkurt · on Feb 21, 2012

We (MongoHQ) do lots of sharding + replica set admin. Sharding is finicky, but we've found replica set admin to be really straightforward.

Getting client apps configured properly is the hardest part of replica set use. It's only a one-time problem though.

PanMan · on Feb 22, 2012

Do you have some more insight into this? We currently run a hybrid Riak-MySQL setup, and are actively looking at Mongo to replace parts of it. In a few tests Mongo's speed seems to be all over the place: from REALLY fast to quite slow. And that is without the admin involved in a larger-scale setup.

Ixiaus · on Feb 21, 2012

It's a distributed multi-node key-value store. They built it using Erlang (I almost consider that, alone, a "feature").

firefoxman1 · on Feb 21, 2012

Yeah, it's important to note that it's a key-value store; that's what attracted me initially.

gry · on Feb 21, 2012

Based off Amazon's Dynamo work (think S3).

CPlatypus · on Feb 21, 2012

There's little relationship between Dynamo and S3. Riak is based on the former but not the latter.

argvzero · on Feb 22, 2012

Correct. I've heard conflicting accounts on how much of the original Dynamo was/is used by S3.

(Also, if you like S3, keep your eyes on Basho over the next few weeks).

nirvana · on Feb 22, 2012

If it claims to be distributed and it's not written in Erlang, don't use it.

Well, that's the filter I use when looking at anything. If it's not written in Erlang I spend a lot of time trying to figure out exactly how it isn't really actually a distributed system. Usually I find out that it isn't.

Ixiaus · on Feb 22, 2012

While I generally agree with you, there are some applications out there that get distributed multi-node environments "right" without using Erlang. It's just much harder to do because those problems are (for the most part) solved BY Erlang for the programmer.

If something says it is parallel this, or concurrent that, that is when I filter it. Thus far Erlang's concurrency model has been unmatched (in my limited experience) by any other language I've used, both for efficacy and simplicity.

ericflo · on Feb 22, 2012

That's an extremely crude filter. A lot of the best distributed systems stuff is built on the JVM. Too many to name, but I'll name one anyway: ZooKeeper.

rb2k_ · on Feb 21, 2012

Oh yay, I wonder how enabling Snappy on LevelDB changes the performance characteristics.

I really liked the introduction of Snappy to CouchDB, especially for EC2 machines with their usually slow IO.

ahi · on Feb 22, 2012

I want to love Riak, but the documentation is a mess. At least the 'Fast Track' is out of date and inconsistent: http://wiki.basho.com/Building-a-Development-Environment.htm...

nirvana · on Feb 22, 2012

Glad to see Basho putting out new releases at such a clip. I'm envious, frankly.

It is really impressive how far Riak has come in the last year.

I don't think anything compares to it, and I think it should have an order of magnitude more interest and users. (I think people just get scared off by "erlang", which is silly.)

no-espam · on Feb 22, 2012

how much does riak pay you?

jsavimbi · on Feb 21, 2012

I'm very interested in seeing the new Admin console; is there a URL for that? I'm finding the docs a little light.

cmeiklejohn · on Feb 21, 2012

You can see a demo from Mark here:

http://basho.com/blog/technical/2012/01/30/Riak-in-Productio...

jsavimbi · on Feb 21, 2012

Looks great. And reassuring. I just followed the instructions and restarted. https://github.com/basho/riak_control