I love to go back to Riak every now and then and throw one of the bigger datasets I have sitting around at a single node using Ruby. It is one of the most interesting databases out there and it's a breeze to set up (especially when compared to Cassandra).
I agree. Setting up Cassandra is simple. It is later when you get to things like:
mutator.setColumnOrSuperColumn( ... )
mutationsMap = new HashMap<ByteBuffer, Map<String, List<Mutation>>>()..
compound column..
super compound family of super columns...
where it feels more like a senior project than something as simple as:
curl -X POST -H 'Content-Type: application/json' -d '{"red":2, "blue":4}' http://127.0.0.1:8091/riak/laundry/shirts
Although CQL fixes some of the problems with Cassandra complexity [Thrift does not even sound good in all 3 languages I know], I still believe it would be far more "desirable" (as the real http://en.wikipedia.org/wiki/Cassandra was) if more effort went into simplicity and polish.
It annoys the hell out of me that I can't launch the app by default because it points at directories that aren't user-writeable. Riak just has the etc and var directories inside the tarball. Cassandra just assumes that it is able to write to /var/...
Also: setting up the schema has changed quite a bit since 0.7, and that was always kinda annoying. Nothing that is really that big of a problem, but it feels like I have had to change the code with every minor version since 0.7. (Note that they didn't set up a schema in the screencast, and the cluster that got started is pretty useless, isn't it?)
Riak also makes different trade-offs in the usability of the software. It is (in my opinion) the most sysadmin-friendly datastore: it doesn't hide complexity, and the tools are just pleasant to use. However, it has a steeper learning curve for developers. We did a shootout to find a solution that would work for our needs and it came down to Mongo vs. Riak. We went with Mongo and are feeling the pain of getting everything automated and set up, but consider it a one-time cost rather than an "every time we get a new developer" cost.
Having worked with some quite large Mongo setups, I think you'll find the administration of Mongo ends up being significantly more than a single upfront cost.
Dealing with sharding + replica sets is an administrative nightmare with Mongo.
Do you have some more insight into this?
We currently run a hybrid Riak-MySQL setup, and are actively looking at Mongo to replace parts of it. In a few tests Mongo's speed seems to be all over the place: from REALLY fast to quite slow. And that is without the admin work involved in a larger-scale setup.
If it claims to be distributed and it's not written in Erlang, don't use it.
Well, that's the filter I use when looking at anything. If it's not written in Erlang I spend a lot of time trying to figure out exactly how it isn't really actually a distributed system. Usually I find out that it isn't.
While I generally agree with you, there are some applications out there that get distributed multi-node environments "right" without using Erlang. It's just much harder to do because those problems are (for the most part) solved BY Erlang for the programmer.
If something says it is parallel this, or concurrent that, that is when I filter it - thus far Erlang's concurrency model has been unmatched (in my limited experience) by any other language I've used for efficacy and simplicity.
That's an extremely crude filter. A lot of the best distributed system stuff is built on the JVM. Too many to name, but I'll name one anyway: ZooKeeper.
Glad to see Basho putting out new releases at such a clip. I'm envious, frankly.
It is really impressive how far Riak has come in the last year.
I don't think anything compares to it, and I think it should have an order of magnitude more interest and users. (I think people just get scared off by "Erlang", which is silly.)
It is kind of a shame though. Riak has awesome features, but some of them (e.g. secondary indexes) are hard to use and documentation is somewhat missing.
If you look at the official clients page (http://wiki.basho.com/Client-Libraries.html#Ruby), you can see ripple (https://github.com/seancribbs/ripple) but the page doesn't mention riak-ruby-client (https://github.com/basho/riak-ruby-client) which was split from ripple some time ago.
In theory, both of them support secondary indexes as far as I can tell, but you won't find that feature in either readme. There are some specs that describe parts of it, but I still don't see how to query secondary indexes from either library.
The same goes for things like data vs raw_data and serializers (small discussion: https://github.com/basho/riak-ruby-client/pull/19). While there is a very informative screencast on Sean's blog, there is no mention of it in the readme file.
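On the secondary index question: until the client readmes cover it, one workaround is to skip the libraries and talk to Riak's HTTP interface directly. Below is a rough sketch using only the Ruby standard library. It is not how ripple or riak-ruby-client do it. It assumes a node listening on 127.0.0.1:8098, a backend that supports secondary indexes (e.g. LevelDB), and URL-safe bucket, key and index values. The helper names (build_indexed_put, build_index_query, send_request) and the color_bin index are made up for illustration.

```ruby
require 'net/http'
require 'json'
require 'uri'

# Assumption: HTTP interface of a local Riak node; adjust host and port to your setup.
RIAK_BASE = 'http://127.0.0.1:8098'

# Build (but do not send) a PUT that stores a JSON value and tags it with
# secondary indexes through x-riak-index-* headers.
# Index names end in _bin (string values) or _int (integer values).
def build_indexed_put(bucket, key, value, indexes)
  uri = URI("#{RIAK_BASE}/buckets/#{bucket}/keys/#{key}")
  req = Net::HTTP::Put.new(uri)
  req['Content-Type'] = 'application/json'
  indexes.each { |name, v| req["x-riak-index-#{name}"] = v.to_s }
  req.body = JSON.generate(value)
  req
end

# Build (but do not send) an exact-match index query.
# A node that supports the query answers with JSON like {"keys": [...]}.
def build_index_query(bucket, index, value)
  uri = URI("#{RIAK_BASE}/buckets/#{bucket}/index/#{index}/#{value}")
  Net::HTTP::Get.new(uri)
end

# Send a prepared request; this part needs a running node.
def send_request(req)
  Net::HTTP.start(req.uri.host, req.uri.port) { |http| http.request(req) }
end
```

Usage would look something like send_request(build_indexed_put('laundry', 'shirts', { 'red' => 2, 'blue' => 4 }, 'color_bin' => 'red')) followed by JSON.parse(send_request(build_index_query('laundry', 'color_bin', 'red')).body). Building the request separately from sending it makes it easy to inspect the path and headers without a cluster at hand.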