This accords with our experience with large Oracle databases as well.
We've had a few panics caused by a DBA updating table statistics while trying to optimize some query. Occasionally this caused sudden, massive changes in how other, unrelated queries were executed; those queries would then fail to finish and sometimes bring down the database.
These experiences forced us to change our procedures and keep the test database closely in sync with the production data, so we'd know how newly gathered statistics would affect queries. The database is large enough that maintaining a full, recent copy of production for testing is pretty painful. Oracle has since introduced features in 11g that allow pinning query plans, though we've yet to try them.
In the simplest form, you could say: "I have two reasonable plans. Try A, and if it takes longer than time X, start B in parallel and go with whichever finishes first."
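A minimal sketch of that race in Python, where run_plan_a and run_plan_b are hypothetical callables standing in for "execute this query under plan A/B" (in a real engine the losing plan would also be cancelled server-side; threads here just illustrate the control flow):

    from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

    def race_plans(run_plan_a, run_plan_b, grace_seconds):
        # run_plan_a / run_plan_b each execute the same query
        # under a different fixed plan.
        pool = ThreadPoolExecutor(max_workers=2)
        future_a = pool.submit(run_plan_a)
        done, _ = wait([future_a], timeout=grace_seconds)
        if not done:
            # Plan A blew its time budget: launch plan B and take
            # whichever plan finishes first.
            future_b = pool.submit(run_plan_b)
            done, _ = wait([future_a, future_b], return_when=FIRST_COMPLETED)
        pool.shutdown(wait=False)  # a real DB would also kill the loser
        return done.pop().result()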
You could ramp the idea up to handle plan changes driven by updated statistics: send some fraction of queries to one plan and the rest to the other, then keep stats on how that worked out for you.
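Something like this canary routing, sketched in Python (the "plans" dict and its "current"/"candidate" keys are hypothetical callables that execute the query under a fixed plan):

    import random
    import time
    from collections import defaultdict

    latencies = defaultdict(list)  # plan name -> observed runtimes

    def run_canaried(query, plans, canary_fraction=0.05):
        # Route a small fraction of traffic to the candidate plan and
        # record latencies, so any switch is backed by evidence.
        name = "candidate" if random.random() < canary_fraction else "current"
        start = time.monotonic()
        result = plans[name](query)
        latencies[name].append(time.monotonic() - start)
        return result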
Basically, never simply flip the switch in production to try a new, unproven query plan.
Incidentally, Oracle 12c advertises that it will actually collect statistics during query execution and use them to validate the query plan. If that works, this problem should get better. But of course it comes with overhead, and it's likely to be a complex piece of code, so you tell me whether you trust them to get it right on the first try.
A cost-based optimizer also needs a way to handle the combinatorial explosion of possible plans. For MongoDB, maybe an exhaustive search or a simple strategy is feasible. But for a SQL DBMS, the search space is far too large, and you need a way to navigate it intelligently.
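To make "far too large" concrete, here's a quick count using the standard formula for the number of binary join trees over n relations with labeled leaves:

    from math import factorial

    def join_trees(n):
        # n! leaf orderings times the (n-1)th Catalan number,
        # which simplifies to (2n-2)! / (n-1)!.
        return factorial(2 * n - 2) // factorial(n - 1)

    for n in (3, 5, 10):
        print(n, join_trees(n))
    # 3 -> 12, 5 -> 1680, 10 -> 17,643,225,600

Ten tables already means tens of billions of join trees, before you even consider access paths and join algorithms for each one.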
This "try the plan out" idea has come up many times, and it's a reasonable idea, but it's nothing new and not a complete solution. You need a way to evaluate whether a plan has a reasonable chance of being the best one before the whole plan is even constructed, much less executed.
Combinatorial explosion is also one of the reasons that humans generally aren't great at coming up with non-trivial plans, either.
A good optimizer is a combination of many good ideas, so I don't mean to be discouraging; it's just that one idea is not the full story.
If you are worried about the risk of a changing plan, you are between a rock and a hard place. If you lock the plan down (or hard-code it in the application) and the data characteristics change, that could be disastrous. If you don't lock it down, you're taking a leap of faith that the optimizer will adapt gracefully.
I understand the issues you've brought up in your post, as well as the traditional methods of pruning the search space of query plans, such as restricting to left-deep join trees and using histograms to estimate selectivity.
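For anyone unfamiliar with the histogram part, here's a toy equi-width version estimating the selectivity of a range predicate (my own sketch; real engines typically use fancier structures such as equi-depth histograms):

    class EquiWidthHistogram:
        # Toy equi-width histogram for selectivity estimation;
        # a sketch, not any engine's actual structure.
        def __init__(self, values, buckets=10):
            self.lo, self.hi = min(values), max(values)
            self.n = len(values)
            self.width = (self.hi - self.lo) / buckets or 1.0
            self.counts = [0] * buckets
            for v in values:
                i = min(int((v - self.lo) / self.width), buckets - 1)
                self.counts[i] += 1

        def selectivity_lt(self, x):
            # Estimated fraction of rows with value < x: sum the whole
            # buckets below x, then interpolate into x's own bucket.
            if x <= self.lo:
                return 0.0
            if x >= self.hi:
                return 1.0
            i = int((x - self.lo) / self.width)
            partial = (x - (self.lo + i * self.width)) / self.width
            return (sum(self.counts[:i]) + self.counts[i] * partial) / self.n

The optimizer multiplies estimates like these across predicates to guess row counts, which is exactly where stale statistics bite.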
My top-level point was that there is no way humans will come up with the optimal query plan by hand, and, like you said, even if they do, the data will change.
As an aside, the reason Mongo can use such a simple but effective strategy is that it doesn't support joins, which makes finding the optimal plan much, much easier. RethinkDB does support joins, and as such, a CBO is that much more important.
I am not disputing the value of a cost based optimizer.
I'm just pointing out that if you have an existing plan that currently seems to be working, the default should be not to switch it without testing that the switch doesn't hurt anything. You need to use care.
Yup. They've been forced into this kind of complexity because the simple general solution, which should work and does work the vast majority of the time, also routinely causes pants-on-fire emergencies.