I was surprised that MySQL lacked a Product() aggregate function with its GROUP BY.
After much searching I came to this little nugget:
"exp(sum(log(coalesce(the field you want to multiply,1)))
The coalesce() function is there to guard against trying to calculate the
logarithm of a null value and may be optional depending on your
circumstances."
Relational databases really are for doing relational operations. If you're using them for doing bits of arithmetic then you're going to come up against all sorts of limitations. Trivially, if you use this computed value in a where clause the query optimiser will have no choice but to do a full table scan. It can't index by all possible computed values.
Horses for courses. Do the query in whatever programming language, then just multiply the numbers together. I don't try to do left outer joins on my HP12-C calculator either.
I agree that using a precomputed value in a where clause is a good idea. But it's very likely that computing that value inside a DB engine and fetching a single value is going to be faster than fetching 50 values and calculating the product in your app.
Also, consider the following case: UPDATE huge_table SET someval = (SELECT PRODUCT(x) ...). Splitting this into 50 million queries wouldn't be a very good solution.
Limitations like this are among the most trivial of the reasons I pay for MS SQL. Even though no such built-in aggregate is available, I'm free to write my own in C/C++/.Net.
I'm pretty sure PostgreSQL lets you extend it however you want as well.
CREATE OR REPLACE FUNCTION product_sfunc(state NUMERIC, value NUMERIC)
RETURNS NUMERIC AS $$
BEGIN
    IF value IS NOT NULL THEN
        IF state IS NULL THEN
            RETURN value;
        ELSE
            RETURN state * value;
        END IF;
    END IF;
    RETURN state;
END;
$$ LANGUAGE plpgsql;

CREATE AGGREGATE PRODUCT (NUMERIC) (
    SFUNC = product_sfunc,
    STYPE = NUMERIC
);
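The state-transition function above is just a left fold over the rows. A plain-Python mirror of it (hypothetical helper names, same NULL-skipping semantics: None inputs are ignored, and the state stays None until the first non-NULL value) makes the behavior easy to check:

```python
def product_sfunc(state, value):
    """Mirror of the plpgsql state-transition function:
    None (NULL) inputs are skipped entirely."""
    if value is not None:
        return value if state is None else state * value
    return state

def product(values):
    """Fold product_sfunc over the values, like the aggregate does."""
    state = None
    for v in values:
        state = product_sfunc(state, v)
    return state

print(product([1, 2, 3, 4, 5]))  # 120
print(product([2, None, 5]))     # 10
```

Note that an all-NULL (or empty) input yields None, matching how SQL aggregates return NULL when there are no non-NULL inputs.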
This took only a couple of minutes to write and didn't require any extra tools. Now I can use the PRODUCT() aggregate in queries, like this:
SELECT SUM(n), PRODUCT(n) from generate_series(1, 5) as n;
sum | product
-----+---------
15 | 120
In Postgres 8.4 it also works as a windowing function which means I can use it to get a running product (is that the correct expression?):
SELECT n, PRODUCT(n) OVER (ORDER BY n) from generate_series(1, 5) as n;
n | product
---+---------
1 | 1
2 | 2
3 | 6
4 | 24
5 | 120
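"Running product" seems right; it is the same thing `itertools.accumulate` computes in Python, which makes a quick cross-check of the window-function output easy:

```python
from itertools import accumulate
from operator import mul

# Running product over 1..5, matching PRODUCT(n) OVER (ORDER BY n)
ns = [1, 2, 3, 4, 5]
running = list(accumulate(ns, mul))
print(running)  # [1, 2, 6, 24, 120]
```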
You presented some made up limitation as fact; you said it CANNOT be extended. Now you're trying to support your argument further by saying it's EASIER elsewhere. You should have just said that. I hate it when the opinionated use made up facts to justify their opinions. Please don't do that.
All these database engines are pretty weak. I prefer to use SAS for this kind of thing (including seeing stuff from previous/future rows.) It may not be client/server, but it is astoundingly fast for large datasets nonetheless.
My bad. I suppose you could also credibly argue that since it's open source nothing is impossible, and that MySQL is infinitely more extensible than MSSQL.
Also, it seems C/C++ support is deprecated in MSSQL, so it's not a proper solution either. CLR integration is what they're pushing now.
If you do it in the database, you don't have to shunt all the values across the wire from the database server. If the middle tier is on a different machine, this can make a significant difference to the time taken to do the query.
Good point, but as engineering trade-offs go: do you want to tie your application's performance to one RDBMS's non-standard hacks, or do you want to throw cheap hardware, RAM, fat pipes and caches at the problem? I can scale my lousy middleware-as-calculator solution with something as cheap as running data consumers and producers on the same machine, or as "expensive" as a gigabit switch and a fat cable. What would you gain from a beefy RDBMS that you couldn't get done with a sharded DB wisely spread across machines?
Inside MySQL, I used it all the time.
For example, perhaps I want to get a sense of the distribution of users; I have a table "users" with columns "user" and "count".
I could get a log-scaled histogram like so:
select pow(10, floor(log10(count))) as bin, count(*) as ct from users group by bin
This would produce a table like the following (forgive my ASCII art).
Another thing you can do is use the order of magnitude with repeat(char, num) to make a sort of visual bar chart. That said, I'm sorta glad I don't have to use MySQL professionally anymore.
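For illustration, the same power-of-ten binning (plus a repeat()-style bar) can be sketched in Python, assuming a plain list of counts stands in for the "users" table:

```python
import math
from collections import Counter

def log_bin(count):
    """Power-of-ten bin for a positive count: 1-9 -> 1, 10-99 -> 10, etc."""
    return 10 ** math.floor(math.log10(count))

# Hypothetical per-user counts standing in for SELECT count FROM users
counts = [3, 7, 12, 45, 120, 800, 5]

hist = Counter(log_bin(c) for c in counts)
for bin_, ct in sorted(hist.items()):
    # '#' * ct plays the role of MySQL's repeat(char, num) bar
    print(f"{bin_:>4} | {'#' * ct}")
```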