How to multiply numbers in a column in MySQL (mysql.com)
7 points by physcab on Aug 27, 2009 | 19 comments



switching to log-space is insanely useful for compounding returns.

inside mysql, i used it all the time.

for example, perhaps i want to get a sense of the distribution of users; i have a table "users" with columns "user" and "count"

i could get a log-scaled histogram like so:

select pow(10, floor(log(`count`)/log(10))) as bin, count(*) as ct from users group by bin order by bin

this would produce a table like this (forgive my ascii art):

    bin | ct
    ----+---
    1   | 12
    10  | 27
    100 | 35
another thing you can do is use the order of magnitude with repeat(char, num) to make a sort of visual bar chart, like below.
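
e.g. something like this (untested sketch, same users table as above):

    select `user`, repeat('*', floor(log(`count`)/log(10))) as bar
    from users
    order by `count` desc;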

that said, i'm sorta glad i don't have to use mysql professionally anymore.


I was surprised that MySQL lacked a Product() aggregate function with its GROUP BY.

After much searching I came to this little nugget:

"exp(sum(log(coalesce(the field you want to multiply,1)))

The coalesce() function is there to guard against trying to calculate the logarithm of a null value and may be optional depending on your circumstances."
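
For example, a per-group product looks like this (table and column names made up):

    SELECT account_id,
           EXP(SUM(LOG(COALESCE(factor, 1)))) AS product
    FROM daily_factors
    GROUP BY account_id;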

MySQL is full of hacks.


You could always write a patch and send it to them. A product aggregate function probably wouldn't be hard to write.


I'm torn as to whether to call that a 'hack' or not, since it's just a bit of math. [ exp(log(x)) == x; log(ab) == log(a) + log(b) ]


It loses precision in scenarios where the desired aggregate would not. It's a hack.


I'd be more inclined to call it a hack because it probably either crashes or gives the wrong answer when it encounters a zero. ;)
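
(In MySQL, LOG(0) returns NULL and SUM() skips NULLs, so a zero silently drops out of the product instead of zeroing it. A zero-aware sketch, with a hypothetical table t and column x; negative values would still need separate sign handling:)

    -- return 0 if any row is zero, otherwise the usual log-space product
    SELECT IF(SUM(x = 0) > 0, 0, EXP(SUM(LOG(x)))) AS product
    FROM t;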


This solution is not numerically equivalent to direct multiplication, and I hope nobody uses it where precision is important.

Remember that floating point isn't the same as the math you were taught in high school.


Relational databases really are for doing relational operations. If you're using them for doing bits of arithmetic then you're going to come up against all sorts of limitations. Trivially, if you use this computed value in a where clause the query optimiser will have no choice but to do a full table scan. It can't index by all possible computed values.
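
For instance (made-up orders table with a plain index on price):

    -- the computed predicate defeats the index: full scan
    SELECT * FROM orders WHERE EXP(2 * LOG(price)) > 100;

    -- the algebraically equivalent direct predicate can use it
    -- (for positive prices)
    SELECT * FROM orders WHERE price > 10;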

Horses for courses. Do the query in whatever programming language, then just multiply the numbers together. I don't try to do left outer joins on my HP-12C calculator either.


I agree that using a precomputed value in a where clause is a good idea. But it's very likely that computing that value inside a DB engine and fetching a single value is going to be faster than fetching 50 values and calculating the product in your app.

Also, consider the following case: UPDATE huge_table SET someval = (SELECT PRODUCT(x) ...). Splitting this into 50 million queries wouldn't be a very good solution.
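
Something like this (hypothetical schema), one statement instead of a round trip per row:

    UPDATE huge_table h
    SET someval = (SELECT EXP(SUM(LOG(COALESCE(d.x, 1))))
                   FROM detail d
                   WHERE d.parent_id = h.id);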


Limitations like this are among the most trivial of the reasons I pay for MS SQL. Even though no such built in aggregate is available, I'm free to write my own in C/C++/.Net.

I'm pretty sure PostgreSQL lets you extend it however you want as well.


You can write plugins in mysql as well.

http://dev.mysql.com/doc/refman/5.1/en/plugin-creating.html

Please do not make things up and claim your imagination as truth.


The Postgres way is much simpler:

    CREATE OR REPLACE FUNCTION product_sfunc(state NUMERIC, value NUMERIC) 
    RETURNS NUMERIC AS $$ 
    BEGIN
        IF value IS NOT NULL THEN
            IF state IS NULL THEN 
                RETURN value;
            ELSE
                RETURN state * value;
            END IF;
        END IF;
        RETURN state;
    END; $$ LANGUAGE plpgsql;
    
    CREATE AGGREGATE PRODUCT (NUMERIC) (
        SFUNC = product_sfunc,
        STYPE = NUMERIC
    );
This took only a couple of minutes to write and didn't require any extra tools. Now I can use the PRODUCT() aggregate in queries, like this:

    SELECT SUM(n), PRODUCT(n) from generate_series(1, 5) as n;
     sum | product
    -----+---------
      15 |     120
In Postgres 8.4 it also works as a windowing function which means I can use it to get a running product (is that the correct expression?):

    SELECT n, PRODUCT(n) OVER (ORDER BY n) from generate_series(1, 5) as n;
     n | product
    ---+---------
     1 |       1
     2 |       2
     3 |       6
     4 |      24
     5 |     120


You presented some made up limitation as fact; you said it CANNOT be extended. Now you're trying to support your argument further by saying it's EASIER elsewhere. You should have just said that. I hate it when the opinionated use made up facts to justify their opinions. Please don't do that.

All these database engines are pretty weak. I prefer to use SAS for this kind of thing (including seeing stuff from previous/future rows). It may not be client/server, but it is astoundingly fast for large datasets nonetheless.


The person to whom you replied did no such thing. For someone so caught up in correctness, it's surprising you can't be bothered to read user names :)


My bad. I suppose you could also credibly argue that since it's open source nothing is impossible, and that MySQL is infinitely more extensible than MSSQL.

Also, it seems C/C++ support is deprecated in MSSQL, so it's not a proper solution either. CLR integration is what they're pushing now.


i think this originated from joe celko's sql for smarties:

http://books.google.com/books?id=P1o4lxVnExsC&pg=PA340&#...


Is there any good reason why you want to do computation like this with a database query instead of, say, middleware?

For the product:

  (reduce #'* (select 'value :from 'product :flatp t))

If you want the sum, change #'* to #'+.


If you do it in the database, you don't have to shunt all the values across the wire from the database server. If the middle tier is on a different machine, this can make a significant difference to the time taken to do the query.


good point, but as engineering trade-offs go: do you want to tie your application's performance to one RDBMS's non-standard hacks, or do you want to throw cheap hardware, RAM, fat pipes and caches at the problem? I can scale my lousy middleware-as-calculator solution with something as cheap as running data consumers and producers on the same machine, or as "expensive" as a gigabit switch and a fat cable. What would you gain from a beefy RDBMS that you couldn't get done with a sharded db wisely spread across machines?



