The absolute worst thing about MySQL is its lack of strictness. Maybe this has changed, but I was furious when I accidentally inserted latin1 data into a UTF-8 table.
Instead of complaining about invalid input data, MySQL accepted the data and truncated it at the first invalid byte. Without error or warning.
Even if this was due to a misconfiguration on my part, behavior like this is completely unacceptable.
And don't get me started on other comparable issues, like truncating overlong text added to varchar fields, or accepting invalid dates and silently converting them to 0000-00-00. Shudder.
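(For what it's worth, later MySQL versions let you opt into stricter input handling via sql_mode; a hedged sketch - the table `t` and its columns are made up for illustration:)

```sql
-- Opt into strict input handling for the session:
SET SESSION sql_mode = 'STRICT_ALL_TABLES,NO_ZERO_DATE,NO_ZERO_IN_DATE';

-- With strict mode on, these raise errors instead of silently mangling the data:
INSERT INTO t (short_text) VALUES (REPEAT('x', 100000)); -- overlong varchar: error, not truncation
INSERT INTO t (created_on) VALUES ('2009-02-31');        -- invalid date: error, not 0000-00-00
```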
That is a very, very superficial comparison. Half of the cons for MySQL (the ones referring to MyISAM) don't really apply, since you simply use InnoDB if you need those features.
So it's no real help in deciding which DB to pick...
True to a point, but MySQL lets you mix and match storage engines on a per-table basis.
I've used MySQL extensively, and use Postgres for most things now, and I prefer MySQL for many things. Not least replication, which is so much simpler with MySQL (I'd love to be proved wrong on this - please someone point me to a replication solution for Postgres that is as simple as for MySQL).
But the differences aren't great enough for most typical use that I'd worry too much about which one to use. Currently I use Postgres because that is what my team is familiar with.
It's a pretty crappy comparison. If you need transactions and ACID properties then you pick InnoDB and not MyISAM. Now what are the cons of PostgreSQL?
1) It's hard to install and manage, which could explain why Postgres's adoption is fairly low.
2) MySQL is used and proven to scale on very large installations (Yahoo, Facebook, Google - just to name a few). Postgres's usage is much lower (Skype and Reddit from those I know of).
3) Due to the adoption of MySQL it's very easy to get help and to buy help. For example: http://mysqlperformanceblog.com offers great advice on everything MySQL. Due to Postgres's smaller user base it's harder to get or buy help.
4) The development of MySQL and MySQL tools is very rapid. Check out Drizzle, XtraDB, xtrabackup, the InnoDB plugin, rethinkdb.com (this is just a few; there are tons of other engines and tools under development). It's unclear to me how rapid Postgres's development is (for example, they still lack decent replication), but I doubt it is as rapid as MySQL's.
5) The advanced features of Postgres are not that needed in most web applications. If they were that needed, more people would be using Postgres. I.e. Postgres has a lot of non-essential features, while it lacks some essential features (such as decent replication).
The bottom line is that MySQL is proven, easy to use and install, is under heavy development, is backed by billion-dollar companies and has a very large user base. MySQL is also built mostly for web applications and offers _great_ performance, great tools and great replication.
1) Hard to install? What about `apt-get install postgresql`, or the usual ./configure routine - which part is hard? Also, EnterpriseDB provides pre-built one-click installers for Windows and Mac OS X. Upgrading between major releases can be painful as it requires a dump/restore, but lately we got pg_migrator, which takes care of that (and pg 8.4's parallel restore helps in cases where pg_migrator fails - not that I know of any).
2) the .org ___domain is hosted with PostgreSQL. Scalability-wise, that looks good enough for me.
3) Kind of agreed, but then again, the documentation of PostgreSQL really rocks. In my 8 years of (near-fanatic) use of PostgreSQL, I never came across a problem the docs could not solve for me.
4) One major release per year might actually be quicker than what MySQL does. I don't know about related tools, because I don't generally have need for any. There's pgAdmin III which, AFAIK, is updated as often as PostgreSQL itself.
5) You don't need the advanced features - until you use them, and then you can't live without them. It's like addiction to drugs :-) - I moved from MySQL to Postgres back in the day (end of 2001) for a large project, after spending hours and hours of hair-pulling due to the lack of subqueries and views in MySQL, and since then I've never wanted to go back.
We're using PostgreSQL as the backend database for multiple large e-commerce applications. Some tables are over 20 GB in size (1.5 billion rows in one of them). PostgreSQL easily handles ~60 queries per second on a quite dated dual-core box with 4 GB of RAM.
You might have had a lot of success with MySQL. I on the other hand had a lot of it with PostgreSQL.
I don't know about the later versions of MySQL but when I moved to PostgreSQL, it was years ahead of MySQL feature-wise and as I never had any performance or scalability issues, I never felt the need to go back or even see if MySQL has caught up already.
Just wanted to point out a few differences to people hitting PGSQL after getting their feet wet with MySQL.
1. While installing PGSQL, get your configuration variables correct, or it won't work out of the box (kernel.shmmax, autovacuum, shared_buffers).
2. Learn about authentication. PGSQL relies on OS authentication by default ("sameuser" in pg_hba.conf). MySQL's default behaves like PGSQL's "password" authentication method.
3. Learn about template databases. Your first database is a template database.
4. PGSQL has "pg_" databases and slash commands (\dt, \connect, \q) for system administration.
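To illustrate point 2, here's a sketch of what pg_hba.conf entries can look like (8.x-era syntax; the file's ___location and the shipped defaults vary by distro):

```
# TYPE  DATABASE  USER  CIDR-ADDRESS  METHOD
local   all       all                 ident sameuser  # trust the OS username ("sameuser")
host    all       all   127.0.0.1/32  md5             # password auth, closest to MySQL's default
```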
> get your configuration variables correct, or it won't work out of the box (kernel.shmmax, autovacuum, shared_buffers)
The only thing you really need to worry about is making sure the kernel SysV limits are appropriate (kernel.shmmax); autovacuum and shared_buffers are configured reasonably out of the box.
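If you do hit the SysV limits, raising them is one line per setting; a sketch (the figures are illustrative, not recommendations):

```
# /etc/sysctl.conf -- then `sysctl -p` to apply
kernel.shmmax = 268435456   # largest single shared memory segment, in bytes (256 MB here)
kernel.shmall = 65536       # total shared memory, in pages (4 kB pages -> 256 MB here)
```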
> PGSQL depends on authentication of the OS ("sameuser" on pg_hba.conf)
That depends on how PostgreSQL is packaged by your OS.
> Learn about template databases. Your first database is a template database.
No; the default database is called "postgres", and it is not a template database.
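For the record, the template mechanism looks like this (the database names are made up):

```sql
CREATE DATABASE mydb;                        -- implicitly copies template1
CREATE DATABASE cleandb TEMPLATE template0;  -- pristine copy, useful e.g. for a different encoding
```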
I use Ubuntu all the time, and I don't remember having to configure shmmax to get it working with my blog. I think you only need to mess with that stuff if you're trying to increase the number of connections or "scale" your blog. I think 90% of the people who use MySQL use it without touching a configuration variable, and those people could just as easily use PostgreSQL without configuring kernel.shmmax.
I don't know, but I'm guessing that 90% includes people messing around with django/rails/lift etc, or they're running a blog/cms/wiki on some shared virtual hosting somewhere.
the configuration that comes out-of-the-box actually is correct in a sense that it works. shared_buffers is low enough so that kernel.shmmax and kernel.shmall don't need to be changed.
Of course that means that there isn't enough shared memory available to achieve maximum performance, but chances are that you don't need to worry about that while you are still a beginner.
Later, when the amount of data grows, it's time to learn about that stuff, and that is when you will change shared_buffers - and the kernel resources with it.
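In postgresql.conf that step is a single setting; a sketch (the value is illustrative, and must fit within the kernel's shared memory limits):

```
shared_buffers = 512MB   # raise as the working set grows; the out-of-the-box value is deliberately tiny
```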
Agreed about the other stuff, though PostgreSQL has pg_* tables, not databases. And having database metadata exposed in database tables is quite the common case and has indeed been standardized - though not in the form of pg_*, but in the form of the information_schema (the pg_* catalogs still expose a lot of additional information).
> 2) MySQL is used and proven to scale on very large installations (Yahoo, Facebook, Google - just to name a few). Postgres's usage is much lower (Skype and Reddit from those I know of)
We use Postgres (as well as MySQL) extensively at Last.fm, including on our terabyte+ main database handling many thousands of transactions per second. Except for the lack of decent replication, I have nothing bad to say about it.
At a large scale, it's as easy as MySQL to tune (if not easier).
We use Slony to replicate read-only slaves for portions of our database (I wouldn't recommend it, Londiste is probably better), and Postgres warm standby for HA.
> The advanced features of Postgres are not that needed in most web applications. If they were that needed, more people would have used Postgres.
I am not so sure. Many people just start with MySQL (with MyISAM) and only realize at some later date that they have a pile of transactionless FAIL on their hands, without ever really going through a selection process.
Regarding performance, has anyone actually done an apples-to-apples benchmark lately? Say, PG 8.3 vs. MySQL with InnoDB?
In any case, this particular comparison is not really worthwhile... it's full of hyperbole: "blazing fast", "rocks", "aeons" and so on. Doesn't sound all that professional.
Indeed. I have seen many "transactional" web applications that lose data because of MySQL's policy of handling errors by ignoring them, among other problems. (0000-00-00 as a date is just the tip of the iceberg.)
I even know of self-proclaimed "Postgres experts" whose applications lose data with Postgres due to its very "liberal" default isolation level. (MySQL has the same problem, of course, except you can't even fix MySQL with a config setting.)
Postgres defaults to READ COMMITTED transaction isolation, where it is possible that if you perform the same SELECT twice you will get different answers due to a transaction committing in between. This can affect updates in some circumstances, although if you build your database so that everything is done with INSERTs then it's pretty harmless.
Postgres permits you to specify the SERIALIZABLE transaction level, which is complete isolation, if you care about this. (With MySQL this depends on the storage engine; MyISAM has no transactions at all.)
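In Postgres, opting in looks like this (the `accounts` table is hypothetical):

```sql
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT balance FROM accounts WHERE id = 1;  -- repeated reads now see one stable snapshot
COMMIT;

-- or make it the default for all new transactions:
SET default_transaction_isolation = 'serializable';
```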
The important thing is to understand your database's concurrency model. They don't all work the same way. This causes problems e.g. for SQL Server developers who are new to Oracle, or vice versa, and naively assume that everything is the same.
Postgres defaults to "read committed" isolation, rather than repeatable read or serializable isolation. I have mostly noticed the difference when dealing with updating HTTP session objects in a transaction; with "read committed" isolation, concurrent requests can overwrite newly written data. With serializable isolation, the transaction that tries to overwrite new data with older data (read at the beginning of the request) is aborted.
(The fix for this is difficult, but I basically don't use the session for much transient data anymore, so this became less of an issue. Before setting up the database to be strict, I never even thought of this as a problem. But obviously it is.)
I wouldn't say it is "obviously" a problem: READ COMMITTED semantics are perfectly reasonable, and are preferable for many applications (e.g. many apps aren't written to retry transactions that abort due to serialization conflicts). Which isolation level ought to be the default is a matter of debate, but I don't think using RC by default is "obviously" wrong - if an application depends on a particular isolation level for correctness, it should set it explicitly in any case.
Just remember that with read committed isolation, certain access patterns can silently trash your data.
People are afraid of transactions aborting for some reason, as they think that's an error. It's not; you just retry your transaction, which will now be using correct data. "Read committed" opens you up to race conditions in exchange for transactions rarely (never?) aborting themselves.
I think the default should be "never trash my data without telling me".
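For the record, the kind of access pattern in question is the classic lost update under READ COMMITTED; a two-session sketch (the `sessions` table is hypothetical):

```sql
-- Session A: BEGIN; SELECT data FROM sessions WHERE id = 7;   -- reads version 1
-- Session B: BEGIN; SELECT data FROM sessions WHERE id = 7;   -- also reads version 1
-- Session B: UPDATE sessions SET data = 'v2' WHERE id = 7; COMMIT;
-- Session A: UPDATE sessions SET data = 'derived-from-v1' WHERE id = 7; COMMIT;
--
-- Under READ COMMITTED, A's commit silently clobbers B's write.
-- Under SERIALIZABLE, A's UPDATE fails with a serialization error and can be retried.
```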
> 2) MySQL is used and proven to scale on very large installations (Yahoo, Facebook, Google - just to name a few). Postgres's usage is much lower (Skype and Reddit from those I know of)
I hadn't heard that Facebook had any major Postgres deployment. MySQL seems to form the core data storage for the application. Cassandra looks like something they will be using more and more. They use their own Hive project to allow use of SQL to analyze data in Hadoop.
As for other big users of Postgres, Hi5 is a decent-sized social network that uses Postgres. NTT uses Postgres too, and I think the coming simple replication and hot-standby features are based on code they developed for in-house use.
Isn't Greenplum based on PostgreSQL? They claim a fairly impressive customer list.
In my experience, it's a major pain once you realize MySQL doesn't do stuff that you need. The other differences are pretty marginal (save for replication maturity), but when you need some more complicated queries and don't have them, you're in a bind - and you're doubly screwed if you've actually embraced some of the MySQL niceties. (It's not the end of the world, but it can be a deceptively large task to migrate, and your customers don't see anything for it.)
On the down side, what I'm surprised nobody has mentioned is PostgreSQL's ever-changing on-disk format and the process of actually backing it up and restoring it. Just backing up the on-disk files without a dump can often result in trouble. Dumping can be a heavy task too. It should be a rare thing to upgrade your database in production, but PostgreSQL does add performance and some nice features on a regular basis.
Having dealt with many MySQL installations and, finally, a nice, internet-scale Postgres site, I can quite comfortably challenge the notion that MySQL has "great" replication.
It's certainly tough to beat for a trivial topology or trivial database size, but beyond that, MySQL's statement-based replication is remarkably fragile. What's worse, if a slave is far enough behind, it's possible that it will never catch up (hint: think about I/O contention with replication and whether it's possible to separate logs adequately). Once that happens, it's full-dump time. Add anything remotely fancy, like filtering out certain schemas (which MySQL calls "databases," but they're really just namespaces), plus a single developer discovering cross-schema UPDATEs, and That's All She Wrote.
With Slony, after learning the admittedly bizarre administrative interface, everything was a dream. Swapping master and slave is fast and easy, even if there are other slaves out there. Full dump? Never heard of it! Certainly, Slony does the equivalent when bringing up a new slave, but the locking requirement is nowhere near as onerous, nor is the overall performance hit. Replication happens at the table level, not the whole database.
The amount of time I've spent fixing broken MySQL replication easily dwarfs the amount of time I've spent learning and experimenting and fixing (after breaking on purpose, usually) Slony.
Well, let's put it straight: GitHub uses MySQL. These guys may have helped install the new GitHub at Rackspace's data center, but their choices are irrelevant to GitHub's choice of platform (the title is a bit misleading, though).
As for the subject itself, I guess if you are more comfortable with one of them, that one would be the best choice for you.
If you have never used either of them and wish to start a new project which requires an RDBMS, you may search for other comparisons on Google (as well as look here: http://news.ycombinator.com/item?id=184328).
Most importantly, no matter what language you are using, make sure to use a database abstraction library, to make it easy to switch between them in case you are not satisfied with your first choice.
It does, since version 7. MySQL also does, under InnoDB or Falcon. Other DBs with MVCC include BDB, Firebird, InterBase, SQL Server, Sybase, ZODB, CouchDB and ObjectStore.
If you need more background, Oracle "still" hasn't made any press releases as to the fate of MySQL. This should be a very big "con", with a huge question mark next to it. Many core MySQL developers aren't being too optimistic about the future.
Oracle has owned the important parts of MySQL (InnoDB/InnoBASE) for a few years now, and that hasn't stopped them from fixing bugs and adding features in InnoDB.
Oracle can't do much except at worst, drag a stifling cloud over its future for as long as possible. If they actually stop or stall it, some fork will grab its momentum and run.
Well, it is open source, but there's definitely a question mark as to how many people will be paid to work on it, and what direction they'll be taking it.
I was happy with MySQL except for master-slave configuration. Now that Oracle has acquired MySQL, I am looking for another open source DB, and Postgres seems a perfect fit.
> Instead of complaining about invalid input data, MySQL accepted the data and truncated it at the first invalid byte. Without error or warning.
http://www.gnegg.ch/2008/02/failing-silently-is-bad/