Does anyone knows of a good blog comparing pg against other major rdbms( oracle,...

ComodoHacker · on April 5, 2016

>why, for example pg is the only one in my knowledge having this vacuum issue

Because other DBMSs have different MVCC implementations. Basically they keep that "snapshot" row versions in other places, not in the table itself. For example, in Oracle they go to undo segment; in MSSQL to tempdb (or you can just disable MVCC); and in MySQL with InniDB to rollback segments similar to Oracle (not surprising :).

Other MVCC implementations have their own drawbacks of course.

dspillett · on April 5, 2016

> in MSSQL to tempdb (or you can just disable MVCC)

In fact you have to explicitly turn it on in MS SQL Server. Twice: enabling ALLOW_SNAPSHOT_ISOLATION at the database level and setting transaction isolation level to READ_COMMITTED_SNAPSHOT for your connection/transaction.

In MS SQL Server it is an optional enhancement (added in MSSQL2008 IIRC) that you chose to use if your use case fits it, rather than being the primary method as it is in postgres. A great many developers using MS SQL Server don't even know the option exists.

insulanian · on April 5, 2016

> A great many developers using MS SQL Server don't even know the option exists.

Yes, default behavior is so prone to locks - I learned it the hard way. Switching READ_COMMITTED_SNAPSHOT ON brought new life to our DB.

chris_wot · on April 5, 2016

You really want to ensure you've setup your disks very carefully if you have a database that you are using snapshot isolation on.

TempDB runs the following across all databases on SQL Server. I'll just quote Microsoft directly here:

  The tempdb system database is a global resource that is available to all
  users connected to the instance of SQL Server and is used to hold the
  following:

  Temporary user objects that are explicitly created, such as: global or
  local temporary tables, temporary stored procedures, table variables, or
  cursors.

  Internal objects that are created by the SQL Server Database Engine, for
  example, work tables to store intermediate results for spools or
  sorting.

  Row versions that are generated by data modification transactions in a
  database that uses read-committed using row versioning isolation or
  snapshot isolation transactions.

  Row versions that are generated by data modification transactions for
  features, such as: online index operations, Multiple Active Result Sets
  (MARS), and AFTER triggers. [1]

...and the following article [2] also notes that tempdb also handles "Materialized static cursors". The "internal objects" include:

  Work tables for cursor or spool operations and temporary large object
  (LOB) storage.

  Work files for hash join or hash aggregate operations.

  Intermediate sort results for operations such as creating or rebuilding
  indexes (if SORT_IN_TEMPDB is specified), or certain GROUP BY, ORDER BY,
  or UNION queries. [3]

... and yet another article [4] explains that tempdb is used:

  To store intermediate runs for sort.

  To store intermediate results for hash joins and hash aggregates.

  To store XML variables or other large object (LOB) data type variables.
  The LOB data type includes all of the large object types: text, image,
  ntext, varchar(max), varbinary(max), and all others.

  By queries that need a spool to store intermediate results.

  By keyset cursors to store the keys.

  By static cursors to store a query result.

  By Service Broker to store messages in transit.

  By INSTEAD OF triggers to store data for internal processing.

In other words, if you run snapshot isolation, it's possible that someone running a query that uses a largish temporary table or cursor can cause disk contention that will affect snapshot isolation. Similarly, if you run a largish query - or many queries for that matter - that involves a query where your plan shows a sort or spool (several join operators can cause this) then these can affect snapshot isolation also.

SQL Server is honestly the only database I know that puts all these operations into a single shared resource database. Oracle allows you to hive this sort of stuff off to other tablespaces and you can reconfigure and tune your disks to your hearts content.

This has been a known issue for a long time by Microsoft and pretty much any serious SQL Server DBA. You don't have to take my word for it, take a look at the following articles that go into a lot of detail about how to handle tempdb:

* Optimizing tempdb Performance (MSDN) explains some strategies for configuring tempdb as "the size and physical placement of the tempdb database can affect the performance of a system" [5] - I strongly recommend reading this article when you setup a new database server or have an opportunity to do serious database maintenance that allows you to reconfigure you disk setup

* Capacity Planning for tempdb [3] - actually, definitely read this one as it gives a comprehensive list of things done in the tempdb

* Working with tempdb in SQL Server 2005 [4] - yeah, it mentions SQL Server 2005, but I think a lot of it still applies

* Recommendations to reduce allocation contention in SQL Server tempdb database [6] - the symptom is:

  You observe severe blocking when the SQL Server is experiencing heavy
  load. When you examine the Dynamic Management Views [sys.dm_exec_request
  or sys.dm_os_waiting_tasks], you observe that these requests or tasks are
  waiting for tempdb resources. You will notice that the wait type and wait
  resource point to LATCH waits on pages in tempdb. These pages might be of
  the format 2:1:1, 2:1:3, etc.

And the cause is:

  When the tempdb database is heavily used, SQL Server may experience
  contention when it tries to allocate pages. Depending on the degree of
  contention, this may cause queries and requests that involve tempdb to be
  unresponsive for short periods of time.

1. https://msdn.microsoft.com/en-us/library/ms190768.aspx

2. https://support.microsoft.com/en-us/kb/307487

3. https://technet.microsoft.com/en-us/library/ms345368(v=sql.1...

4. https://technet.microsoft.com/en-us/library/cc966545.aspx

5. https://technet.microsoft.com/en-us/library/ms175527(v=sql.1...

6. https://support.microsoft.com/en-us/kb/2154845

bsaul · on April 5, 2016

""Other MVCC implementations have their own drawbacks of course.""

Thanks for the info, can you elaborate on the drawbacks of other technics ?

ComodoHacker · on April 5, 2016

I can for Oracle, which I'm most familiar with. Its approach adds much of complexity both in implementation and configuration/administation. DB engine must ensure integrity and consistency of data in different DB files in scenarios like backup, restore, crash recovery, DDL etc. DBA have to monitor and tune undo space according to database growing rate and workload. If not done properly, long-running SELECT can end up receiving "Snapshot too old" error, which means DB engine could not reconstruct all row data needed as seen at certain point of time.

EmielMols · on April 5, 2016

Rollbacks (canceling a transaction) on other systems usually take longer because it physically needs to rewrite the "undo log" back to the data pages.

ComodoHacker · on April 5, 2016

Yes, but usually this is not a problem, because rollbacks are rare in production.

AlterEgo20 · on April 6, 2016

Even a single rare rollback can kill a prod DB. We had a situation when a bug in our code caused a series of huge UPDATEs in Oracle DB and LOCKed some critical tables. After ~2 hours we found this session and killed it.

The problem - we had to wait for 2 more hours with dead prod while Oracle was rolling back everything. In Oralce killed sessions still hold their locks while rolling back.

In Postgres rollback is almost instant.