I suspect many apps would benefit from splitting the commit completion notification into two pieces:
- commit logically committed (meaning it won't fail to complete for any reason other than power or hardware failure, and in particular no trigger or other such logic can keep it from completing)
- commit completed durably (i.e., all associated writes have reached stable, persistent storage, and future reads will see them even if there is a power failure in between)
A UI could indicate that a transaction is logically complete and have a second way to indicate that the transaction is durably committed.
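As a sketch of what such a split notification API could look like (a toy model with made-up names like `ToyWAL`, not any real database interface):

```python
from concurrent.futures import Future

class ToyWAL:
    """Toy write-ahead log illustrating the two-stage commit notification.
    All names here are hypothetical; this is not a real database API."""
    def __init__(self):
        self.buffer = []   # records written but not yet fsynced
        self.pending = []  # durability futures awaiting the next fsync

    def commit(self, record):
        """Return (logical, durable) futures for the two completion stages."""
        logical, durable = Future(), Future()
        self.buffer.append(record)
        logical.set_result(record)    # logically committed: nothing can veto it now
        self.pending.append(durable)  # durable only after the next fsync
        return logical, durable

    def fsync(self):
        """Flush the buffer; every pending commit becomes durable."""
        for fut in self.pending:
            fut.set_result(True)
        self.pending.clear()

wal = ToyWAL()
logical, durable = wal.commit("txn-1")
assert logical.done() and not durable.done()  # logical completes immediately
wal.fsync()
assert durable.done()                         # durable completes at fsync
```

A UI would subscribe to the first future for the "complete" indicator and to the second for the "durable" one.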
IIRC, that doesn’t work in practice because once a synchronous commit is requested, all async commits that preceded it must be fsynced before the sync commit can itself be fsynced. Which kind of makes sense if you think about the later work potentially reading the result of the earlier one.
The high throughput gains of async commit shine when the entire workload is async.
I think you mean: If anybody anywhere decided to invoke a wait() until the state "my COMMIT; has been durably committed", then that means all previous commits must also be in durably committed state before we can continue.
Sure. But this doesn't affect other code that is merely wait()ing for 'logically committed'. Not in the past, and usefully, not in the future either. The 'logical system' can be far ahead of the 'durable' system, it doesn't have to wait.
Where it would go wrong is if code A needs to wait around for code B to do a thing (such as provide a value), and B's code for some reason contains a wait() on durable. Possibly because B was written before the split in commit behaviour was around, and for backwards compatibility reasons, for them COMMIT; means: "wait for durable commit".
Given that `SET synchronous_commit = off;` works per transaction, I'm kinda inspired here. I can think of a few places where we commit for various logical reasons but I don't need the fsync guarantee.
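For reference, the setting can be scoped three ways; the table and column names below are just illustrative:

```
-- Per transaction: SET LOCAL reverts automatically at COMMIT/ROLLBACK
BEGIN;
SET LOCAL synchronous_commit = off;
INSERT INTO audit_log (event) VALUES ('login');
COMMIT;  -- returns without waiting for the WAL fsync

-- Per session: persists for the lifetime of the connection
SET synchronous_commit = off;

-- Globally (postgresql.conf / ALTER SYSTEM): the footgun variant
ALTER SYSTEM SET synchronous_commit = off;
```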
UI-wise the distinction doesn't make much sense, since the window to durability is a small fraction of a second. But for concurrent modifications the reduction in lock duration can mean an order-of-magnitude throughput improvement.
If you're careful and you know what you're doing, sure. A much more impressive speed boost is to not use Aurora. "Let us separate storage and compute with a minimum latency of 1 msec," said no sane person ever.
Go spin up vanilla Postgres on an i4i instance, or even your personal computer, assuming you have a fast NVMe drive. You'll see what fast is. Nothing comes remotely close, not even io2 Block Express. A modern enterprise NVMe drive can do over 1,000,000 IOPS.
I didn’t know this could be per-session or per-transaction. That seems like a much better way to take advantage of this for very carefully thought-through use cases, without setting up an invisible footgun in a global setting that someone else doesn't realize has been changed from the default.
Thanks for sharing this! Like many others, we're at a scale where we use Postgres for almost everything to reduce operational overhead, and there are some ETL, log, and audit tables that would love a 30% speed bump for such a small risk window.
I'm not a Rails expert: does this reliably give you the same connection from the pool? If not, you could be setting `synchronous_commit = off` on one "physical" connection, and clearing it on another. This is setting it at the Postgres session level, so it will persist for the lifetime of the connection. If it's not managed reliably, it could lead to some very confusing bugs (only triggered in a very small window when Postgres shuts down unexpectedly while the transaction is still unsynced to durable storage, making it extra fun to debug).
Then, I suspect you'd rather use `SET LOCAL` than plain `SET`, so it applies to a single transaction only. So maybe something more like:
def asynchronous_transaction(&block)
  ActiveRecord::Base.connection.transaction do
    ActiveRecord::Base.connection.execute("SET LOCAL synchronous_commit = off")
    yield
  end
end
Please note that I don't really know Rails (in particular, I've no idea whether `Base.connection` is guaranteed to be exactly the same connection throughout this method's lifecycle, or whether e.g. a connection pool underneath could swap it), so I could be introducing some subtle bugs here.
Yes, the connection is "sticky" for the duration of the request. The middleware also automatically starts a transaction for you when you check out the connection.
Presumably if you disable this on a per-transaction basis you are still guaranteed that transactions are replicated in FIFO order? Otherwise you could end up in weird situations where a synced txn reads from a lost txn and your state is effectively corrupted.
Correct. Async commit transactions add their changes to the WAL the same way normal transactions do. The only relevant change is that the WAL is not synchronously flushed to disk before COMMIT completes.
Yes and no. On the primary, durability order and visibility order are different, so an async transaction that starts committing later can become visible to readers before a sync transaction that started committing earlier.
A read-write sync transaction that reads the result of a non-durable transaction cannot commit ahead of the async transaction. But there is no such guarantee for read-only transactions. So a transaction could read a non-durable state, decide that no further action is needed, and have that decision be invalidated by a crash/failover.
To make things even worse, similar things can happen with synchronously replicated transactions. If a committing transaction waiting on replication is cancelled it will become visible immediately. A reasonably written application would run a retry of the transaction, see that the result is there and conclude that no action is necessary, even though the data is not replicated and would be lost if a failover then happens.
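The anomaly described above can be illustrated with a toy model (pure Python with hypothetical names; real Postgres internals are far more involved): an async commit becomes visible to readers immediately, but only a flush makes it durable, so a crash can retract state a reader already observed.

```python
class ToyPrimary:
    """Toy model of visibility vs. durability on a primary (not real Postgres)."""
    def __init__(self):
        self.visible = set()   # commits readers can see
        self.durable = set()   # commits flushed to stable storage

    def commit(self, txn, synchronous=True):
        self.visible.add(txn)  # becomes visible to readers immediately
        if synchronous:
            self.flush()       # a sync commit waits for the WAL flush

    def flush(self):
        self.durable |= self.visible

    def crash_and_recover(self):
        self.visible = set(self.durable)  # only durable commits survive

db = ToyPrimary()
db.commit("async-txn", synchronous=False)
# A read-only transaction can now observe async-txn...
assert "async-txn" in db.visible and "async-txn" not in db.durable
db.crash_and_recover()
# ...but after a crash the observed state is gone.
assert "async-txn" not in db.visible
```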
I'm not sure working in parallel is always a good decision.
An anecdote: just the other day I had to implement batching instead of concurrent parallel processing, because PostgreSQL really hated me having thousands of concurrent transactions on the same table.
My particular workflow was essentially this: I get a batch (a full state dump) of some products, and I need to update my `products` table to keep track of them (soft-deleting whatever has disappeared, inserting new rows, updating existing ones):
BEGIN;
-- Quickly load the batch into a temporary table
CREATE TEMPORARY TABLE products_tmp (LIKE products INCLUDING ALL) ON COMMIT DROP;
COPY products_tmp FROM STDIN;
-- Soft-delete products missing from the current batch
UPDATE products SET is_active = FALSE WHERE is_active AND store_id = ANY($1) AND id NOT IN (SELECT id FROM products_tmp WHERE store_id = ANY($1));
-- Upsert products from the current batch (add new, update existing)
INSERT INTO products (...) SELECT ... FROM products_tmp ON CONFLICT (id) DO UPDATE SET ...;
COMMIT;
With just a few thousand concurrent writers things started to look quite ugly: constant serialization failures (I started at SERIALIZABLE, then downgraded to REPEATABLE READ, and was reluctant to go to READ COMMITTED) and deadlocks preventing me from performing some DDL (schema migrations) on the products table.
So I started to batch those batches elsewhere and dump them at periodic intervals, and things started to look a lot better. Maybe that was a naive/brute-force approach and I should have done some parameter tweaking and/or fancy table partitioning or something else (idk) for congestion control instead, but at least it worked.
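The coalescing approach described above can be sketched roughly like this (hypothetical names such as `BatchCoalescer` and `flush_fn`; the real flush would run the COPY + soft-delete + upsert transaction shown earlier):

```python
class BatchCoalescer:
    """Coalesce many small incoming batches into one large periodic write,
    so the database sees one big transaction instead of thousands of small ones."""
    def __init__(self, flush_fn, max_rows=10_000):
        self.flush_fn = flush_fn  # e.g. runs the COPY + UPDATE + upsert transaction
        self.max_rows = max_rows
        self.rows = []

    def add(self, batch):
        """Buffer an incoming batch; flush once enough rows accumulate."""
        self.rows.extend(batch)
        if len(self.rows) >= self.max_rows:
            self.flush()

    def flush(self):
        if self.rows:
            self.flush_fn(self.rows)  # one transaction instead of thousands
            self.rows = []

flushed = []
c = BatchCoalescer(flushed.append, max_rows=3)
c.add([1, 2])
assert flushed == []               # below threshold: nothing written yet
c.add([3, 4])
assert flushed == [[1, 2, 3, 4]]   # one coalesced write
```

A timer-based flush would also be needed in practice so a trickle of small batches doesn't sit in the buffer indefinitely.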
On the client side, there is no explicit way to know which transactions were lost when multiple transactions are in flight. If your use case relies on durability guarantees, it may fail at some point.
On the client side you do know, because you enable async commit on a case-by-case basis.
All transactions work the same. When/if the server fails, some transactions (those you allowed yourself) may not be durable (or not replicated). But everything is still transactional.