Perhaps the author is referring to a technique like the one used to make a cache concurrent [1]. In that case every read is also a write, since LRU requires reordering entries on access. Because an LRU is usually implemented as a doubly-linked list, the reordering is either a very complex lock-free algorithm or is performed under a lock. The lock-free approach doesn't scale because every recently accessed entry must move to the MRU position (the tail), so all threads contend on the same location.
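To make the "every read is a write" point concrete, here is a minimal sketch of the classic lock-based version (my own illustration, not code from the article): an access-ordered LinkedHashMap behind a single lock, where even a get() relinks the entry to the tail and therefore serializes all readers.

    import java.util.Collections;
    import java.util.LinkedHashMap;
    import java.util.Map;

    // Classic lock-based LRU. With accessOrder=true, every get() relinks the
    // entry to the tail (MRU position), so even "read-only" traffic mutates the
    // list and must pass through the single lock.
    class LockedLruCache<K, V> {
      private final Map<K, V> map;

      LockedLruCache(int maxSize) {
        this.map = Collections.synchronizedMap(new LinkedHashMap<K, V>(16, 0.75f, true) {
          @Override protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > maxSize;
          }
        });
      }

      V get(K key) { return map.get(key); }           // a read is also a write (reorders)
      void put(K key, V value) { map.put(key, value); }
    }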
The approach used in that article is to publish the access events into multi-producer/single-consumer queues. When a queue fills up, the batch is replayed under a lock to catch the LRU policy up. The lock ensures a single writer, but it doesn't become a point of contention because the acquisition is amortized over the whole batch rather than paid per operation. The ring buffers are lock-free, but the cache itself is not. Lock-free code can still suffer contention, or be algorithmically more expensive than a lock-based equivalent, so whichever strategy is chosen requires careful consideration of the system's performance.
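Roughly, the buffering idea looks like the sketch below. This is a simplified illustration under my own assumptions, not the article's actual design: it uses a single bounded queue instead of striped per-producer ring buffers, and a plain deque for the LRU ordering. Reads and writes hit a lock-free ConcurrentHashMap and append an event to the buffer; whoever finds the buffer full tries to take the eviction lock and replays the batch, and anyone who loses the tryLock simply moves on.

    import java.util.ArrayDeque;
    import java.util.Queue;
    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.locks.ReentrantLock;

    class BufferedLruCache<K, V> {
      private final ConcurrentHashMap<K, V> data = new ConcurrentHashMap<>();
      private final Queue<K> buffer = new ArrayBlockingQueue<>(128);   // stand-in for the MPSC ring buffers
      private final ReentrantLock evictionLock = new ReentrantLock();
      private final Queue<K> lruOrder = new ArrayDeque<>();            // guarded by evictionLock
      private final int maximumSize;

      BufferedLruCache(int maximumSize) { this.maximumSize = maximumSize; }

      V get(K key) {
        V value = data.get(key);                     // lock-free read of the data
        if (value != null) {
          recordAccess(key);                         // policy update is deferred
        }
        return value;
      }

      void put(K key, V value) {
        data.put(key, value);
        recordAccess(key);
      }

      private void recordAccess(K key) {
        if (!buffer.offer(key)) {                    // buffer full: time to catch up
          tryDrain();
          buffer.offer(key);                         // best effort; dropping an event is acceptable
        }
      }

      private void tryDrain() {
        if (evictionLock.tryLock()) {                // single writer; losers never block
          try {
            K key;
            while ((key = buffer.poll()) != null) {  // replay the batch against the LRU order
              lruOrder.remove(key);                  // O(n) here; a real cache relinks nodes in O(1)
              lruOrder.add(key);
            }
            while (lruOrder.size() > maximumSize) {  // evict least-recently-used entries
              data.remove(lruOrder.poll());
            }
          } finally {
            evictionLock.unlock();
          }
        }
      }
    }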
[1] http://highscalability.com/blog/2016/1/25/design-of-a-modern...