Much of the persistent world state in Minecraft consists of blocks laid out on a 3D grid, which is probably fairly easy to optimise. Compare that with other 3D games, where lots of entities with dynamic physics can move to just about any position in the world; I think Minecraft would be the easier of the two to optimise.
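To make the grid point concrete, here is a minimal sketch of how a block grid can be stored, assuming fixed 16x16x16 sections with one byte per block ID. `Chunk`, `chunk_coords`, the section size and the block IDs are all hypothetical and are not Minecraft's actual storage format. The point is only that a block lookup is plain index arithmetic on a flat array, and a block change is fully described by a coordinate plus an ID.

```python
# Hypothetical sketch: a cubic section of blocks stored as a flat byte array.
# CHUNK_SIZE and the block IDs are illustrative assumptions, not Minecraft's real format.

CHUNK_SIZE = 16  # blocks per side (assumed)

AIR = 0
STONE = 1


class Chunk:
    def __init__(self) -> None:
        # One byte per block: a 16*16*16 section fits in 4096 bytes.
        self.blocks = bytearray(CHUNK_SIZE ** 3)

    @staticmethod
    def _index(x: int, y: int, z: int) -> int:
        # Map local (x, y, z) to a slot in the flat array (y outermost, then z, then x).
        if not (0 <= x < CHUNK_SIZE and 0 <= y < CHUNK_SIZE and 0 <= z < CHUNK_SIZE):
            raise IndexError("block coordinates outside chunk")
        return (y * CHUNK_SIZE + z) * CHUNK_SIZE + x

    def get(self, x: int, y: int, z: int) -> int:
        return self.blocks[self._index(x, y, z)]

    def set(self, x: int, y: int, z: int, block_id: int) -> None:
        self.blocks[self._index(x, y, z)] = block_id


def chunk_coords(wx: int, wy: int, wz: int):
    # Split a world coordinate into (chunk coordinate, local coordinate).
    # divmod floors toward negative infinity, so negative world coordinates work too.
    cx, lx = divmod(wx, CHUNK_SIZE)
    cy, ly = divmod(wy, CHUNK_SIZE)
    cz, lz = divmod(wz, CHUNK_SIZE)
    return (cx, cy, cz), (lx, ly, lz)
```

For example, world position (-1, 20, 5) lands in chunk (-1, 1, 0) at local position (15, 4, 5).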
Also, I'm not sure how Minecraft handles latency (I assume it mostly ignores it, or does something very naive), but in FPS games you have to account for latency and do prediction on the client side to hide its effects from the player.
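A minimal sketch of client-side prediction with server reconciliation, reduced to one dimension. The client applies each input immediately, keeps the inputs the server hasn't acknowledged yet, and when an authoritative update arrives it snaps to the server's position and replays the still-pending inputs on top of it. `SPEED`, `Input`, `PredictingClient` and the message shapes are assumptions for illustration, not any particular engine's API.

```python
from dataclasses import dataclass, field

# Hypothetical 1-D client-side prediction with server reconciliation.
# SPEED, the message shapes and the class names are illustrative assumptions.

SPEED = 1.0  # units moved per input tick (assumed)


@dataclass
class Input:
    seq: int     # client-assigned sequence number
    move: float  # -1.0 (left) .. +1.0 (right)


def apply_input(position: float, inp: Input) -> float:
    # Shared movement rule: client and server must run the same logic.
    return position + inp.move * SPEED


@dataclass
class PredictingClient:
    position: float = 0.0
    next_seq: int = 0
    pending: list = field(default_factory=list)  # inputs not yet acknowledged by the server

    def local_input(self, move: float) -> Input:
        inp = Input(self.next_seq, move)
        self.next_seq += 1
        # Predict: apply the input now instead of waiting a full round trip.
        self.position = apply_input(self.position, inp)
        self.pending.append(inp)
        return inp  # this is what would be sent to the server

    def on_server_state(self, position: float, last_processed_seq: int) -> None:
        # Reconcile: start from the authoritative server position...
        self.position = position
        # ...drop the inputs the server has already applied...
        self.pending = [i for i in self.pending if i.seq > last_processed_seq]
        # ...and replay the ones still in flight.
        for inp in self.pending:
            self.position = apply_input(self.position, inp)
```

Replaying the pending inputs is what stops a server update from visibly yanking the player back by a full round trip; only a genuine disagreement (say, the server detected a collision the client didn't) shows up as a correction.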
Think of two players running towards each other and shooting at each other. By the time one player (or even the server) receives the data saying the other player has fired, both of them have already moved to completely different positions.
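For the shooting case, one technique servers use is lag compensation, which Valve's Source networking documentation describes: the server keeps a short history of each player's positions and, when a shot arrives, rewinds the target to where the shooter saw it before testing the hit. The sketch below is a simplified 1-D version; `PositionHistory`, `check_hit`, `HISTORY_SECONDS` and `HIT_RADIUS` are hypothetical, and it takes the shooter's latency as a given number, whereas Valve's write-up also subtracts the client's interpolation delay.

```python
import bisect
from collections import deque

# Hypothetical server-side lag compensation sketch (1-D positions for brevity).
# HISTORY_SECONDS, HIT_RADIUS and the class/function names are illustrative assumptions.

HISTORY_SECONDS = 1.0
HIT_RADIUS = 0.5


class PositionHistory:
    def __init__(self) -> None:
        # (server_time, position) samples, oldest first, with strictly increasing times.
        self.samples = deque()

    def record(self, t: float, pos: float) -> None:
        self.samples.append((t, pos))
        # Discard samples too old to be useful for rewinding.
        while self.samples and self.samples[0][0] < t - HISTORY_SECONDS:
            self.samples.popleft()

    def position_at(self, t: float) -> float:
        # Assumes at least one sample has been recorded.
        # Linearly interpolate between the two samples surrounding time t,
        # clamping to the oldest/newest sample outside the stored window.
        times = [s[0] for s in self.samples]
        i = bisect.bisect_left(times, t)
        if i == 0:
            return self.samples[0][1]
        if i == len(self.samples):
            return self.samples[-1][1]
        (t0, p0), (t1, p1) = self.samples[i - 1], self.samples[i]
        alpha = (t - t0) / (t1 - t0)
        return p0 + alpha * (p1 - p0)


def check_hit(target: PositionHistory, aim_pos: float, now: float, shooter_latency: float) -> bool:
    # Rewind the target to where the shooter saw it when firing, then test the shot.
    rewound = target.position_at(now - shooter_latency)
    return abs(rewound - aim_pos) <= HIT_RADIUS
```

For example, with samples (0.0, 0.0), (0.5, 5.0) and (1.0, 10.0), `check_hit(h, 7.5, now=1.0, shooter_latency=0.25)` rewinds to t = 0.75, where the target was at 7.5, and counts the hit; with zero latency the target is at 10.0 and the same shot misses.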
(There is a good set of articles on this by a game developer, but I can't find them now.)
There was a bunch of Source-engine stuff Valve published, which is usually my go-to citation when arguing with players who don't understand "netcode" for a game but criticize it for not giving them perfect instantaneous communication anyway. :p