Looks like the AutoScaling groups are applications or stores where they do not need to coordinate their actions. The 3-az deployments seems to be APIs, which I am guessing scales with each regions (to reduce data cost?) and probably brought up/down automatically with Puppet to handle post-launch configurations. (you are reading it correctly).
I would guess testing is a skeleton version of the entire deployment so the cost is minimal and just need to test new deploys and verification for tests.
Staging probably wasn't a full mirror, at best I would venture to guess they had hot swaps coming up in staging and then being switched against production via ELBs.
They mention costs a few times in articles, so I would venture to guess they did optimize around many of those corners.
I would guess testing is a skeleton version of the entire deployment so the cost is minimal and just need to test new deploys and verification for tests.
Staging probably wasn't a full mirror, at best I would venture to guess they had hot swaps coming up in staging and then being switched against production via ELBs.
They mention costs a few times in articles, so I would venture to guess they did optimize around many of those corners.