These 3D perspective drawings are much less convincing in real life than they are in images, because in real life we have depth perception and our viewpoint is constantly moving. The illusion only looks right from that one exact viewpoint.
CAVE-style VR suffers from the same problem if you don't want to use stereoscopic shutter glasses. You can track one user's head position and update the renderer's camera accordingly, but with multiple users you can't do that without shutter glasses.
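To make "update the renderer's camera accordingly" concrete, here is a minimal sketch of a head-tracked, off-axis view frustum. It is only an illustration under simplifying assumptions: the screen is a flat rectangle in the z = 0 plane centered on the origin, the tracked eye sits in front of it at z > 0, and the function name `off_axis_frustum` and its parameters are made up for this example rather than taken from any particular CAVE software.

```python
def off_axis_frustum(eye, screen_w, screen_h, near):
    """Return (left, right, bottom, top) frustum bounds at the near plane.

    Assumptions (hypothetical setup): the screen lies in the z = 0 plane,
    centered on the origin, and the tracked eye is at (ex, ey, ez), ez > 0.
    """
    ex, ey, ez = eye
    if ez <= 0:
        raise ValueError("eye must be in front of the screen (ez > 0)")

    # Similar triangles: project each screen edge, as seen from the eye,
    # onto a plane at distance `near` from the eye.
    scale = near / ez
    left = (-screen_w / 2 - ex) * scale
    right = (screen_w / 2 - ex) * scale
    bottom = (-screen_h / 2 - ey) * scale
    top = (screen_h / 2 - ey) * scale
    return left, right, bottom, top
```

The four bounds are the kind of values an asymmetric frustum call such as OpenGL's `glFrustum` takes. When the eye is centered the frustum is symmetric; as soon as the head moves sideways it becomes lopsided, which is why the image is only correct for the one tracked viewer.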
Even with shutter glasses, I'd think there must be some limit on how many users you could have at once before the flicker becomes irritating. With 3 users there are six different perspectives to render, so both lenses of an individual user's glasses would be opaque 2/3 of the time, and each eye would see an image for only 1/6 of the time.
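The arithmetic generalizes to any number of users. The sketch below assumes every perspective gets an equal time slot and ignores any blanking time between slots; the function name `shutter_timing` and the optional `display_hz` parameter are hypothetical, added only for illustration.

```python
from fractions import Fraction


def shutter_timing(users, display_hz=None):
    """Time-multiplexed stereo for `users` viewers, one slot per eye.

    Returns (fraction of time one eye is open,
             fraction of time both lenses of one pair are opaque,
             per-eye refresh rate, or None if no display rate is given).
    Assumes equal slots and no blanking interval between them.
    """
    if users < 1:
        raise ValueError("need at least one user")

    views = 2 * users                          # one perspective per eye
    eye_open = Fraction(1, views)              # each eye owns one slot
    both_opaque = Fraction(views - 2, views)   # every slot owned by other users
    per_eye_hz = None
    if display_hz is not None:
        per_eye_hz = Fraction(display_hz, views)
    return eye_open, both_opaque, per_eye_hz
```

For three users this gives 1/6 and 2/3, matching the figures above. On a hypothetical 120 Hz display, each eye would only be refreshed 20 times per second, which is where the flicker concern comes from.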
The driver's attention is not focused solely on the crosswalk, but on many different things at different times during the approach. By the time the driver reaches the point where the illusion tricks the brain the most, their attention is more likely to be on the crosswalk. That's all that matters: those few milliseconds in which the brain perceives the illusion as an obstacle.