The problem with 3D interfaces is not the 2D display.
There are two major problems with 3D interfaces. The first is that your best input devices are fundamentally 2D. Fully 3D input devices exist but are unusual and generally tuned for a specific purpose. (Even if you want to jump up and say "The Wiimote!", from what I can see it is typically used in a 2D manner. Those two dimensions are not the conventional 90-degree axes (*), mostly to make them easier to use with wrists and arms, but it's still a lot of 2D motion with rare forays into true 3D that get old fast.)
The other major problem is that full 3D is virtually incomprehensible; we live in a 2.5D world. Try to train a "normal person" to play Descent. (If you think full-3D interfaces would be awesome, go play Descent to be sure!) We can't use the full 3D. What we can use is a 2.5D plane stretching out into the distance. But as long as we're using 2.5D, why not place it perpendicular to our line of sight and see the maximum area, instead of the 3D-engine view that moves all but a very small part of the 2.5D plane out of our field of view and range of action?
Fully 3D interfaces are fundamentally and deeply flawed. We've had the hardware to display them for at least a decade now and there's a reason there isn't even a halfway decent prototype... and it's not for lack of trying. It's just a really, really bad idea, one where you can't even overcome the fundamental flaws with brute force and ideology.
(*): Recall that dimensionality is not constrained to the traditional three axes at 90 degrees to each other; it is a measurement of how many numbers need to be specified in a given situation. Hold out your arm and fingers straight, and move only your wrist around. As you move your wrist around, look at the surface defined by where the tip of your middle finger is. It is a piece of a vaguely spherical shell which curves in 3D space, but is itself a 2D surface; given the situation I laid out, I need only two numbers to identify a point on that surface. The Wiimote is capable of being used in full 3D and it certainly is in some cases, but the human body itself imposes constraints on how you move that thing around, and full 3D gets tiring, fast. Even the ones that use full 3D still strike me as using 2.5D: a full 2D for the wrist motions, and .5D for just thrusting backwards or forwards with no meaningful wrist interactions (just serving as a tactile button). Another example: using the Wiimote as a steering wheel, as nifty a use case as it is, actually cuts it down to 1D, as only the angle matters.
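To make the fingertip example concrete, here is a rough Python sketch. It rests on assumptions that are mine for illustration: the wrist is treated as an ideal pivot at the origin, the fingertip sits at a fixed `reach` from it (0.18 m is an arbitrary placeholder), and the two numbers are a `yaw` and a `pitch` angle. The function names are hypothetical. A real wrist is messier, which is why the shell is only "vaguely" spherical.

```python
import math


def fingertip_position(yaw, pitch, reach=0.18):
    """Map two wrist angles (radians) to a 3D fingertip position.

    Assumed model: the wrist is a perfect pivot at the origin and the
    fingertip stays at a fixed distance `reach` (an illustrative value).
    """
    x = reach * math.cos(pitch) * math.sin(yaw)
    y = reach * math.sin(pitch)
    z = reach * math.cos(pitch) * math.cos(yaw)
    return (x, y, z)


def distance_from_wrist(point):
    """Euclidean distance of a 3D point from the pivot at the origin."""
    return math.sqrt(sum(c * c for c in point))
```

Whatever pair of angles you feed in, the output has three coordinates but always lies at the same distance from the wrist, so only two numbers ever vary. Under the same toy model, the steering-wheel case would need just one angle as input.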
Excellent points, especially the part about the constrained degrees of freedom in our actual interactions with the world. To give another example: the human hand has somewhere upwards of 20 degrees of freedom. Yet except in a few situations that require lots of training (like touch typing or playing musical instruments) the movements of the hand are limited to a small number of stereotyped poses-- a power grip for holding a coffee mug or a suitcase, a pose bringing the thumb to forefinger, a handshake pose, a thumbs-up pose, a baseball grip, etc. And that's not even getting into the physical limitations of the hand.