The models do a pretty good job of rendering plausible global illumination, radiosity, reflections, caustics, etc. in a whole bunch of scenarios. The results aren't necessarily physically accurate (usually they aren't, in fact), but they're usually good enough to trick the human brain unless you start paying very close attention to details, angles, etc.
This fascinated me when SD was first released, so I tested a whole bunch of scenarios. While it's quite easy to find situations that don't provide accurate results and produce all manner of glitches (some of which you can use to detect some SD-produced images), the results are nearly always convincing at a quick glance.
As well as light and shadows, yes. It can be fixed explicitly during training, as the paper you linked suggests, by providing a classifier, but it will probably also keep getting better in new models on its own, just as a result of better training sets, lower compression ratios, and models understanding the real world better.