So, if I understand correctly, they have created an algorithm that takes several colored point clouds (or photos + depth maps) of a scene and can generate renderings of the scene from novel viewpoints (or with different camera parameters, e.g. exposure or white balance). The results look truly impressive!
The whole rendering pipeline is differentiable, so they can backpropagate the error against the ground-truth images to all the unknowns in the pipeline. Those include camera pose parameters, point-cloud texture and point positions, the weights of a neural network that turns (potentially sparse) point-cloud rasterizations into full HDR (high dynamic range) images, and tone-mapping parameters like exposure.
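To make that concrete, here's a minimal PyTorch sketch (my own illustration, not the authors' code) of what "every unknown is a learnable parameter" looks like; the rasterizer, the refinement network, and the tone-mapping formula are hypothetical stand-ins:

```python
# Minimal sketch of joint optimization over heterogeneous scene parameters
# in a differentiable rendering pipeline. All names and shapes are hypothetical.
import torch

num_points, num_views = 10_000, 20
point_xyz  = torch.nn.Parameter(torch.randn(num_points, 3))   # point positions
point_feat = torch.nn.Parameter(torch.randn(num_points, 8))   # learned "texture"
cam_pose   = torch.nn.Parameter(torch.zeros(num_views, 6))    # per-view pose correction
exposure   = torch.nn.Parameter(torch.zeros(num_views))       # tone-mapping parameter
refine_net = torch.nn.Sequential(                              # stand-in for the CNN that
    torch.nn.Conv2d(8, 16, 3, padding=1), torch.nn.ReLU(),    # fills holes in the sparse
    torch.nn.Conv2d(16, 3, 3, padding=1),                     # rasterization
)

def rasterize(points, feats, pose):
    """Placeholder for a differentiable point rasterizer (the hard part)."""
    return torch.zeros(1, 8, 64, 64) + feats.mean() + points.mean() + pose.mean()

opt = torch.optim.Adam(
    [point_xyz, point_feat, cam_pose, exposure, *refine_net.parameters()], lr=1e-3
)

for view in range(num_views):
    sparse = rasterize(point_xyz, point_feat, cam_pose[view])
    hdr    = refine_net(sparse)                          # CNN output -> HDR image
    ldr    = 1 - torch.exp(-hdr * exposure[view].exp())  # toy tone-mapping model
    gt     = torch.rand(1, 3, 64, 64)                    # ground-truth photo (dummy here)
    loss   = torch.nn.functional.l1_loss(ldr, gt)
    opt.zero_grad(); loss.backward(); opt.step()          # gradients reach every unknown
```

Because the whole chain from point positions to the final LDR image is built from differentiable ops, a single backward pass updates geometry, texture, camera, network, and tone-mapping parameters at once.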
(If I understand correctly, they start from something like a SLAM-based point cloud, but I didn't find that explained while glancing over the paper.)
Nitpick for the paper: A brief note on what HDR and LDR mean would be useful. The acronyms are not even expanded anywhere.
As I'm interested in the subject, my mind immediately goes to point-cloud tech in a game-development context. As far as I know, the big hurdle there has always been dynamism, i.e. animation/deformation. Does this approach have any bearing on that aspect?