
If you're interested in the equivalent of "backprop through zemax" there are a few projects going on to jointly optimize optical designs with the image processing, e.g. check out: https://vccimaging.org/Publications/Wang2022DiffOptics/
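
For a toy version of the idea (very much not the method in that paper, which differentiates a full ray tracer plus the image processing), a paraxial model already shows the shape of "backprop through the optics": model the elements as ray transfers, define a focus-error loss at the sensor, and gradient-descend on the lens parameters. A minimal NumPy sketch with a single thin lens and an analytic gradient standing in for autodiff:

    import numpy as np

    # toy "lens design": a thin lens a distance d in front of the sensor,
    # hit by a parallel bundle of rays at heights y - find the focal length
    # f that minimises the spot size on the sensor (answer: f = d)
    d = 50.0                         # lens-to-sensor distance (mm)
    y = np.linspace(-5, 5, 101)      # ray heights at the lens (mm)
    f = 80.0                         # initial guess for the focal length (mm)

    lr = 100.0
    for step in range(2000):
        angle = -y / f               # paraxial thin lens: bends each ray by -y/f
        y_sensor = y + d * angle     # free-space propagation to the sensor
        loss = np.mean(y_sensor ** 2)

        # analytic d(loss)/df - an autodiff framework would provide this for free
        grad = np.mean(2.0 * y_sensor * d * y / f ** 2)
        f -= lr * grad

    print(f"optimised f = {f:.2f} mm, expected {d:.2f} mm")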


I've been working on something similar, although I'm more interested in replicating the effects of existing lenses than designing new ones: https://x.com/dearlensform/status/1858229457430962318

PBRT 3rd edition actually has a great section on the topic but it's one of the parts that wasn't implemented for the GPU (by the authors, anyway): https://pbr-book.org/3ed-2018/Camera_Models/Realistic_Camera...
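
The guts of that chapter are just repeated sphere intersections plus Snell's law, one lens element at a time. A stripped-down sketch of a single surface (my own simplified version, not PBRT's actual code - it assumes a convex surface with positive radius and a ray travelling towards +z):

    import numpy as np

    def refract(d, n, eta):
        # vector Snell's law: d = unit ray direction, n = unit normal facing
        # the incoming ray, eta = n_incident / n_transmitted
        cos_i = -np.dot(n, d)
        sin2_t = eta ** 2 * (1.0 - cos_i ** 2)
        if sin2_t > 1.0:
            return None                           # total internal reflection
        return eta * d + (eta * cos_i - np.sqrt(1.0 - sin2_t)) * n

    def trace_convex_surface(o, d, z_vertex, radius, eta):
        # intersect a ray (origin o, unit direction d) with a spherical surface
        # whose vertex sits on the optical axis at z_vertex, then refract
        center = np.array([0.0, 0.0, z_vertex + radius])
        oc = o - center
        b = np.dot(oc, d)
        disc = b * b - (np.dot(oc, oc) - radius ** 2)
        if disc < 0:
            return None                           # ray misses the element
        t = -b - np.sqrt(disc)                    # near intersection
        hit = o + t * d
        n = (hit - center) / radius               # unit normal, faces the incoming ray here
        d_new = refract(d, n, eta)
        return None if d_new is None else (hit, d_new)

    # a ray 5mm off-axis, parallel to the axis, hitting an air-to-glass (n=1.5)
    # surface with a 30mm radius whose vertex is at z=0: it bends towards the axis
    o = np.array([0.0, 5.0, -10.0])
    d = np.array([0.0, 0.0, 1.0])
    print(trace_convex_surface(o, d, 0.0, 30.0, 1.0 / 1.5))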


Colour in the Photoshop/gamedev world is often handled pretty casually, but if you're interested, the moving-picture world is a lot more rigorous about it, and there's tons of documentation around the ACES system in particular: https://github.com/colour-science/colour-science-precis https://acescentral.com/knowledge-base-2/

As you suggest, storage in linear 16-bit float is standard, the procedure for calibrating cameras to produce the SMPTE-specified colourspace is standard, the output transforms for various display types are standards, files carry metadata to avoid double-transforming, etc. It is complex, but it gives you a lot more confidence than idly wondering how the RGB triplets in a given JPG relate to the light that actually entered the camera in the first place...
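
For anyone who hasn't seen it, the basic discipline is: keep the pixels scene-linear, do the maths there, and only apply a display transform at the very end. A bare-bones sketch using the plain sRGB encoding as the display transform (the real ACES output transforms are considerably more involved, this is just to show where the encoding belongs):

    import numpy as np

    def linear_to_srgb(x):
        # sRGB encoding: scene-linear values in, display code values out
        x = np.clip(x, 0.0, 1.0)
        return np.where(x <= 0.0031308, 12.92 * x, 1.055 * x ** (1 / 2.4) - 0.055)

    # storage is linear half-float (as in OpenEXR) and can exceed 1.0;
    # a one-stop exposure push is a simple multiply in linear light
    linear = np.array([0.02, 0.18, 1.5], dtype=np.float16)
    display = linear_to_srgb(2.0 * linear.astype(np.float32))
    print(display)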


They also have lens sets which share the same external form factor regardless of focal length (making them easy to swap, able to use the same filters, etc.), and the lenses in a set are matched so their color reproduction is consistent. Going further "to the source", it also plays into the (artificial) lighting used and so on, which is why all that stuff is so expensive to begin with.


I've worked on a few bloody moments in film VFX and we certainly looked at some horrible reference of wounds, including headshots from pathology reports. I don't think I suffered much beyond the odd sigh, but it was only for a couple of weeks at a time. Someone went and talked to a trauma nurse about how much blood would be expected for various rounds and whatnot, and I did imagine them not being terribly impressed at why we wanted to know, even though it was for a rather non-violence-glorifying film in that case.

At some point someone did tell me something that helped though: think of and describe what you're looking at using food terms instead of anatomy ones - talk about a blood fluid sim as being "too much like tomato juice, not enough like gravy", and describe textures in terms of mince/steak etc. rather than human tissue types. I also labelled render passes like that, which made conversations sound a lot less gross and more removed from reality, particularly to passers-by working on other shots...


A lot of that group's publications are listed here, many involving optical flow: http://web.archive.org/web/20161014025823/http://gpu4vision....

Maybe this one from 2007: https://www-pequan.lip6.fr/~bereziat/cours/master/vision/pap...
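
For reference, the classical variational formulation those GPU papers build on fits in a few lines - a rough Horn-Schunck sketch in NumPy (the 2007-era GPU work uses a TV-L1 energy rather than this quadratic one, but the overall structure is similar):

    import numpy as np
    from scipy.ndimage import convolve

    def horn_schunck(im1, im2, alpha=1.0, iterations=500):
        # Horn-Schunck optical flow: brightness constancy plus a quadratic
        # smoothness term, solved by Jacobi-style iteration
        im1, im2 = im1.astype(np.float32), im2.astype(np.float32)
        Iy, Ix = np.gradient((im1 + im2) / 2.0)   # spatial derivatives
        It = im2 - im1                            # temporal derivative
        # kernel that averages the 8-neighbourhood, for the smoothness term
        avg = np.array([[1, 2, 1], [2, 0, 2], [1, 2, 1]], dtype=np.float32) / 12.0
        u = np.zeros_like(im1)
        v = np.zeros_like(im1)
        for _ in range(iterations):
            u_bar, v_bar = convolve(u, avg), convolve(v, avg)
            common = (Ix * u_bar + Iy * v_bar + It) / (alpha ** 2 + Ix ** 2 + Iy ** 2)
            u = u_bar - Ix * common
            v = v_bar - Iy * common
        return u, v

    # smoke test: a bright square translated one pixel to the right
    a = np.zeros((32, 32)); a[12:20, 12:20] = 1.0
    b = np.zeros((32, 32)); b[12:20, 13:21] = 1.0
    u, v = horn_schunck(a, b)
    print(u[13:19, 13:20].mean())   # positive: recovered motion points right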


Oh amazing find Icrs, thank you! :)


Thomas Brox’s lab had a ton of these around 2008-2012 as well, such as [0]. I believe Brox had some freely available early CUDA program to calculate optical flow that was sort of SOTA for many years.

[0]: https://lmb.informatik.uni-freiburg.de/Publications/2010/Bro...


I'm actually not sure much ML was involved here - depends where you draw the line I guess, but denoising and interpolation for restoration typically use more traditional wavelet and optical flow algorithms. The work for this was done by Park Road Post and StereoD, which are established post-production facilities using fairly off-the-shelf image processing software. The colorisation likely leant heavily on manual rotoscoping, in the same way that post-conversion to stereo 3D does.

I'd love to hear otherwise but I'm not aware of any commercial "machine learning" for post-production aside from the Nvidia Optix denoiser and one early beta of an image segmentation plugin.


Huh, I recall seeing an article at one point (can't find the link) where it said or suggested that ML was involved. Of course this could have just been a journalist failing to make the distinction; I've seen everything from linear regression on up naively lumped into the ML bucket.

In any case the results are damned impressive -- can't say I've seen anything like it before.


Perceptual uniformity is in some ways opposite to the linearization suggested above - the L* component of CIELAB is much more like the gamma-encoded values of sRGB than a linear light measure.
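
A quick numerical check of that: compare CIE L* (scaled to 0-1) against the sRGB encoding of the same linear luminance values - the two encoded columns track each other closely, while both sit far above the linear value at the dark end:

    import numpy as np

    def L_star(Y):
        # CIE L* as a function of relative luminance, rescaled to 0-1
        d = 6.0 / 29.0
        f = np.where(Y > d ** 3, np.cbrt(Y), Y / (3 * d ** 2) + 4.0 / 29.0)
        return (116.0 * f - 16.0) / 100.0

    def srgb_encode(Y):
        # sRGB transfer function applied to the same linear values
        return np.where(Y <= 0.0031308, 12.92 * Y, 1.055 * Y ** (1 / 2.4) - 0.055)

    for Y in [0.01, 0.05, 0.18, 0.5, 1.0]:
        print(f"linear {Y:4.2f}   L* {float(L_star(Y)):.3f}   sRGB {float(srgb_encode(Y)):.3f}")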

It seems tough to come up with hard and fast rules for whether to mimic the linear physical processes, or work in a perceptual space more like the human visual system. I'd love to hear about more rigorous work in this area - most things I read have boiled down to "this way works better on these images".

It's interesting, for example, that using sinc-type filters to resize truly linear data, like that from HDR cameras, usually gives rise to horrible dark haloing artifacts around small specular highlights, despite that being the most "physically correct" way to do it. Doing the same operation in a more perceptual space immediately sorts out the problem.
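
A way to see it: downsample a signal containing one small, very bright highlight with a Lanczos kernel, once on the linear values and once on encoded values, and look at the undershoot next to the spike. Rough 1D sketch, where "perceptual" is just a 2.4 power curve rather than anything properly CAM-based:

    import numpy as np

    def lanczos(x, a=3):
        w = np.sinc(x) * np.sinc(x / a)
        return np.where(np.abs(x) < a, w, 0.0)

    def downsample(signal, factor, a=3):
        # naive Lanczos downsample: each output sample is a windowed-sinc
        # weighted average of the input around the corresponding position
        out = np.zeros(len(signal) // factor)
        for i in range(len(out)):
            centre = i * factor + factor / 2.0
            taps = np.arange(int(centre) - a * factor, int(centre) + a * factor + 1)
            taps = taps[(taps >= 0) & (taps < len(signal))]
            w = lanczos((taps - centre) / factor, a)
            out[i] = np.sum(w * signal[taps]) / np.sum(w)
        return out

    def encode(x):   # stand-in "perceptual" encoding
        return np.clip(x, 0.0, None) ** (1 / 2.4)

    def decode(x):
        return np.clip(x, 0.0, None) ** 2.4

    # dim background with a single bright specular spike, as from an HDR camera
    img = np.full(200, 0.05)
    img[100] = 50.0

    linear_result = downsample(img, 4)                     # filter in linear light
    percep_result = decode(downsample(encode(img), 4))     # filter in the encoded space

    print(linear_result.min())   # strongly negative: clips to a black ring on display
    print(percep_result.min())   # stays positive, a far milder dip below the background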


Digital cinema uses wavelet compression - intra-frame-only JPEG2000 at hundreds of Mbps. At high resolution and bitrate it seems to perform similarly to or better than H.264; see e.g. this paper and its references: http://alumni.media.mit.edu/~shiboxin/files/Shi_ICME08.pdf


Digital cinema uses resolutions much higher than H.265's targeted sweet spot, and its quality requirements are also a lot higher. Motion-compensated video can't deliver the desired quality, hence intra-frame-only, wavelet-based compression. Note also that JPEG2000, which uses wavelets, shows that for still images wavelets can be made to work better; JPEG, which preceded it, was based on 8x8 DCTs.
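
JPEG2000 proper uses the CDF 5/3 and 9/7 wavelets, but the structure is easy to see with a single Haar level: the image splits into a quarter-size low-pass band plus three detail bands, and it's the detail bands that get quantised hard. A rough NumPy sketch:

    import numpy as np

    def haar2d_level(img):
        # one 2D Haar decomposition level: average/difference along rows,
        # then along columns, giving four half-size sub-bands
        img = img.astype(np.float32)
        lo = (img[:, 0::2] + img[:, 1::2]) / 2.0   # horizontal low-pass
        hi = (img[:, 0::2] - img[:, 1::2]) / 2.0   # horizontal detail
        LL = (lo[0::2, :] + lo[1::2, :]) / 2.0     # smooth approximation
        LH = (lo[0::2, :] - lo[1::2, :]) / 2.0     # vertical detail
        HL = (hi[0::2, :] + hi[1::2, :]) / 2.0     # horizontal detail
        HH = (hi[0::2, :] - hi[1::2, :]) / 2.0     # diagonal detail
        return LL, LH, HL, HH

    # on smooth content nearly all the energy lands in LL, which is why
    # coarse quantisation of the detail bands is cheap in quality terms
    x = np.linspace(0.0, 1.0, 16)
    img = np.outer(np.sin(np.pi * x), np.sin(np.pi * x))
    print([round(float(np.abs(band).mean()), 3) for band in haar2d_level(img)])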


That's exactly how it's normally done - the offline editor works with low bitrate HD files, then gives the edit list to the online team who re-conform from the full resolution stuff.

There's always an impetus to get higher resolution and quality at the offline stage though, as people get used to improving technology. It wouldn't be outrageous to have a 4k offline now, because people with 4k TVs at home will say "Why can't we? I want to see everything!", even if it doesn't particularly enhance the editor's work.

It would still be done with lower bitrate files, because original 4k material from digital cinema cameras is beyond what most machines are happy with.


Similar results - converting to a square 8-bit image and then lowering the JPEG quality just gets noisy in an uninteresting way:

    # convert the wav to headerless 8-bit unsigned mono samples
    sox mo.wav -e unsigned -b 8 -c 1 -r 48k mo.raw
    # find the largest square that fits in the file (stat -f %z is the BSD/macOS form)
    bytes=`stat -f %z mo.raw`
    width=`echo sqrt\($bytes\) | bc`
    square_bytes=`echo $width \* $width | bc`
    # truncate to width*width bytes, treat them as a grayscale image and JPEG-compress it
    dd if=mo.raw of=mo_square.raw bs=$square_bytes count=1
    gm convert -depth 8 -size ${width}x${width} gray:mo_square.raw -quality 50 mo_square.jpg
    # decompress back to raw bytes and wrap them up as a wav again
    gm convert mo_square.jpg gray:mo_square_jpg.raw
    sox -e unsigned -b 8 -c 1 -r 48k -t raw mo_square_jpg.raw mo_jpg.wav

