This was called frequency clamping in the book 'Advanced RenderMan', which talks a lot about procedural textures.
A simple way to think about it is to imagine a pattern of thin black and white stripes. If you go far enough away from the pattern, there will be multiple black and white stripes in the same pixel. Once the stripes are smaller than a pixel, the average color will be grey. Knowing this, you can fade to grey as the stripes get tiny, instead of arriving at grey through heavy sampling.
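A minimal numpy sketch of that fade-to-grey idea (the stripe pattern, the grey constant, and the linear fade are all illustrative choices, not code from the book):

```python
import numpy as np

def stripes_faded(x, w):
    """Black/white stripes of period 1, blended toward their grey average.
    x: position in pattern units; w: footprint of one pixel in pattern
    units. The pattern and the linear fade are illustrative choices."""
    sharp = np.where((x % 1.0) < 0.5, 0.0, 1.0)  # hard stripes; alias when w ~ 1
    fade = np.clip(w, 0.0, 1.0)                  # 0 = stripes much wider than a pixel
    return (1.0 - fade) * sharp + fade * 0.5     # 0.5 = the pattern's average grey
```

Far away, w approaches 1 and the function just returns plain grey, with no extra samples needed.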
This is awesome! This reminds me of MinBLEP audio synthesis of discontinuous functions (https://www.cs.cmu.edu/~eli/papers/icmc01-hardsync.pdf). Instead of doing things at high sampling rate and explicitly filtering, generate the band-limited version directly.
In the article, talking about smoothstep approximation of sinc: "I'd argue the smoothstep version looks better" Why would this be? I would have thought the theoretically correct sinc version would look nicer.
sinc is perfect if you are looking only at frequency response. But in images you also want to preserve locality, that is, processing one part of the image should not affect the rest of the image. For example, sharpening an edge should only affect the edge in question, not its surroundings. That is in tension with preserving frequency response: frequencies are about sine waves, and sine waves are wide, infinitely wide in fact.
BTW, that's also the reason why in quantum mechanics you can't know both position and momentum (a frequency) precisely.
So we need to compromise, and as with scaling algorithms, there are 3 qualities: sharpness (the result of a good frequency response), locality, and aliasing. You can't have all 3, so you need to pick the most pleasant combination.
The extreme cases are:
- Point sampling: excellent locality and sharpness, terrible aliasing
- Linear filtering: excellent locality and no aliasing, very blurry
- sinc filtering: excellent sharpness and no aliasing, terrible locality (ringing artefacts)
Using smoothstep is a good compromise: it has a bit of aliasing because it approximates a step function, and a bit of smearing because it is smooth, but none of these effects is so bad as to be unpleasant.
Side note: for audio, frequency response is more important than locality, which is why windowed-sinc filters are so popular.
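To make the locality point concrete, here's a tiny numpy comparison; the smoothstep-based fade window is my own stand-in, not a kernel from the article:

```python
import numpy as np

def smoothstep(e0, e1, t):
    u = np.clip((t - e0) / (e1 - e0), 0.0, 1.0)
    return u * u * (3.0 - 2.0 * u)

x = np.linspace(0.0, 20.0, 2001)
s = np.sinc(x)                        # ideal low-pass kernel: decays only like 1/x
m = 1.0 - smoothstep(0.0, 1.0, x)     # compact fade window: exactly 0 past x = 1

print(np.abs(s[x > 10]).max())  # ~0.03 -- sinc still rings 10 samples away
print(np.abs(m[x > 10]).max())  # 0.0  -- the smooth window has true locality
```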
> "I'd argue the smoothstep version looks better" Why would this be? I would have thought the theoretically correct sinc version would look nicer.
For a fixed well-defined mathematical problem, you might be able to solve it optimally or approximately. One perspective is to treat the problem as given and immutable and then try to compute an exact or optimal solution.
But often the original problem statement is fairly arbitrary, based on a bunch of guesses or simplifications, and you might be able to get a better result by changing the problem definition (perhaps unfortunately making it much messier to solve exactly) and then solving the new problem statement approximately.
What's the actual problem we're trying to solve here? Generate something that looks visually pleasing. Why is an expression involving cosine the natural way to define that problem statement mathematically? There's likely a lot of freedom here to vary our problem definition.
It might be interesting to start with the smoothstep multiplied result and take the derivative and look at how that differs from a normal cosine, and ponder why that might produce a more pleasing result than a cosine.
It seems like a theoretically correct box filter might not actually be the best filter to use? By approximating it you get a different filter, and whether it's a better filter is something you need to judge by looking at the result.
It looks like the sinc version is still adding a little bit of some higher frequencies (the dampened sine wave), and the approximation doesn't. Maybe those higher frequencies don't actually make things look better?
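That last point is easy to check numerically. A sketch, assuming the exact box-filtered cosine cos(x)*sin(w/2)/(w/2) and a made-up smoothstep fade that reaches zero by w = 2*pi:

```python
import numpy as np

def smoothstep(e0, e1, t):
    u = np.clip((t - e0) / (e1 - e0), 0.0, 1.0)
    return u * u * (3.0 - 2.0 * u)

x = np.linspace(0.0, 2.0 * np.pi, 9)
w = 7.0 * np.pi   # filter width much larger than the cosine's wavelength

exact  = np.cos(x) * np.sin(w / 2.0) / (w / 2.0)              # sinc attenuation
approx = np.cos(x) * (1.0 - smoothstep(0.0, 2.0 * np.pi, w))  # hypothetical fade

print(np.abs(exact).max())   # ~0.09: a faint residual oscillation survives
print(np.abs(approx).max())  # 0.0: the approximation has faded it out entirely
```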
> In the article, talking about smoothstep approximation of sinc: "I'd argue the smoothstep version looks better" Why would this be? I would have thought the theoretically correct sinc version would look nicer.
In this case we are sort of mimicking the eye. The eye doesn't do sinc-bandlimiting (it does a sort of angular integration -- it sums the photons received in a region).
I say "sort of" because we're really doing two steps: first we project a scene onto a screen, and then the eye views the screen. We want (in most cases) what the eye sees on the screen to correspond to what it would see directly (if it were seeing the scene in reality).
The naive rendering approach simply samples an exact point for each pixel. When there's high pixel variation (a higher spatial frequency than the pixel frequency), as you move the camera the samples will alternate rapidly, which doesn't correspond to the desired eye reconstruction. The eye would see approximately an averaged (integrated) color over a small smooth angular window.
Note we really never get the perfect eye reconstruction unless the resolution of your display is much larger than the resolution your eye can perceive[1]. But through anti-aliasing at least this sampling artifact disappears.
This window-integration is not ideal sinc filtering! Actually, it's not bandlimiting at all, since it is a finite-support convolution -- bandlimiting is just a convenient theoretical (approximate/incorrect) description.
In the frequency ___domain this convolution is not a box (ideal sinc filtering), it's smooth with ripples. In the spatial ___domain (that's really used here), it probably does look something like a smoothstep (a smooth window)[2]. The details don't matter if the resolution is large[3].
[1] Plus we would actually need to model other optical effects of the eye (like focus and aberration) that I won't go into :) But you can ask if interested.
[3] Because our own eye integrates the pixels anyway. Again this does ignore other optical effects of the eye (such as "focus" and aberration) that vary with distance to the focal plane, and more.
Just fixing a mistake: what I described is the window function, not the integrated (in this case, cosine) function that was used in the article. There would still be ripples when applying the shown window function (in the cosine integration). I do think ripple-free functions (or ones with faster-decaying ripples) are probably better, because limited floating-point precision generates artifacts (which can be seen in the center of the second demo).
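For a quick numerical feel for those ripples, here's a comparison of a hard window against a smoothstep-shaped one (both windows are arbitrary examples, not the article's):

```python
import numpy as np

n = 4096
t = np.linspace(-1.0, 1.0, n, endpoint=False)

hard = (np.abs(t) < 0.25).astype(float)         # box window
u = np.clip(1.0 - np.abs(t) / 0.25, 0.0, 1.0)   # taper over the same support
soft = u * u * (3.0 - 2.0 * u)                  # smoothstep-shaped window

spec_hard = np.abs(np.fft.rfft(hard))
spec_soft = np.abs(np.fft.rfft(soft))

# Far from DC, the box's spectrum (a sinc) still ripples, decaying only
# like 1/f, while the smooth window's spectrum has fallen off much faster.
print(spec_hard[500] / spec_hard[0], spec_soft[500] / spec_soft[0])
```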
> in theory, once per half-pixel, according to Nyquist
I'd have thought this should be once per two pixels instead. Nyquist says there's no aliasing between functions with wavelength L if you sample at intervals of L/2. So sampling once per pixel should imply a 2-pixel minimum wavelength without aliasing. Assuming the author is right, what am I messing up?
But L/2 should be the sampling interval, which is fixed at 1 pixel, so the minimum alias-free wavelength is 2 pixels. For example, a signal with a wavelength of 1 pixel (or 1/2 pixel), sampled once per pixel, would be indistinguishable from a constant signal.
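A tiny numerical check of that claim, sampling once per pixel:

```python
import numpy as np

px = np.arange(16)                     # one sample per pixel
print(np.cos(2.0 * np.pi * px / 1.0))  # wavelength 1 px: all 1.0, looks constant
print(np.cos(2.0 * np.pi * px / 2.0))  # wavelength 2 px: +1, -1, ... the Nyquist limit
```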
Would someone mind breaking this down a bit? It seems like we are pre-dithering the textures so that, when rendered, the noise is less visible. Is that right?
- In raytracing, you're evaluating some complicated equation at each pixel ___location. In this case, there are some cosine components that have a really high spatial frequency, so you get that aliased TV-static-looking effect in some parts of the image
- One way to avoid that would be to take many samples in a small region around each pixel ___location (at sub-pixel distances), which the author refers to as 'supersampling'. This would work, except you'd need to raytrace a lot more points, which would slow down rendering
- What you could do instead (and this is what the post is mostly about) would be to replace the cosine(x) function with a function that is "the average value of cosine over the interval from x-w/2 to x+w/2" - that's the big integral in the post. This function is effectively just cosine(x) when the window width w is much smaller than the cosine's wavelength, but it averages out the high-frequency cosine components of the image once w is comparable to or larger than the wavelength
- The neat effect is that you can get the same smooth, alias-free image as you would with the expensive super-sampling operation just by using a modified version of cosine instead!
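A small numpy sketch of that averaged cosine; the closed form drops straight out of the integral described above:

```python
import numpy as np

def fcos(x, w):
    """Average of cos over [x - w/2, x + w/2]:
    (sin(x + w/2) - sin(x - w/2)) / w = cos(x) * sin(w/2) / (w/2).
    As w -> 0 the attenuation factor sin(w/2)/(w/2) -> 1, recovering cos(x)."""
    return np.cos(x) * np.sin(w / 2.0) / (w / 2.0)

# Sanity check against brute-force supersampling of the same window:
x, w = 1.3, 0.7
t = np.linspace(x - w / 2.0, x + w / 2.0, 10001)
print(fcos(x, w), np.cos(t).mean())   # the two values agree closely
```

In a shader, w would typically be tied to the pixel footprint, e.g. via a screen-space derivative like GLSL's fwidth, so the attenuation kicks in exactly where the pattern becomes denser than a pixel.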
It's a blur operation. Blurs are often defined as transforming an input function by convolving it with some kernel function (often a Gaussian; see the Wikipedia page on Gaussian blur; the same idea holds for blurring 1d, 2d, 3d, nd signals). Convolution is an operation that combines two functions by integrating them together in a particular way: it defines the output at each ___location as a weighted average of the values the input function takes at points neighbouring that ___location. Integration is what computes the averages, and the weights used in the weighted average are defined by the choice of kernel function. In this case the kernel function is an indicator function that is 1 in some window and 0 everywhere else, instead of the Gaussian function that'd be used for a Gaussian blur.
Since the input image is defined as a continuous function that maps a coordinate to a colour (it isn't discretised into pixels), we can apply the blur operation to the input image function and analytically derive a new image function that directly evaluates the smoothed output colour at any coordinate. Then we can just sample that function at each pixel.
Blurring removes or damps high frequency components of input leaving lower frequency components, so it can be used to remove or reduce high frequency "noise".
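A minimal 1-D sketch of that convolution view of blurring (the signal, window size, and Gaussian width are all arbitrary illustration values):

```python
import numpy as np

rng = np.random.default_rng(0)
signal = rng.normal(size=256)          # a noisy 1-D stand-in for an image row

k = 9
box = np.ones(k) / k                   # indicator-window kernel, normalized to sum 1
box_blur = np.convolve(signal, box, mode="same")     # each output = local average

g = np.exp(-0.5 * ((np.arange(k) - k // 2) / 2.0) ** 2)
gauss = g / g.sum()                    # Gaussian kernel for comparison
gauss_blur = np.convolve(signal, gauss, mode="same")

print(signal.std(), box_blur.std(), gauss_blur.std())  # blurring damps the noise
```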
If you're familiar with moiré patterns and/or the Nyquist theorem, it's basically ensuring that we don't have too much information for the sampled channel (i.e. the pixels that make up the line). The symptom of "overloading the channel" is shimmering and/or moiré patterns -- the same sort of artifacts you'd get when trying to record high frequencies at a very low sampling rate.
You are “blurring” the samples because you know your function is NOT a step function, essentially. So instead of trying to be “super precise” (you cannot be), you “blur” your Fourier transform.
It is NOT exactly that, but my explanation is morally what you achieve.
It would be super great if those super short animations ran a simple forward-reverse loop instead of forward-forward. Would make it a lot easier to read for people with attention disabilities, while only making things more pleasant for all the normal folk out there.
If you are able, you can interact with it directly: at https://www.shadertoy.com/view/3tScWd you can use the mouse to move the vertical bar back and forth.