Sparse transforms like this are very interesting, and these look like very good performance results. It's worth noting, though, that FFTW (http://www.fftw.org/) is still much faster on the sizes of transforms that many (most?) applications use - less than 2^15 or so entries. Also, as the number of non-zero frequencies climbs, sparse algorithms are very quickly overtaken by good implementations of more standard FFT algorithms.
They actually make a specific comparison to FFTW in the paper: their "preliminary implementation" is faster than FFTW for n=2^22 and k<=2^17, and they note that this is a distinct advantage over previous sparse algorithms, which were faster only for k<=2000 or so. In other words, this algorithm beats FFTW for a much larger number of non-zero frequencies than was previously possible.
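As a rough illustration of what "k-sparse spectrum" means here (this is just NumPy's dense FFT, not the paper's algorithm - the sizes n and k below are toy stand-ins for the paper's 2^22 and 2^17):

```python
import numpy as np

n = 2**12   # signal length (toy stand-in for the paper's n = 2**22)
k = 8       # number of non-zero frequencies
rng = np.random.default_rng(0)

# Build a signal whose DFT has exactly k non-zero coefficients.
freqs = rng.choice(n, size=k, replace=False)
spectrum = np.zeros(n, dtype=complex)
spectrum[freqs] = rng.standard_normal(k) + 1j * rng.standard_normal(k)
signal = np.fft.ifft(spectrum)

# A dense FFT (what FFTW computes) produces all n coefficients in
# O(n log n) time; a sparse FFT aims to find just the k large ones
# in roughly O(k log n) time, which only pays off while k << n.
recovered = np.fft.fft(signal)
support = np.flatnonzero(np.abs(recovered) > 1e-8)
assert set(support) == set(freqs)
```

The crossover the paper reports (k up to 2^17 at n=2^22) is exactly the regime where finding those k entries directly is cheaper than computing all n of them.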
I don't know how to estimate what values of n would be relevant for today's applications, but n=2^22≈4M doesn't seem like such a big number for images or video (if signal length roughly corresponds to number of pixels). Any thoughts?
> I don't know how to estimate what values of n would be relevant for today's applications, but n=2^22≈4M doesn't seem like such a big number for images or video (if signal length roughly corresponds to number of pixels). Any thoughts?
Transform codecs for video typically don't transform an entire frame at a time, but rather blocks of the frame. This is for three reasons. The first is efficiency - most of these codecs date back to a time when huge transforms (FFT, DCT, etc.) were very expensive. Possibly the most important is that local artifacts are less noticeable than global artifacts in video, and using blocks is a nice way to bound them. Finally, the point of using a transform is to extract some 'structure' from the data that can be used to encode it more efficiently, and small blocks make that easier to do in a perceptually acceptable or transparent way.
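A minimal sketch of the block-transform idea using SciPy's DCT (the 8x8 block size is an assumption for illustration - real codecs vary it, and the interesting part, quantizing the coefficients, is elided here):

```python
import numpy as np
from scipy.fft import dctn, idctn

# Toy "frame"; real codecs work on e.g. 8x8 or 16x16 luma blocks.
frame = np.arange(64 * 64, dtype=float).reshape(64, 64)
B = 8  # block size (assumed; codecs vary)

out = np.empty_like(frame)
for i in range(0, frame.shape[0], B):
    for j in range(0, frame.shape[1], B):
        block = frame[i:i + B, j:j + B]
        coeffs = dctn(block, norm='ortho')   # 2-D DCT-II of one block
        # A codec would quantize coeffs here, discarding the small
        # high-frequency terms; any artifacts stay bounded to this block.
        out[i:i + B, j:j + B] = idctn(coeffs, norm='ortho')

# Without quantization the round trip is lossless.
assert np.allclose(out, frame)
```

Note that each transform here is only 8x8 = 64 points - nowhere near the n=2^22 regime where a sparse FFT starts to win, which is the point about video not being the natural application.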
There are absolutely applications for large transforms, but video encoding typically isn't one. Incidentally, many modern codecs are based on other transforms, like the discrete wavelet transform, or on approximations to transforms like the DCT.
Still, very cool stuff.