The fact that the SVD algorithm is backwards-stable is still somehow surprising to me, despite being very familiar with the proof. Very few algorithms that produce more data than they take as input have this property.
In the case of matrices with real coefficients, these matrices constitute the famous orthogonal group. The orthogonal group is a Lie group of dimension n*(n-1)/2. A table of Lie groups with their dimensions is here: https://en.wikipedia.org/wiki/Table_of_Lie_groups
Every Lie group has an associated vector space (its Lie algebra) with the same dimension, so many questions about Lie groups can be addressed by studying the associated vector space.
To suggest an application of Lie groups and their associated vector spaces: in machine learning, one can reduce the number of parameters when a group acts on a space (rotations and the like), so you obtain new features to train your model in a more efficient way.
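To make the dimension claim above concrete, here is a minimal numpy/scipy sketch (the size and seed are arbitrary choices of mine, not from the comment): the vector space associated with the orthogonal group consists of skew-symmetric matrices, which have exactly n*(n-1)/2 free entries, and exponentiating one of them yields an orthogonal matrix.

    import numpy as np
    from scipy.linalg import expm

    n = 4
    rng = np.random.default_rng(0)

    # n*(n-1)/2 free parameters fill the strict lower triangle.
    params = rng.normal(size=n * (n - 1) // 2)
    A = np.zeros((n, n))
    A[np.tril_indices(n, k=-1)] = params
    A = A - A.T                              # skew-symmetric: A.T == -A

    Q = expm(A)                              # matrix exponential
    print(len(params))                       # 6 == 4*3/2 free parameters
    print(np.allclose(Q.T @ Q, np.eye(n)))   # True: Q is orthogonal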
In 2 and 3 dimensions the answer is exactly the number of positions below the main diagonal. So in an SVD decomposition the degrees of freedom actually add up nicely; in other words, there is no loss or creation of information. I'm guessing it generalizes to higher dimensions.
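A quick arithmetic sketch of that count for a square n x n matrix (my own illustration): each orthogonal factor in A = U diag(s) V^T contributes n*(n-1)/2 parameters and the singular values contribute n more, which adds up to n^2.

    # Degrees of freedom: two orthogonal factors plus n singular values.
    for n in (2, 3, 5, 10):
        orthogonal_dof = n * (n - 1) // 2        # positions below the diagonal
        total = 2 * orthogonal_dof + n           # U, V, and the singular values
        print(n, total, n * n, total == n * n)   # total always equals n^2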
It's true that a 2D discrete FFT can also be used to produce an orthogonal expansion of an input matrix, but a truncated reconstruction using it (like band-pass filtering) won't have the rank properties (e.g. optimal low-rank approximation) of a truncated SVD-based approximation.
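A minimal numpy sketch of that claim (random test matrix and a crude low-pass mask, both chosen arbitrarily by me): by Eckart-Young the truncated SVD has the smallest Frobenius-norm error among all matrices of the same rank, so it beats an FFT-based reconstruction of matching rank.

    import numpy as np

    rng = np.random.default_rng(1)
    n, k = 64, 5
    A = rng.normal(size=(n, n))

    # Crude low-pass filter: keep a small symmetric block of low 2D frequencies.
    F = np.fft.fft2(A)
    mask = np.zeros((n, n))
    mask[:k, :k] = 1
    mask[:k, -k:] = 1
    mask[-k:, :k] = 1
    mask[-k:, -k:] = 1
    A_fft = np.real(np.fft.ifft2(F * mask))
    r = np.linalg.matrix_rank(A_fft)

    # Truncated SVD of the same rank r.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    A_svd = (U[:, :r] * s[:r]) @ Vt[:r, :]

    print(r)                                      # rank of the FFT reconstruction
    print(np.linalg.norm(A - A_svd, 'fro'))       # never larger than the line below
    print(np.linalg.norm(A - A_fft, 'fro'))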
They are related in many ways if you are interested in data filtering and analysis (viewing matrices as data structures rather than as operators on a function space). Generally the SVD is nicer to have, but the FFT is easier to compute (O(n^3) vs O(n^2 log n)).
Another way to think about them:
An FFT will describe your data in terms of a fixed orthonormal basis: sine and cosine functions of varying frequency.
An SVD will describe your data using a basis that is also orthonormal, but is entirely based on your data.
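A tiny numpy sketch of that contrast (sizes are arbitrary): the DFT basis is written down once and is the same for every input, while the SVD's singular vectors are computed from the data and change when the data changes.

    import numpy as np

    n = 8
    rng = np.random.default_rng(2)

    # Fixed orthonormal basis: columns of the normalized DFT matrix, independent of the data.
    dft_basis = np.fft.fft(np.eye(n)) / np.sqrt(n)

    # Data-dependent orthonormal basis: left singular vectors of the data matrix.
    X = rng.normal(size=(n, 20))
    U, s, Vt = np.linalg.svd(X, full_matrices=False)

    print(np.allclose(dft_basis.conj().T @ dft_basis, np.eye(n)))  # orthonormal, fixed
    print(np.allclose(U.T @ U, np.eye(n)))                         # orthonormal, data-driven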
Just a little nitpick: reading the first lines, I would prefer the author to explain the Pythagorean theorem in R^n like this (it's a very simple proof):
Since v . v = |v|^2, if u and v are orthogonal (u . v = 0), then
|u+v|^2 = (u+v) . (u+v) = u.u + 2 u.v + v.v = u.u + v.v = |u|^2 + |v|^2
A nice example of a projection: if v = (1,1,1,...,1), then the projection of u onto v is mean(u)*v.
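Both facts are easy to check numerically; a tiny numpy sketch with random vectors (my own illustration):

    import numpy as np

    rng = np.random.default_rng(3)
    u = rng.normal(size=5)

    # Build v orthogonal to u by removing u's component from a random vector.
    w = rng.normal(size=5)
    v = w - (w @ u) / (u @ u) * u
    print(np.isclose(u @ v, 0.0))                        # u and v are orthogonal
    print(np.isclose((u + v) @ (u + v), u @ u + v @ v))  # Pythagoras in R^n

    # Projection onto v = (1, 1, ..., 1) recovers the mean in every coordinate.
    ones = np.ones(5)
    proj = (u @ ones) / (ones @ ones) * ones
    print(np.allclose(proj, u.mean() * ones))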
It's one of those tools you always want to try first if you can assume a somewhat linear approximation will work.