The fact that the SVD algorithm is backwards-stable is still somehow surprising to me, despite being very familiar with the proof. Very few algorithms that produce more data than they take as input have this property.
In the case of matrices with real coefficients, these matrices constitute the famous orthogonal group. The orthogonal group is a Lie group of dimension n*(n-1)/2. A table of Lie groups with their dimensions is here: https://en.wikipedia.org/wiki/Table_of_Lie_groups
Every Lie group has an associated vector space (its Lie algebra) with the same dimension, so many questions about Lie groups can be addressed by studying the associated vector space.
To suggest an application of Lie groups and their associated vector spaces: in machine learning, one can reduce the number of parameters when a group acts on a space (rotations and the like), so you obtain new features to train your model in a more efficient way.
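To make the dimension claim above concrete, here is a minimal numpy/scipy sketch (the size and seed are arbitrary choices of mine, not from the comment): the vector space associated with the orthogonal group consists of skew-symmetric matrices, which have exactly n*(n-1)/2 free entries, and exponentiating one of them yields an orthogonal matrix.

    import numpy as np
    from scipy.linalg import expm

    n = 4
    rng = np.random.default_rng(0)

    # n*(n-1)/2 free parameters fill the strict lower triangle.
    params = rng.normal(size=n * (n - 1) // 2)
    A = np.zeros((n, n))
    A[np.tril_indices(n, k=-1)] = params
    A = A - A.T                              # skew-symmetric: A.T == -A

    Q = expm(A)                              # matrix exponential
    print(len(params))                       # 6 == 4*3/2 free parameters
    print(np.allclose(Q.T @ Q, np.eye(n)))   # True: Q is orthogonal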
In 2 and 3 dimensions the answer is exactly the number of positions below the main diagonal. So in an SVD decomposition the degrees of freedom actually add up nicely; in other words, there is no loss or creation of information. I'm guessing it generalizes to higher dimensions.
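A quick arithmetic sketch of that count for a square n x n matrix (my own illustration): each orthogonal factor in A = U diag(s) V^T contributes n*(n-1)/2 parameters and the singular values contribute n more, which adds up to n^2.

    # Degrees of freedom: two orthogonal factors plus n singular values.
    for n in (2, 3, 5, 10):
        orthogonal_dof = n * (n - 1) // 2        # positions below the diagonal
        total = 2 * orthogonal_dof + n           # U, V, and the singular values
        print(n, total, n * n, total == n * n)   # total always equals n^2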
It's true that a 2D discrete FFT can also be used to produce an orthogonal expansion of an input matrix, but a truncated reconstruction using it (like band-pass filtering) won't have the rank properties (e.g. optimal low-rank approximation) of a truncated SVD-based approximation.
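A minimal numpy sketch of that claim (random test matrix and a crude low-pass mask, both chosen arbitrarily by me): by Eckart-Young the truncated SVD has the smallest Frobenius-norm error among all matrices of the same rank, so it beats an FFT-based reconstruction of matching rank.

    import numpy as np

    rng = np.random.default_rng(1)
    n, k = 64, 5
    A = rng.normal(size=(n, n))

    # Crude low-pass filter: keep a small symmetric block of low 2D frequencies.
    F = np.fft.fft2(A)
    mask = np.zeros((n, n))
    mask[:k, :k] = 1
    mask[:k, -k:] = 1
    mask[-k:, :k] = 1
    mask[-k:, -k:] = 1
    A_fft = np.real(np.fft.ifft2(F * mask))
    r = np.linalg.matrix_rank(A_fft)

    # Truncated SVD of the same rank r.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    A_svd = (U[:, :r] * s[:r]) @ Vt[:r, :]

    print(r)                                      # rank of the FFT reconstruction
    print(np.linalg.norm(A - A_svd, 'fro'))       # never larger than the line below
    print(np.linalg.norm(A - A_fft, 'fro'))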
They are related in many ways if you are interested in data filtering and analysis (viewing matrices as data structures rather than as operators on a function space). Generally the SVD is nicer to have, but the FFT is easier to compute (O(n^3) vs O(n^2 log n)).
Another way to think about them:
An FFT will describe your data in terms of a fixed orthonormal basis: sine and cosine functions of varying frequency.
An SVD will describe your data using a basis that is also orthonormal, but is entirely based on your data.
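A tiny numpy sketch of that contrast (sizes are arbitrary): the DFT basis is written down once and is the same for every input, while the SVD's singular vectors are computed from the data and change when the data changes.

    import numpy as np

    n = 8
    rng = np.random.default_rng(2)

    # Fixed orthonormal basis: columns of the normalized DFT matrix, independent of the data.
    dft_basis = np.fft.fft(np.eye(n)) / np.sqrt(n)

    # Data-dependent orthonormal basis: left singular vectors of the data matrix.
    X = rng.normal(size=(n, 20))
    U, s, Vt = np.linalg.svd(X, full_matrices=False)

    print(np.allclose(dft_basis.conj().T @ dft_basis, np.eye(n)))  # orthonormal, fixed
    print(np.allclose(U.T @ U, np.eye(n)))                         # orthonormal, data-driven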
Just a little nitpick: reading the first lines, I would prefer the author to explain the Pythagorean theorem in R^n like this (it's a very simple proof):
Since v . v = |v|^2, if u and v are orthogonal (u . v = 0), then
|u+v|^2 = (u+v) . (u+v) = u.u + 2 u.v + v.v = u.u + v.v = |u|^2 + |v|^2
A nice example of a projection: if v = (1,1,1,...,1), then the projection of u onto v is mean(u)*v.
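Both facts are easy to check numerically; a tiny numpy sketch with random vectors (my own illustration):

    import numpy as np

    rng = np.random.default_rng(3)
    u = rng.normal(size=5)

    # Build v orthogonal to u by removing u's component from a random vector.
    w = rng.normal(size=5)
    v = w - (w @ u) / (u @ u) * u
    print(np.isclose(u @ v, 0.0))                        # u and v are orthogonal
    print(np.isclose((u + v) @ (u + v), u @ u + v @ v))  # Pythagoras in R^n

    # Projection onto v = (1, 1, ..., 1) recovers the mean in every coordinate.
    ones = np.ones(5)
    proj = (u @ ones) / (ones @ ones) * ones
    print(np.allclose(proj, u.mean() * ones))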
It's one of those tools you always want to try first if you can assume a somewhat linear approximation will work.