I've been waiting for this for six months. Thanks to Sheldon Axler for making it available for free. This is intended to be a second book on Linear Algebra.
For a first book I suggest "Linear Algebra: Theory, Intuition, Code" by Mike X Cohen. It's a bit different from a typical math textbook: it leans more on conversational explanations in words, although it does have plenty of proofs as well. The book also has a lot of code examples, which I didn't work through, but I did appreciate the discussions related to computing; for example, the book explains that several calculations that can be done by hand are numerically unstable when done on computers (those darn floats are tricky). For the HN crowd, this is the right focus: math for the sake of computing, rather than math for the sake of math.
One insight I gained from the book was the 4 different perspectives on matrix multiplication. I had never encountered this, not even in the oft-suggested "Essence of Linear Algebra" YouTube series. Everything I had seen explained only one of the 4 views, and then I'd encounter a calculation that was better understood by another view and would be confused. It still bends my mind that all these different perspectives describe the same calculation; they're just different ways of interpreting it.
At the risk of spamming a bit, I'll put my notes here, because this is something I've never seen written down elsewhere. The book has more explanation; these are just my condensed notes (a small NumPy sketch checking all 4 views follows the list).
4 perspectives on matrix multiplication
=======================================
1 Element perspective (all possible dot / inner products)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
(row count × row length) × (column length × column count)
In AB, every element is the dot product of the corresponding row of A
and column of B.
The rows in A are the same length as the columns in B and thus have
dot products.
2 Layer perspective (sum of outer product layers)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
(column length × column count) × (row count × row length)
AB is the sum of every outer product between the corresponding columns
in A and rows in B.
The column count in A is the same as the row count in B, thus the
columns and rows pair up exactly for the outer product operation. The
outer product does not require vectors to be the same length.
3 Column perspective (weighted sums / linear combinations)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
(column length × column count) × (column length × column count)
In AB, every column is a weighted sum of the columns in A; the weights
come from the columns in B.
The weight count in the columns of B must match the column count in A.
4 Row perspective (weighted sums / linear combinations)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
(row count × row length) × (row count × row length)
In AB, every row is a weighted sum of the rows in B; the weights come
from the rows in A.
The weight count in the rows of A must match the row count in B.
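To make the 4 views concrete, here is a minimal NumPy sketch (my own, not from the book) that computes the same product `A @ B` four different ways; the shapes are arbitrary examples.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))   # 3x4
B = rng.standard_normal((4, 2))   # 4x2
ref = A @ B                       # reference product, 3x2

# 1 Element perspective: every entry is a dot product of a row of A and a column of B.
elem = np.array([[A[i, :] @ B[:, j] for j in range(B.shape[1])]
                 for i in range(A.shape[0])])

# 2 Layer perspective: sum of outer products of the columns of A with the rows of B.
layer = sum(np.outer(A[:, k], B[k, :]) for k in range(A.shape[1]))

# 3 Column perspective: each column of AB is a weighted sum of the columns of A,
#   with weights taken from the corresponding column of B.
col = np.column_stack([A @ B[:, j] for j in range(B.shape[1])])

# 4 Row perspective: each row of AB is a weighted sum of the rows of B,
#   with weights taken from the corresponding row of A.
row = np.vstack([A[i, :] @ B for i in range(A.shape[0])])

for name, M in [("element", elem), ("layer", layer), ("column", col), ("row", row)]:
    assert np.allclose(M, ref), name
```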
The most important interpretation IMO is that a matrix is a specification for a linear map. A linear map is determined by what it does to a basis, and the columns of a matrix are just the list of outputs for each basis element (e.g. the first column is `f(b_1)`. The nth column is `f(b_n)`). If A is the matrix for f and B the matrix for g (for some chosen bases), then BA is the matrix for the composition x -> g(f(x)). i.e. the nth column is `g(f(b_n))`.
The codomain of f has to match the ___domain of g for composition to make sense, which means dimensions have to match (i.e. row count of A must be column count of B).
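A quick sanity check of this as a NumPy sketch (the matrices here are arbitrary small examples I made up): build the matrix of the composition column by column from `g(f(b_n))` and verify it equals `BA`.

```python
import numpy as np

# Two linear maps (arbitrary example shapes):
# f: R^3 -> R^2 with matrix A (2x3), g: R^2 -> R^4 with matrix B (4x2).
A = np.array([[1., 2., 0.],
              [0., 1., 3.]])
B = np.array([[1., 0.],
              [2., 1.],
              [0., 1.],
              [1., 1.]])

f = lambda x: A @ x
g = lambda y: B @ y

# The nth column of the composition's matrix is g(f(b_n)),
# where b_n is the nth standard basis vector.
basis = np.eye(3)
composed = np.column_stack([g(f(basis[:, n])) for n in range(3)])

assert np.allclose(composed, B @ A)
```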
It's debatable what the "most important" perspective is. For example, if I need a bunch of dot products between two sets of vectors, that doesn't seem like a linear map or a change of basis (not to me at least), and yet it's exactly what matrix multiplication is, just calculating a bunch of dot products between two sets of vectors.
Or when I think about the singular value decomposition, I'm not thinking about linear maps and change of basis, but I am thinking about a sum of many outer product layers.
If you don't have a linear map in mind, why do you write your dot products with one set of column vectors and another set of row vectors? Computationally, the best way to do dot products would be to walk all of your arrays in contiguous memory order, so the row/column thing is an unnecessary complication. And if you have more than 2 matrices to multiply/"steps of dot products to do in a pipeline", there's almost certainly a relevant interpretation as linear maps lurking.
Outer products are one way to define a "simple" linear map. What SVD tells you is that every (finite dimensional) linear map is a sum of outer products; there are no other possibilities.
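A minimal NumPy sketch of that statement (my own illustration, with an arbitrary random matrix): reassemble a matrix from the rank-1 outer product layers its SVD provides.

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((5, 3))

# SVD: M = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(M, full_matrices=False)

# Reassemble M as a sum of rank-1 outer product layers s_k * u_k v_k^T.
layers = sum(s[k] * np.outer(U[:, k], Vt[k, :]) for k in range(len(s)))

assert np.allclose(layers, M)
```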
This is great! I eventually figured out all these interpretations as well, but it took forever. In particular, the "sum of outer products" interpretation is crucial for understanding the SVD.
The "sum of outer products" interpretation (actually all these interpretations consist in choosing a nesting order for the nested loops that must be used for computing a matrix-matrix product, a.k.a. DGEMM or SGEMM in BLAS) is the most important interpretation for computing the matrix-matrix product with a computer.
The reason is that the outer product of 2 vectors takes a number of multiplications equal to the product of the lengths of the 2 vectors, but only a number of memory reads equal to the sum of their lengths.
This "outer product" is much better called the tensor product of 2 vectors, because the outer product as originally defined by Grassmann, and used in countless mathematical works with the Grassmann meaning, is a different quantity, which is related to what is frequently called the vector product of 2 vectors. Not even "tensor product" is correct historically. The correct term would be "Zehfuss product", but nowadays few people remember Zehfuss. The term "tensor" was originally applied only to symmetric matrices, where it made sense etymologically, but for an unknown reason Einstein has used it for general arrays and the popularity of the relativity theory after WWI has prompted many mathematicians to change their terminology, following the usage initiated by Einstein.
For long vectors, the product of the 2 lengths is much bigger than their sum, and this high ratio of multiplications to memory reads, when the result is kept in registers, allows reaching a throughput close to the maximum possible on modern CPUs.
Because the result must be kept in registers, the product of big matrices must be assembled from the products of small sub-matrices. For instance, suppose the registers can hold 16 = 4 × 4 values, i.e. the tensor/outer product of 2 vectors of length 4. Then each tensor/outer product, which is one additive term in the matrix-matrix product of two 4×4 sub-matrices, can be computed with 4 × 4 = 16 fused multiply-add operations (the addition accumulates the current tensor product onto the previous ones) but only 4 + 4 = 8 memory reads.
On the other hand, if the matrix-matrix product were computed with scalar products of vector pairs instead of tensor products of vector pairs, it would require twice as many memory reads as fused multiply-add operations, and would therefore be much slower.
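Here is a toy sketch of that loop ordering in Python/NumPy (purely illustrative; a real kernel would be written with FMA intrinsics or assembly, and the tile size 4 is just an assumed register capacity): each 4×4 block of the result is accumulated as a sum of outer products, reading only 4 + 4 values per update.

```python
import numpy as np

def outer_product_block_matmul(A, B, T=4):
    """Toy GEMM that accumulates each TxT block of C as a sum of outer
    products. Illustrates the loop order only; a real kernel keeps the
    TxT accumulator in registers and uses fused multiply-adds. Assumes
    the matrix dimensions are multiples of T for simplicity."""
    m, k = A.shape
    k2, n = B.shape
    assert k == k2 and m % T == 0 and n % T == 0
    C = np.zeros((m, n))
    for i in range(0, m, T):              # block row of C
        for j in range(0, n, T):          # block column of C
            acc = np.zeros((T, T))        # stands in for a register tile
            for p in range(k):
                a = A[i:i+T, p]           # T reads from a column of A
                b = B[p, j:j+T]           # T reads from a row of B
                acc += np.outer(a, b)     # T*T multiply-adds, no extra reads
            C[i:i+T, j:j+T] = acc
    return C

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 12))
B = rng.standard_normal((12, 8))
assert np.allclose(outer_product_block_matmul(A, B), A @ B)
```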