I had a lot of trouble learning to solve 2 equations in 2 variables because I did not see the point. Given any word problem that you were supposed to solve that way, I could solve it in my head. It wasn't until I was shown that I couldn't solve 3 equations in 3 variables that I realized that I needed to learn the boring way.
Secondly the single most useful thing that I did in school was try to generate a table of how likely it was to get various dice rolls when you rolled 4 6-sided dice and took the top 3. I learned a lot from that, and that sparked my interest in math.
Thirdly, my biggest complaint about the way we teach this stuff is that we present matrices and matrix multiplication with no context. It makes no sense to people. But if you know what a linear function is, and realize that a matrix is just a way to write one down, then matrix multiplication turns out to be just function composition.
Just think how surprising the associative law is for matrix multiplication. I remember sitting there thinking, "How on Earth did anyone think it up, and see the associative law?" It becomes something you memorize because it makes no sense.
But the associative law always holds for function composition. Given three functions f, g, h and a thing they act on v, then by definition:
((f o g) o h)(v) = (f o g)(h(v)) = f(g(h(v))) = f((g o h)(v)) = (f o (g o h))(v)
Since matrix multiplication is just a way to write out function composition for a certain class of functions, it likewise must follow the associative law. THAT is how they thought it up!
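If you want to see that claim in action rather than take it on faith, here is a minimal sketch (Python with NumPy, random matrices of my own choosing standing in for f, g, h) checking that the two ways of parenthesizing the product give the same result:

```python
import numpy as np

rng = np.random.default_rng(0)

# Three arbitrary matrices with compatible shapes, standing in for f, g, h.
f = rng.random((2, 3))
g = rng.random((3, 4))
h = rng.random((4, 5))
v = rng.random(5)  # a thing they act on

# ((f o g) o h)(v) versus (f o (g o h))(v), written as matrix products
left = ((f @ g) @ h) @ v
right = (f @ (g @ h)) @ v

print(np.allclose(left, right))  # True, up to floating point rounding
```

That is only a numerical spot check, of course; the function composition argument is the actual proof.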
I can hear the complaints already. "Oh, but this is too abstract for college students, they can't possibly understand this approach!" Bull. They can, and they do if you have the guts to present it this way. I've done it, with success.
Start with a linear transformation F from vector space V to vector space W, with a basis (v1, ..., vn) for V and a basis (w1, ..., wm) for W. The matrix M representing F in that pair of bases is (F(v1) F(v2) ... F(vn)), where each F(vi) is written as a column of its coordinates in the basis (w1, ..., wm).
Given those bases it is easy to demonstrate that every linear function can be uniquely represented that way.
Thanks to linearity, it is easy to demonstrate that multiplying that matrix against a column of coordinates is the same as applying the linear function to the corresponding vector. Furthermore, you can demonstrate that, given the bases and the matrix, you have actually defined a linear function. (Thereby completing the demonstration that matrices are a notation for linear functions, and linear functions are what matrices represent.)
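Here is that construction as a sketch (Python/NumPy, with a made-up linear function from R^3 to R^2 and the standard basis, both my own choices for illustration): build M one column at a time as F(v1), ..., F(vn), then check that M times a coordinate column agrees with applying F directly.

```python
import numpy as np

# A made-up linear function from R^3 to R^2 (any linear F would do).
def F(x):
    x1, x2, x3 = x
    return np.array([2 * x1 + x3, x2 - x3])

# Standard basis of R^3, playing the role of (v1, v2, v3).
basis = np.eye(3)

# The matrix representing F: its columns are F(v1), F(v2), F(v3).
M = np.column_stack([F(v) for v in basis])

# Matrix times a coordinate column is the same as applying the function.
x = np.array([1.0, 4.0, -2.0])
print(np.allclose(M @ x, F(x)))  # True
```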
With that in mind the matrix representing (F o G) is going to be ((F o G)(v1) ... (F o G)(vn)). And when you unwind that definition you find that function composition turns into matrix multiplication. (As long as all of the bases match up of course, don't forget them!)
At this point you now have a rule for matrix multiplication. Thanks to the correspondence to linear functions, you can derive all of its algebraic properties (including associativity) from the corresponding properties of linear functions.
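And here is the composition step under the same made-up setup (again Python/NumPy, with a second invented map G so the composition makes sense): the matrix built column by column for F o G comes out equal to the product of the two matrices.

```python
import numpy as np

def F(x):  # a made-up linear map R^3 -> R^2
    x1, x2, x3 = x
    return np.array([2 * x1 + x3, x2 - x3])

def G(x):  # a made-up linear map R^2 -> R^3
    x1, x2 = x
    return np.array([x1 + x2, x1 - x2, 3 * x2])

def matrix_of(T, dim_in):
    # Columns are T applied to the standard basis vectors.
    return np.column_stack([T(e) for e in np.eye(dim_in)])

M_F = matrix_of(F, 3)
M_G = matrix_of(G, 2)
M_FoG = matrix_of(lambda x: F(G(x)), 2)

print(np.allclose(M_FoG, M_F @ M_G))  # True: composition is multiplication
```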
Incidentally, by keeping track of the role of the basis throughout the presentation, you make it much easier to work out change of basis matrices later. Those have a lot of potential to be confusing, because they work out to be the inverse of what you'd naively guess them to be. For instance if you rotate your basis 30 degrees clockwise, the change of basis matrix you get is a rotation 30 degrees counter-clockwise. (This happens for the same reason that while you spin clockwise, it looks to you like the world is spinning counter-clockwise.)
So how do you get the change of basis matrix? Well, go back to the definition. Make your function the identity (everything remains the same), and then write out the matrix whose columns are the coordinates, in the new basis, of the basis vectors for the old basis.
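You can see that inverse-direction surprise numerically. A sketch (NumPy again, with the standard basis as the old one and a basis rotated 30 degrees clockwise as the new one, all names my own): write each old basis vector in the new basis and stack those coordinates as columns; the result is a 30 degree counter-clockwise rotation.

```python
import numpy as np

def rotation(theta):
    # Counter-clockwise rotation by theta radians.
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

theta = np.deg2rad(30)

# New basis: the standard basis rotated 30 degrees clockwise.
# Its vectors sit in the columns of B.
B = rotation(-theta)

# Change of basis matrix: each column is an old basis vector written
# in the new basis, i.e. the solution of B @ c = e_i.
change_of_basis = np.column_stack(
    [np.linalg.solve(B, e) for e in np.eye(2)]
)

# It comes out as a 30 degree counter-clockwise rotation.
print(np.allclose(change_of_basis, rotation(theta)))  # True
```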
Now an exercise to demonstrate to yourself that you really understood this. Let V be the vector space of polynomials of degree at most 2, and W be the vector space of polynomials of degree at most 1. Let F be the linear function called "differentiation". Start with a coordinate system on V which is just (p(0), p(1), p(2)) and a corresponding coordinate system on W which is just (p(0), p(1)). In that pair of coordinate systems, what matrix represents F?
If you can figure that out, you probably understood the whole thing. If not, well...
(Big hint. There is a different pair of coordinate systems in which you can easily write down the answer. Use that fact...)
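If you want to check your answer without being told what it is, here is a small harness you could use (Python/NumPy, polynomials represented by coefficient tuples; all of this scaffolding is mine, not part of the exercise): it tests whether a candidate 2x3 matrix sends (p(0), p(1), p(2)) to (p'(0), p'(1)) for a few sample polynomials.

```python
import numpy as np

def p(coeffs, x):
    # Evaluate a polynomial a0 + a1*x + a2*x^2 given (a0, a1, a2).
    a0, a1, a2 = coeffs
    return a0 + a1 * x + a2 * x ** 2

def dp(coeffs, x):
    # Evaluate its derivative a1 + 2*a2*x.
    _, a1, a2 = coeffs
    return a1 + 2 * a2 * x

def check(candidate):
    # candidate: a 2x3 matrix meant to take (p(0), p(1), p(2)) to (p'(0), p'(1)).
    for coeffs in [(1, 0, 0), (0, 1, 0), (0, 0, 1), (2, -3, 5)]:
        v = np.array([p(coeffs, 0), p(coeffs, 1), p(coeffs, 2)])
        w = np.array([dp(coeffs, 0), dp(coeffs, 1)])
        if not np.allclose(candidate @ v, w):
            return False
    return True

# Usage: build your candidate as a 2x3 numpy array and call check(candidate).
```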
Thanks for the detailed explanation. I was subconsciously thinking "Matrix multiplication is this weird operation, which happens to be isomorphic to function composition in the space of linear transformations." rather than "Function composition, when functions are represented as matrices, is called matrix multiplication."
“… try to generate a table of how likely it was to get various dice rolls when you rolled 4 6-sided dice and took the top 3.”
I felt the urge to code this. Here is the result: https://gist.github.com/2899137. It doesn’t tell you the “likelihood of various dice rolls”; it can either print out the rolls for each trial or tell you how common each of the 6 numbers was across all the rolls.
That’s true. But I thought doing it by hand would require writing a tediously large table because you have 6^3 possible roll results to give the probability of, if you were actually going to write the “likelihood of various dice rolls”. I suppose the appropriate compromise is a symbolic manipulation program like Mathematica, which can work with exact numbers easily while automating the creation of the table. (If anyone can explain the problem, it would be great if they could link to a document demonstrating the solution on somewhere like http://www.mathics.net/ .) Or is there an easier, simpler way to solve this by hand?
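For what it’s worth, here is a sketch of the exact, automated version using only the Python standard library (my own framing of “various rolls” as the sum of the top 3 dice): enumerate all 6^4 equally likely outcomes and tabulate the probability of each kept total as an exact fraction.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate all 6^4 equally likely outcomes of rolling 4 six-sided dice,
# keep the top 3 of each roll, and count how often each total appears.
counts = Counter(
    sum(sorted(roll)[1:])  # drop the lowest die, sum the top 3
    for roll in product(range(1, 7), repeat=4)
)

total = 6 ** 4
for kept_sum in sorted(counts):
    prob = Fraction(counts[kept_sum], total)
    print(f"{kept_sum:2d}: {prob}  (~{float(prob):.4f})")
```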