> and the MAD is more intuitive, it has a natural geometrical interpretation It'...

relate · on Jan 16, 2014

This makes no sense. If you want to know how much they differ from the expected value, you first define what you mean by difference (e.g. the L_1 or L_2 norm) and then measure it somehow. The standard deviation is an estimator for the square root of the expected value of this difference when the difference is chosen to be the squared L_1 distance (in 1-D, the L_1 and L_2 distances coincide). The MAD takes the mean of the L_1 distance.

While the standard deviation is proportional to the L_2 distance between the vector of samples and an equal-sized vector with every coordinate set to the mean, that's not an intuitive expression based on the problem statement.

klodolph · on Jan 16, 2014

> that's not an intuitive expression based on the problem statement

That's the danger I'm talking about when you start using the word "intuitive". Intuition is relative, and someone who works with mathematics or statistics will develop a mathematical intuition about things. Just like if you're an experienced driver you'll intuitively know when other drivers are about to change lanes, even before they signal.

I think of L_2 as more intuitive because physical space uses the L_2 norm.

The other thing that makes L_1 counterintuitive is that if you measure the absolute deviation from the mean, then you aren't minimizing the deviation—in order to do that, you have to choose the median.

In other words, you say "this is the center" and "this is the measure of how far away everything is from the center", but you could have picked a different center which has a lower distance from your data. Counterintuitive.
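To make the point about centers concrete, here is a small sketch in Python. The data set and the helper names `mean_abs_dev` and `mean_sq_dev` are made up for illustration. It compares the average absolute deviation and the average squared deviation around two candidate centers, the mean and the median.

```python
import statistics

def mean_abs_dev(data, center):
    """Average absolute (L_1-style) distance of the data from a chosen center."""
    return sum(abs(x - center) for x in data) / len(data)

def mean_sq_dev(data, center):
    """Average squared (L_2-style) distance of the data from a chosen center."""
    return sum((x - center) ** 2 for x in data) / len(data)

# Hypothetical sample with one outlier, so that mean and median differ.
data = [1, 2, 3, 4, 100]
mean = statistics.mean(data)
median = statistics.median(data)

mad_about_mean = mean_abs_dev(data, mean)
mad_about_median = mean_abs_dev(data, median)
msd_about_mean = mean_sq_dev(data, mean)
msd_about_median = mean_sq_dev(data, median)

print("absolute deviation about mean:  ", mad_about_mean)
print("absolute deviation about median:", mad_about_median)
print("squared deviation about mean:   ", msd_about_mean)
print("squared deviation about median: ", msd_about_median)
```

On this sample the absolute deviation is smaller around the median, while the squared deviation is smaller around the mean, which is the mismatch described above.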

ced · on Jan 15, 2014

Maybe I misunderstood what you meant, but I would say that the L_2 norm amounts to MAD. L2 means

   sqrt(x²+y²+...)

and in 1D, that becomes

   sqrt(x²) = |x|

klodolph · on Jan 16, 2014

In 1D, L_2 = L_1. But if you are talking about 100 sample points, the data has 100 dimensions.

ced · on Jan 16, 2014

Ahhh, I see what you mean, I think. Or maybe not. If I have 3 sample points, [1,1,1], then the standard deviation is 0, but where do you take your Euclidean distance?

If I'm not mistaken, with 3 sample points a, b and c that have an average mu, then

  sigma = |(a, b, c) - (mu, mu, mu)| / sqrt(3)   (L2 norm in R3)

Is that the geometric interpretation you're referring to? It's neat, but the mu vector feels a bit artificial.
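For what it's worth, the geometric reading can be checked numerically. The sketch below is only illustrative (the function names and sample values are invented here). It builds the deviation vector (a, b, c) - (mu, mu, mu) and shows that the population standard deviation is its L_2 norm divided by sqrt(n), while the mean absolute deviation is its L_1 norm divided by n.

```python
import math
import statistics

def deviation_vector(samples):
    """Return samples minus a same-length vector filled with their mean."""
    mu = statistics.mean(samples)
    return [x - mu for x in samples]

def sigma_from_l2(samples):
    """Population standard deviation as the scaled L_2 norm of the deviations."""
    d = deviation_vector(samples)
    l2_norm = math.sqrt(sum(v * v for v in d))
    return l2_norm / math.sqrt(len(samples))

def mad_from_l1(samples):
    """Mean absolute deviation as the scaled L_1 norm of the deviations."""
    d = deviation_vector(samples)
    l1_norm = sum(abs(v) for v in d)
    return l1_norm / len(samples)

samples = [2.0, 4.0, 9.0]  # hypothetical values for a, b, c
print(sigma_from_l2(samples), statistics.pstdev(samples))
print(mad_from_l1(samples))

# With [1, 1, 1] the deviation vector is the zero vector, so its norm is 0.
print(sigma_from_l2([1.0, 1.0, 1.0]))
```

The comparison uses `statistics.pstdev` (divide by n). The sample standard deviation from `statistics.stdev` would divide by sqrt(n - 1) instead, so the proportionality constant depends on which estimator is meant.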