
> and the MAD is more intuitive, it has a natural geometrical interpretation

It's less intuitive, not more intuitive, at least for me. And the standard deviation definitely has a more geometric interpretation than MAD. If you measure a hundred samples, and you want to figure out how much they differ from the expected values, what could be more intuitive than Euclidean distance? But most people never bother to try and extend their intuition about ℝ^3 to ℝ^100 to realize how simple standard deviation truly is.

What is being advocated here is the use of the L_1 norm (MAD) over the familiar L_2 norm (standard deviation). Everybody knows and understands L_2, and L_2 has a lot of desirable properties.




This makes no sense. If you want to know how much they differ from the expected value, you first define what you mean by difference (e.g. the L_1 or L_2 norm) and then measure it somehow. The standard deviation is an estimator for the square root of the expected value of this difference, when the difference is chosen to be the SQUARED distance (in 1-D the L_1 and L_2 distances coincide, so it doesn't matter which one you square). The MAD takes the mean of the L_1 distance itself.

While the standard deviation is proportional to the L_2 distance between the vector of the samples and an equal-sized vector with every coordinate set to the mean, that's not an intuitive expression based on the problem statement.
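
To make the two definitions in this comment concrete, here is a minimal sketch that computes both for a small data set. The function names and the sample values are made up for illustration, and it assumes the population form (divide by n) and that MAD means the mean absolute deviation around the mean.

```python
import math

def std_dev(xs):
    """Square root of the mean SQUARED deviation from the mean (population form)."""
    mu = sum(xs) / len(xs)
    return math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))

def mean_abs_dev(xs):
    """Mean of the absolute (L_1) deviation from the mean."""
    mu = sum(xs) / len(xs)
    return sum(abs(x - mu) for x in xs) / len(xs)

# Hypothetical sample values, chosen so the arithmetic is easy to check by hand.
samples = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

print(std_dev(samples))       # 2.0
print(mean_abs_dev(samples))  # 1.5
```

The two numbers differ because squaring weights large deviations more heavily before averaging.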


> that's not an intuitive expression based on the problem statement

That's the danger I'm talking about when you start using the word "intuitive". Intuition is relative, and someone who works with mathematics or statistics will develop a mathematical intuition about things. Just like if you're an experienced driver you'll intuitively know when other drivers are about to change lanes, even before they signal.

I think of L_2 as more intuitive because physical space uses the L_2 norm.

The other thing that makes L_1 counterintuitive is that if you measure the absolute deviation from the mean, then you aren't minimizing the deviation—in order to do that, you have to choose the median.

In other words, you say "this is the center" and "this is the measure of how far away everything is from the center", but you could have picked a different center which has a lower distance from your data. Counterintuitive.
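
The point about the center can be checked numerically. This is only a sketch with made-up sample values and a hypothetical helper name: it measures the mean absolute deviation around the mean and around the median, and does the same for the mean squared deviation.

```python
import statistics

def mean_abs_dev_from(xs, center):
    """Mean absolute deviation of xs measured from an arbitrary center."""
    return sum(abs(x - center) for x in xs) / len(xs)

def mean_sq_dev_from(xs, center):
    """Mean squared deviation of xs measured from an arbitrary center."""
    return sum((x - center) ** 2 for x in xs) / len(xs)

# Hypothetical skewed sample, so that the mean and the median differ.
samples = [1.0, 2.0, 10.0]
mean = statistics.mean(samples)
median = statistics.median(samples)

# Absolute deviation is smaller around the median than around the mean...
print(mean_abs_dev_from(samples, median), mean_abs_dev_from(samples, mean))
# ...while squared deviation is smaller around the mean than around the median.
print(mean_sq_dev_from(samples, mean), mean_sq_dev_from(samples, median))
```

So with L_2 the mean is the center that minimizes the spread, while with L_1 that role belongs to the median.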


Maybe I misunderstood what you meant, but I would say that the L_2 norm amounts to MAD. L2 means

   sqrt(x²+y²+...)
and in 1D, that becomes

   sqrt(x²) = |x|


In 1D, L_2 = L_1. But if you are talking about 100 sample points, the data has 100 dimensions.


Ahhh, I see what you mean, I think. Or maybe not. If I have 3 sample points, [1,1,1], then the standard deviation is 0, but where do you take your Euclidean distance?

If I'm not mistaken, with 3 sample points a, b and c that have an average mu, then

  sigma = |(a, b, c) - (mu, mu, mu)| / sqrt(3)   (L2 norm in R3)
Is that the geometric interpretation you're referring to? It's neat, but the mu vector feels a bit artificial.
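
For what it's worth, the geometric reading can be checked with a few lines. This is a rough sketch with made-up sample values and a hypothetical helper name; it assumes the population standard deviation, which equals the Euclidean distance to the mean vector divided by sqrt(n), in line with the "proportional to" wording earlier in the thread.

```python
import math
import statistics

def distance_to_mean_vector(xs):
    """Euclidean (L_2) distance between xs and the vector (mu, mu, ..., mu)."""
    mu = sum(xs) / len(xs)
    return math.sqrt(sum((x - mu) ** 2 for x in xs))

# Hypothetical three-point sample, treated as one point in R^3.
samples = [1.0, 2.0, 6.0]
dist = distance_to_mean_vector(samples)
sigma = statistics.pstdev(samples)  # population standard deviation

# The two agree once the distance is scaled by 1/sqrt(n).
print(math.isclose(dist / math.sqrt(len(samples)), sigma))  # True

# For [1, 1, 1] the sample vector coincides with the mean vector, so both are 0.
print(distance_to_mean_vector([1.0, 1.0, 1.0]))  # 0.0
```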



