Hacker News
Median and Mad Revisited with an Online Estimator (anthonylloyd.github.io)
19 points by AnthonyLloyd on April 9, 2020 | 9 comments




The link says Median Absolute Deviation.

Here's an SE discussion where people use the term inconsistently without even noticing: https://datascience.stackexchange.com/questions/42760/mad-vs...

It's absurd to use ambiguous acronyms like this.


Happy to answer questions if anything is unclear, or to discuss.


Would you happen to know the memory complexity of this algorithm? I maintain an online machine learning library written in Python, called creme, where we implement online statistics. We have a generic online algorithm for estimating quantiles, so a specific algorithm for estimating medians would be welcome. I'm always on the lookout for such online algorithms.


It uses a fixed amount of memory, which gets released once you move into the recursive part. Increasing the fixed part will give you a better starting estimate.
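A minimal Python sketch of that two-phase pattern, for illustration only. It is not the estimator from the post: the class name `OnlineMedianSketch`, the `buffer_size` and `step` parameters, and the sign-based recursive update are all assumptions chosen to show a fixed warm-up buffer that is released before a constant-memory phase.

```python
import statistics


class OnlineMedianSketch:
    """Hypothetical illustration: exact median over a fixed warm-up buffer,
    then a constant-memory recursive update. Not the estimator from the post."""

    def __init__(self, buffer_size=5, step=0.5):
        self.buffer_size = buffer_size  # size of the fixed warm-up buffer (assumed)
        self.step = step                # fixed step for the recursive phase (assumed)
        self.buffer = []                # holds only the first buffer_size samples
        self.estimate = None            # no estimate until the buffer is full

    def update(self, x):
        if self.buffer is not None:
            # Warm-up phase: store samples until the buffer is full.
            self.buffer.append(x)
            if len(self.buffer) == self.buffer_size:
                self.estimate = statistics.median(self.buffer)
                self.buffer = None  # release the buffer; O(1) memory from here on
            return self.estimate
        # Recursive phase: nudge the estimate toward the side the sample falls on.
        if x > self.estimate:
            self.estimate += self.step
        elif x < self.estimate:
            self.estimate -= self.step
        return self.estimate


est = OnlineMedianSketch(buffer_size=5, step=0.5)
for value in [5, 1, 4, 2, 3, 10, 0]:
    est.update(value)
print(est.estimate)
```

In this sketch a larger `buffer_size` means the exact median is taken over more samples before the buffer is dropped, which is one way to read "a better starting estimate".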

See here for quantiles: https://stackoverflow.com/questions/1058813/on-line-iterator...

The P^2 algorithm used in creme is interesting. For the median it would give a two-sided median deviation. Maybe it could be changed slightly to be symmetric and give the Median and MAD. I'll look into it.


Can you link to a proof of the equation starting at "The standard error of the sample Median approximates to"?


It's a fairly hand-wavy central limit theorem argument, for large N or a close-to-normal distribution:

http://davidmlane.com/hyperstat/A106993.html
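For reference, the large-sample result usually quoted for normally distributed data is that the standard error of the sample median is about sqrt(pi/2) ≈ 1.2533 times the standard error of the mean. A minimal Python sketch, assuming a known population standard deviation `sigma` and sample size `n` (the function name is illustrative and not taken from the post):

```python
import math


def median_standard_error(sigma, n):
    """Approximate standard error of the sample median.

    Assumes roughly normal data and large n: sqrt(pi / 2) * sigma / sqrt(n).
    """
    return math.sqrt(math.pi / 2) * sigma / math.sqrt(n)


def mean_standard_error(sigma, n):
    """Standard error of the sample mean, for comparison."""
    return sigma / math.sqrt(n)


# Ratio of the two standard errors, independent of sigma and n.
ratio = median_standard_error(2.0, 400) / mean_standard_error(2.0, 400)
print(round(ratio, 4))
```

The ratio does not depend on `sigma` or `n`, which is why the approximation is often stated simply as a constant multiple of the mean's standard error.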


What is MAD?


Median Absolute Deviation

Discussed in the previous post: http://anthonylloyd.github.io/blog/2016/10/21/MAD-Outliers

It's roughly the distance either side of the median within which half the sample falls (roughly, since you can also do a two-sided MAD).
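A small batch (not online) Python sketch of the definition, on a made-up sample. The function names are illustrative, and the two-sided version follows one common convention (separate medians of the deviations below and above the median), which may differ in detail from the linked post.

```python
import statistics


def median_and_mad(sample):
    """Return (median, MAD), where MAD is the median of |x - median|."""
    m = statistics.median(sample)
    mad = statistics.median([abs(x - m) for x in sample])
    return m, mad


def two_sided_mad(sample):
    """Return (lower, upper): median deviation below and above the median."""
    m = statistics.median(sample)
    lower = statistics.median([m - x for x in sample if x <= m])
    upper = statistics.median([x - m for x in sample if x >= m])
    return lower, upper


data = [1, 2, 3, 6, 100]  # made-up sample with one large outlier
m, mad = median_and_mad(data)
inside = sum(1 for x in data if abs(x - m) <= mad)
print(m, mad, inside, two_sided_mad(data))
```

Here the median is 3 and the MAD is 2, so the interval from 1 to 5 holds three of the five points, and the outlier at 100 has no effect on either number. The two-sided values differ (1 below, 3 above) because the sample is skewed.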



