
Another question: how are the standard errors calculated? I assume they're not from the bootstrapping, since the p-values clearly aren't derived from the standard errors (+/- 1.96*se crosses coef=0 in several cases that still have small p-values). The other way I could imagine getting p-values would be the percentage of bootstrap replicates with (coef==0). But with only 20 replicates you're stuck with p-values in multiples of 0.05.
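A minimal sketch of the zero-counting idea and its granularity problem, using made-up replicate values (in practice these would come from refitting the lasso on resampled data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical bootstrap replicates of a single lasso coefficient.
B = 20
coefs = rng.choice([0.0, 0.3, 0.5], size=B, p=[0.1, 0.5, 0.4])

# "p-value" as the fraction of replicates shrunk exactly to zero.
p_zero = np.mean(coefs == 0.0)

# With B = 20 replicates this estimate can only take the values
# 0/20, 1/20, ..., 20/20, so its resolution is 0.05.
print(p_zero)
```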

I'm genuinely curious how to do coef significance testing for L1-regularized models. I once saw someone ask this at a Tibshirani talk and he said "oh we have no idea, we've resorted to the bootstrap before".




to be honest we just recorded the coef values for each replicate and did the standard bootstrap variance calculation.
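That calculation can be sketched as follows, with hypothetical replicate values standing in for the refit coefficients:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical bootstrap replicates of one coefficient.
coefs = rng.normal(loc=0.4, scale=0.1, size=20)

# Bootstrap variance: the sample variance of the replicate values
# (ddof=1 for the unbiased estimator); the bootstrap standard
# error is its square root.
boot_var = np.var(coefs, ddof=1)
boot_se = np.sqrt(boot_var)
```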

% of replicates with (coef==0) is potentially much more clever, especially since that's the test we want to perform anyway. i'll run that over the data and see what changes.


I think the question is that these don't look like NormalCDF(coef/se) p-values given the coef and se you report. They tend to be too small.

From a frequentist perspective, counting zeroes doesn't make much sense, because even under the null of coef=0 there is still a chance you don't estimate coef==0, even after regularization.
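For reference, the Wald-style p-value implied by a coefficient and its (bootstrap) standard error looks like this; the numbers are illustrative, not taken from the thread:

```python
from scipy import stats

# Two-sided Wald p-value: p = 2 * (1 - NormalCDF(|coef| / se)).
coef, se = 0.5, 0.2   # illustrative values
z = abs(coef) / se
p = 2.0 * stats.norm.sf(z)   # sf(z) == 1 - cdf(z)

# Consistent with the confidence-interval check upthread:
# p < 0.05 exactly when the +/- 1.96*se interval excludes zero.
```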


    I think the question is these don't look like NormalCDF(coef/se) p-values given the coef and se you report.  They tend to be too small.
right that's my question


interesting yeah some of them definitely don't look right. the output is from scipy's stats.ttest_1samp
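One possible explanation, sketched on hypothetical replicate values: ttest_1samp tests the mean of the replicates using the standard error of the mean (std / sqrt(B)), so its statistic is sqrt(B) times larger than coef/se computed from the bootstrap standard error, which would make the p-values look too small:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Hypothetical bootstrap replicates of one coefficient.
B = 20
coefs = rng.normal(loc=0.3, scale=0.3, size=B)

# ttest_1samp tests the *mean* of the replicates against 0, using
# the standard error of the mean (std / sqrt(B)).
t, p = stats.ttest_1samp(coefs, popmean=0.0)

# The z-statistic one would expect from coef and the bootstrap se:
z = coefs.mean() / coefs.std(ddof=1)

# The t statistic is inflated by a factor of sqrt(B) relative to z.
assert np.isclose(t, z * np.sqrt(B))
```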



