
It's a very insightful article about the nitty-gritty details of working on ML problems. However, as an outsider, I can't decide whether some of the very specific statements given without any reasoning (e.g. good ranges of values for hyperparameters) are highly valuable wisdom coming from years of experience, or merely overfitted patterns that he's adopted in his own realm.



I would say that table is really quite valuable. Kaggle problems come from all types of companies, so it doesn't make sense to call them "overfitted patterns that he's adopted in his own realm". With that said, validation on your own dataset will trump general knowledge, so you shouldn't view these parameters as hard and fast rules. But the parameters in that table will provide a useful starting point, and if you stray too far from them that is a warning sign that you might be overfitting.
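As a minimal sketch of what "validate the starting point on your own data" can look like (assuming scikit-learn; the synthetic dataset and the candidate max_features values here are made up for illustration, not taken from the article's table):

    # Check a rule-of-thumb hyperparameter against your own data with
    # cross-validation instead of trusting a table blindly.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

    # Candidate values around a typical "starting point".
    for max_features in ["sqrt", 0.3, 0.5]:
        model = RandomForestClassifier(n_estimators=200,
                                       max_features=max_features,
                                       random_state=0)
        scores = cross_val_score(model, X, y, cv=5)
        print(f"max_features={max_features}: "
              f"{scores.mean():.3f} +/- {scores.std():.3f}")

If the value your own validation prefers sits far outside the published range, that mismatch is worth investigating before trusting either number.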


Neither. The values he gives are reasonable (not bad choices) but also quite broad (usually covering several orders of magnitude, so not especially valuable per se).

Of course, things become much clearer when you actually know what the hyperparameters represent. For example, knowing the random forest algorithm, you'd know that having a higher number of estimators is always better (with the downside being diminishing returns and slower performance / more memory usage). Ironically, the blog post here gets this point wrong.
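To illustrate the "more trees is monotonically better, with diminishing returns" point, here is a minimal sketch (assuming scikit-learn and a synthetic dataset; out-of-bag accuracy is used as a quick proxy for validation score):

    # Out-of-bag accuracy as n_estimators grows: it typically climbs
    # quickly, then flattens out; adding trees does not hurt accuracy,
    # it only costs training time and memory.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

    for n in [10, 50, 100, 500, 1000]:
        model = RandomForestClassifier(n_estimators=n, oob_score=True,
                                       random_state=0, n_jobs=-1)
        model.fit(X, y)
        print(f"n_estimators={n:5d}  oob_score={model.oob_score_:.4f}")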



