On the limitations of machine learning as
in the OP, the OP is correct.
So, right, current approaches to "machine
learning" as in the OP have some serious
"limitations". But this point is a small,
tiny special case of something much
larger and more important: current
approaches to "machine learning" as in the
OP are essentially some applied math, and
applied math is commonly much more
powerful than machine learning as in the
OP and has much less severe limitations.
Really, "machine learning" as in the OP is
not learning in any significantly
meaningful sense at all. Really,
apparently, the whole field of "machine
learning" is heavily just hype from the
deceptive label "machine learning". That
hype is deceptive, apparently deliberately
so, and unprofessional.
Broadly, machine learning as in the OP is a
case of old empirical curve fitting, a
field with a long history and a lot of
approaches quite different from what is in
the OP. Some of those approaches are, under
some circumstances, much more powerful than
what is in the OP.
The attention to machine learning
overlooks a huge body of highly polished
knowledge that is usually much more powerful. In
a cooking analogy, you are being sold a
state fair corn dog, which can be good,
instead of everything in Escoffier,
Prosper Montagné, Larousse Gastronomique:
The Encyclopedia of Food, Wine, and
Cookery, ISBN 0-517-50333-6, Crown
Publishers, New York, 1961.
Essentially, for machine learning as in
the OP, if (A) we have a LOT of training
data, (B) a lot of testing data, (C) by
gradient descent or whatever we build a
model of some kind that fits the
training data, and (D) the model also
predicts well on the testing data, then
(E) we may have found something of value.
But the test in (D) is about the only
assurance of any value. And the value in
(D) needs an assumption, rarely made
clear: applications of the model will, in
some suitable sense, be close to the
training data.
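To make (A)-(E) concrete, here is a minimal sketch in Python; the synthetic data, the linear model, and the step size are all illustrative assumptions, not anything from the OP.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 1000, 5
    X = rng.normal(size=(n, d))
    w_true = rng.normal(size=d)
    y = X @ w_true + 0.1 * rng.normal(size=n)       # noisy "truth" to be learned

    X_train, y_train = X[:800], y[:800]             # (A) a lot of training data
    X_test,  y_test  = X[800:], y[800:]             # (B) a lot of testing data

    w = np.zeros(d)                                 # (C) fit by gradient descent
    for _ in range(2000):
        grad = X_train.T @ (X_train @ w - y_train) / len(y_train)
        w -= 0.1 * grad

    test_mse = np.mean((X_test @ w - y_test) ** 2)  # (D) predict on testing data
    print("test MSE:", test_mse)                    # (E) small => maybe of value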
Such fitting goes back at least to
Leo Breiman, Jerome H. Friedman, Richard
A. Olshen, Charles J. Stone,
Classification and Regression Trees,
ISBN 0-534-98054-6, Wadsworth &
Brooks/Cole, Pacific Grove, California,
1984.
so it is not nearly new. This work is
commonly called CART, and there has long
been corresponding software.
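For reference, a minimal sketch of such CART-style fitting using scikit-learn's DecisionTreeRegressor (one of the long-available implementations in the CART family); the data and the depth limit are made up for illustration.

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(500, 1))
    y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=500)

    # fit a small regression tree on 400 points, check it on the held-out 100
    tree = DecisionTreeRegressor(max_depth=4).fit(X[:400], y[:400])
    print("held-out MSE:", np.mean((tree.predict(X[400:]) - y[400:]) ** 2))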
And CART itself descends from versions of
regression analysis that go back maybe 100
years.
So, sure, in regression analysis we are
given points on an X-Y coordinate system
and want to fit a straight line so that,
as a function of the points on the X axis,
the line does well at approximating the
points of the X-Y plot. Being more specific
would call for mathematical notation
awkward for simple typing and, really,
likely not needed here.
Well, to generalize, the X axis can have
several dimensions, that is, accommodate
several variables. The result is
multiple linear regression.
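For instance, a minimal multiple linear regression sketch, ordinary least squares solved with numpy; the regressors and coefficients below are invented for illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])  # intercept + 3 variables
    beta_true = np.array([1.0, 2.0, -0.5, 0.3])
    y = X @ beta_true + 0.1 * rng.normal(size=n)

    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)   # minimize ||X b - y||^2
    print(beta_hat)                                    # close to beta_true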
For more, there is a lot, with a lot of
guarantees. Those can be found in short
and easy form in
Alexander M. Mood, Franklin A. Graybill,
and Duane C. Boes, Introduction to the
Theory of Statistics, Third Edition,
McGraw-Hill, New York, 1974.
with more detail but still easy form in
N. R. Draper and H. Smith, Applied
Regression Analysis, John Wiley and Sons,
New York, 1968.
with much more detail and carefully done
in
C. Radhakrishna Rao, Linear Statistical
Inference and Its Applications: Second
Edition, ISBN 0-471-70823-2, John Wiley
and Sons, New York, 1967.
Right, this stuff is not nearly new.
So, with some assumptions, we get lots of
guarantees on the accuracy of the fitted
model.
This is all old stuff.
The work in machine learning has added
some details to the old issue of
overfitting, but, really, the math in old
regression already takes that into
consideration -- a case of overfitting
will usually show up as larger estimates
for the errors.
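As an illustration of that claim (a sketch on invented data, degrees chosen arbitrarily): fit the same noisy quadratic with a modest polynomial and an over-parameterized one, and compare the usual OLS standard errors of the coefficients, the square roots of the diagonal of s^2 (X'X)^-1.

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(-1, 1, 30)
    y = 1.0 + 2.0 * x - 3.0 * x**2 + 0.2 * rng.normal(size=x.size)

    def ols_std_errors(degree):
        X = np.vander(x, degree + 1, increasing=True)   # columns 1, x, x^2, ...
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        s2 = resid @ resid / (len(y) - X.shape[1])      # residual variance estimate
        return np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))

    print("degree 2 standard errors:", ols_std_errors(2))
    print("degree 9 standard errors:", ols_std_errors(9))   # much larger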
There is also spline fitting, fitting from
Fourier analysis, autoregressive
integrated moving average (ARIMA)
processes, e.g.,
David R. Brillinger, Time Series
Analysis: Data Analysis and Theory,
Expanded Edition, ISBN 0-8162-1150-7,
Holden-Day, San Francisco, 1981.
and much more.
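E.g., a minimal smoothing-spline sketch with SciPy; the data and the smoothing factor are illustrative assumptions.

    import numpy as np
    from scipy.interpolate import UnivariateSpline

    rng = np.random.default_rng(0)
    x = np.linspace(0.0, 10.0, 200)
    y = np.sin(x) + 0.2 * rng.normal(size=x.size)

    spline = UnivariateSpline(x, y, s=8.0)         # s trades smoothness vs. fit
    print(spline(np.array([2.5, 5.0, 7.5])))       # predictions at new points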
But, let's see some examples of applied
math that totally knocks the socks off
model fitting:
(1) Early in civilization, people noticed
the stars, and the ones that moved in
complicated paths, the planets. Well,
Ptolemy built some empirical models based
on epicycles that seemed to fit the
data well and had good predictive value.
But much better work came from Kepler, who
discovered that, really, if we assume that
the sun stays still and the earth moves
around the sun, then the paths of the
planets are just ellipses.
Next, Newton formulated the second law of
motion and the law of gravity, invented
calculus, and used them to explain the
ellipses.
So, what Kepler and Newton did was far
ahead of what Ptolemy did.
Or, all Ptolemy did was just some
empirical fitting, and Kepler and Newton
explained what was really going on and, in
particular, came up with much better
predictive models.
Empirical fitting lost out badly.
Note that once Kepler assumed that the sun
stands still and the earth moves around
the sun, he didn't actually need much data
to determine the ellipses. And Newton
needed nearly no data at all except to
check his results.
Or, Kepler and Newton had some good ideas,
and Ptolemy had only empirical fitting.
(2) The history of physical science is
just awash in models derived from
scientific principles that are, then,
verified by fits to data.
E.g., some first-principles derivations
show what the acoustic power spectrum of
the 3 K background radiation should be,
and the fit to the actual data from WMAP,
etc., was astoundingly close.
News Flash: Commonly some real science or
even just real engineering principles
totally knock the socks off empirical
fitting, for much less data.
(3) E.g., here is a fun example I worked
up while in a part-time job in grad
school: I got some useful predictions for
an enormously complicated situation out of
a little applied math and nearly no data
at all.
I was asked to predict what the
survivability of the US SSBN fleet would
be under a special scenario of global
nuclear war limited to sea.
Well, there was a WWII analysis by B.
Koopman that showed that in search, say,
a submarine searching for a surface ship
or an airplane searching for a submarine,
the encounter rates were approximately a
Poisson process.
So, for all the forces in that war at sea,
for the number of forces surviving, with
some simplifying assumptions, we have a
continuous-time, discrete-state-space
Markov process subordinated to a Poisson
process. The details of the Markov
process come from a little data about
detection radii and the probabilities
that, at a detection, one side dies, the
other dies, both die, or neither dies.
That's all there was to the setup of the
problem, the model.
Then, to evaluate the model, just use Monte
Carlo to run off, say, 500 sample paths,
average those, appeal to the strong law of
large numbers, and presto, bingo, done.
It's also easy to put up some confidence
intervals.
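For the flavor of the scheme (heavily simplified, with all numbers invented; the real model and data are not reproduced here): encounters arrive as a Poisson process whose rate depends on the current force counts, each encounter removes one side, the other, both, or neither with fixed probabilities, and 500 sample paths give the estimate plus a normal-approximation confidence interval.

    import numpy as np

    rng = np.random.default_rng(0)
    T = 30.0                     # length of the at-sea war, days (assumed)
    rate_per_pair = 0.01         # encounters per opposing pair per day (assumed)
    p_red, p_blue, p_both = 0.3, 0.3, 0.1   # outcome probabilities at an encounter (assumed)

    def one_path(red=40, blue=40):
        t = 0.0
        while red > 0 and blue > 0:
            t += rng.exponential(1.0 / (rate_per_pair * red * blue))  # next encounter
            if t > T:
                break
            u = rng.random()
            if u < p_red:
                red -= 1                     # a Red unit dies
            elif u < p_red + p_blue:
                blue -= 1                    # a Blue unit dies
            elif u < p_red + p_blue + p_both:
                red -= 1
                blue -= 1                    # both die
            # else: neither dies
        return red + blue                    # total survivors on this sample path

    paths = np.array([one_path() for _ in range(500)])
    mean = paths.mean()
    half = 1.96 * paths.std(ddof=1) / np.sqrt(len(paths))
    print(f"expected survivors ~ {mean:.1f} +/- {half:.1f} (95% CI)")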
The customers were happy.
Try to do that analysis with big data and
machine learning, and you will be in deep,
bubbling, smelly, reeking, flaming, black
and orange, toxic sticky stuff.
So, a little applied math, some first
principles of physical science, or some
solid engineering data commonly totally
knocks the socks off machine learning as
in the OP.
There is a whole lot of difference between curve fitting and curve fitting with performance guarantees on future data under a distribution-free (limited-dependence) model.
BTW, the 'machine learning' term is a Russian coinage, and its genesis lies in non-parametric statistics. The key result that sparked it all off was Vapnik and Chervonenkis's theorem, essentially a much generalized, non-asymptotic version of Glivenko-Cantelli. The other result was Stone's, which not only showed that universal algorithms achieving the Bayes error in the limit exist but also constructed such an algorithm. This was the first time it was established that 'learning' is possible.
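For concreteness, one standard textbook statement of the Vapnik-Chervonenkis result (the constants vary by source; this is a sketch, not the form in the original paper):

    \[
      \Pr\!\Bigl(\sup_{A \in \mathcal{A}} \bigl|P_n(A) - P(A)\bigr| > \varepsilon\Bigr)
      \;\le\; 4\, S_{\mathcal{A}}(2n)\, e^{-n\varepsilon^2/8},
    \]

where P_n is the empirical measure of n i.i.d. samples and S_A is the growth (shatter) function of the class A; the classical Glivenko-Cantelli theorem is recovered when A is the class of half-lines (-infinity, t].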
This is a much stricter and better-thought-out approach than the OP's; there is no need to consider deep learning alone rather than generalizing to all possible math models. For example, the OP could mention that the simple function x^2 cannot be well approximated by a deep network of ReLU layers with a small number of nodes, but it can be approximated trivially with a single x^2 layer.
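A quick illustrative check of that example, under the simplification that a tiny one-hidden-layer ReLU network is just a piecewise-linear function with a handful of pieces (knots fixed here for simplicity; everything below is an invented toy, not a real training run):

    import numpy as np

    x = np.linspace(-1.0, 1.0, 401)
    y = x ** 2

    # (a) piecewise-linear model: bias, x, and four fixed ReLU hinge features
    knots = np.array([-0.6, -0.2, 0.2, 0.6])
    A_pl = np.column_stack([np.ones_like(x), x] +
                           [np.maximum(0.0, x - k) for k in knots])
    c_pl, *_ = np.linalg.lstsq(A_pl, y, rcond=None)
    print("max error, 4-hinge piecewise linear:", np.max(np.abs(A_pl @ c_pl - y)))

    # (b) model with a single x^2 feature: represents the target exactly
    A_sq = np.column_stack([np.ones_like(x), x, x ** 2])
    c_sq, *_ = np.linalg.lstsq(A_sq, y, rcond=None)
    print("max error, x^2 feature:", np.max(np.abs(A_sq @ c_sq - y)))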
However, the question is how complex the "true" models of nature are. The law of gravity is simple, a single equation with one parameter, but what if the "law" of human language has millions of parameters and is not really manageable by a human? 500 samples would not be enough then. This is the classic Norvig vs. Chomsky argument. Still, for many things simple laws might exist.