
Alright, so from my perspective, curve fitting consists of three things

1. Definition of a model. ML models like multilayer perceptrons used a superposition of sigmoids, but newer models use superpositions of other functions and more deeply nested hierarchies.

2. A metric to define misfit. Most of the time we use least squares because it's differentiable, but other metrics are possible.

3. An optimization algorithm to minimize misfit. Backpropagation is an unglobalized steepest descent combined with an automatic-differentiation-like algorithm to obtain the derivatives. However, there is a small crowd that uses Newton methods.

Literally, this means curve fitting is something like the problem

min_{params} 0.5 * sum_i || model(params, input_i) - output_i ||^2
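
As a concrete toy sketch of those three pieces (the single-sigmoid model, the synthetic data, and the fixed step size below are all hypothetical choices, just to make the structure explicit):

    import numpy as np

    # 1. Model: a hypothetical parametric family (a single sigmoid unit here,
    #    standing in for the "superposition of sigmoids" in an MLP).
    def model(params, x):
        a, b, c = params
        return a / (1.0 + np.exp(-(b * x + c)))

    # 2. Misfit: least squares, chosen because it is differentiable.
    def misfit(params, xs, ys):
        return 0.5 * np.sum((model(params, xs) - ys) ** 2)

    # 3. Optimizer: plain (unglobalized) steepest descent; a central finite
    #    difference stands in for automatic differentiation of the misfit.
    def grad(f, params, eps=1e-6):
        g = np.zeros_like(params)
        for i in range(len(params)):
            e = np.zeros_like(params)
            e[i] = eps
            g[i] = (f(params + e) - f(params - e)) / (2.0 * eps)
        return g

    def fit(xs, ys, params, lr=0.01, iters=20000):
        for _ in range(iters):
            params = params - lr * grad(lambda p: misfit(p, xs, ys), params)
        return params

    # Hypothetical noisy data drawn from the same family.
    rng = np.random.default_rng(0)
    xs = np.linspace(-3.0, 3.0, 50)
    ys = 2.0 / (1.0 + np.exp(-(1.5 * xs - 0.5))) + 0.05 * rng.standard_normal(50)
    print(fit(xs, ys, np.array([1.0, 1.0, 0.0])))  # roughly recovers (2, 1.5, -0.5)

The point being that each ingredient -- the model family, the misfit, and the optimizer -- is a separate, swappable assumption.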

Of course, there's also a huge number of assumptions baked into this. First, optimization requires a metric space, since we typically want to verify that a candidate point is lower than all the points surrounding it. Though, a metric space alone isn't all that helpful from an algorithmic point of view, so we really need a complete inner product space in order to derive optimality conditions like the gradient of the objective being zero.

Alright, fine, that means if we want to do what you say then we need to figure out how to compile these facts into a Hilbert space. Maybe that's possible, and it raises some interesting questions. For example, Hilbert spaces have the property that `alpha x + y` also lies in the vector space. If `x` is an assumption like Galilean invariance and `y` is an assumption that time and space are isotropic, I'm not sure what the linear combination would be, but perhaps it's interesting. Hilbert spaces also require inner products to be well defined, and I'm not sure what the inner product between those two assumptions is either.

Of course, we don't technically need a Hilbert or Banach space to optimize. Certainly, we lose gradients and derivatives, but there may be something else we can do. That would involve creating an entirely new field of computational optimization theory that's not dependent on derivatives and calculus, which would be amazing, but we don't currently have one.
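
To spell out where that inner-product structure actually gets used, here's the textbook first-order optimality argument in a Hilbert space, written as a LaTeX sketch (standard material, nothing specific to the physics assumptions):

    % First-order optimality in a Hilbert space (H, <.,.>):
    % if f : H -> R is Frechet differentiable and x* is a local minimizer, then
    Df(x^{*})[h] = 0 \quad \text{for all } h \in H .
    % The Riesz representation theorem turns that linear functional into an
    % element of H itself, the gradient:
    \exists!\, \nabla f(x^{*}) \in H : \quad
      \langle \nabla f(x^{*}), h \rangle = Df(x^{*})[h] \quad \text{for all } h \in H ,
    % so the condition collapses to the familiar
    \nabla f(x^{*}) = 0 .

Without the inner product you still have Df(x*) = 0 as a statement about a functional, but there's no gradient living in the same space to compute with.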

From a philosophical point of view, there may be a reasonable argument that everything in life is mapping inputs to outputs. From a practical point of view, this is hard, and the foundation upon which ML is cast rests on certain assumptions, like the three components above, which in turn constrain the structures we can deal with. Until that changes, I continue to contend that, no, ML does not provide a mechanism for deriving new fundamental physical models.




What do you think about a Bayesian interpretation of the above as MAP/MLE?

https://arxiv.org/abs/1706.00473


Unless I'm missing something, and I likely am, the linked paper is still based on the fundamental assumptions behind curve fitting that I listed above. Namely, their optimization algorithms, metrics, and models are still based on Hilbert spaces, even though they've added stochastic elements and more sophisticated models.
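
For reference, the usual way the MAP/MLE reading ties back to the least-squares problem above (a standard derivation; here sigma^2 is an assumed i.i.d. Gaussian noise variance and tau^2 an assumed Gaussian prior variance, both introduced only for this sketch):

    % MLE under  output_i = model(theta, input_i) + eps_i,  eps_i ~ N(0, sigma^2 I):
    \hat{\theta}_{\mathrm{MLE}}
      = \arg\max_{\theta} \prod_i
          \exp\!\Big( -\tfrac{\|\mathrm{model}(\theta,\mathrm{input}_i) - \mathrm{output}_i\|^{2}}{2\sigma^{2}} \Big)
      = \arg\min_{\theta} \tfrac{1}{2} \sum_i
          \|\mathrm{model}(\theta,\mathrm{input}_i) - \mathrm{output}_i\|^{2}
    % With a Gaussian prior theta ~ N(0, tau^2 I), MAP just adds a quadratic penalty:
    \hat{\theta}_{\mathrm{MAP}}
      = \arg\min_{\theta} \tfrac{1}{2} \sum_i
          \|\mathrm{model}(\theta,\mathrm{input}_i) - \mathrm{output}_i\|^{2}
        + \tfrac{\sigma^{2}}{2\tau^{2}} \|\theta\|^{2}

Either way the objective is still a smooth functional over the same parameter space, so the Bayesian reading changes the interpretation of the misfit rather than the underlying machinery.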


Interesting abstract. I love Bayesian stats so hopefully this will be a fun commute read. Thanks!


I think you're reading way too far into my post. I was just pointing out that our amazing AI revolution is really just a new type of function approximation that has magical-seeming results.


I can't think of a succinct way to describe my response, but I'm not sure we disagree, so much as we're talking about slightly different things.

Regardless, I wanted to thank you for the detailed replies -- having a back and forth helped me ponder my thoughts on the matter.

Have a good one. (:


Thanks for chatting!



