Maybe everything is "curve fitting." -- Note: I think it's more hierarchical than that but curve fitting is certainly one of the important capabilities of biological systems.
I don't think so. There's an incredibly important art and science to model selection that is not encapsulated in curve fitting. For example, say we observe a boy throwing a ball and we want to predict where the ball will land. From basic physics, we know the model is `y = 0.5 a t^2 + v0 t + y0` where `a` is the acceleration due to gravity, `v0` is the initial velocity, and `y0` is the initial height. After observing one or two thrown balls, even with error, we can estimate the parameters `a`, `v0`, and `y0` relatively well. Alternatively, we could apply a generic machine learning model to this problem. Eventually, it will work, but how much more data do we need? How many additional parameters do we need? Do the machine learning parameters have physical meaning like those in the original model? In this case, I contend the original model is superior.
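To make that concrete, here's a minimal numpy sketch (my own illustration, not anything from the original posts, with made-up throw parameters and noise): because `y = 0.5 a t^2 + v0 t + y0` is linear in `a`, `v0`, and `y0`, ordinary least squares pins down all three parameters from a handful of noisy observations.

```python
# Sketch: fitting the physics model y = 0.5*a*t^2 + v0*t + y0
# from a few noisy (t, y) observations. Values are invented.
import numpy as np

rng = np.random.default_rng(0)

# "True" throw, used only to generate fake measurements.
a_true, v0_true, y0_true = -9.81, 12.0, 1.5
t = np.array([0.2, 0.6, 1.0, 1.4])                      # four observation times
y = 0.5 * a_true * t**2 + v0_true * t + y0_true
y += 0.05 * rng.standard_normal(t.size)                 # measurement noise

# Design matrix for the model, which is linear in (a, v0, y0).
A = np.column_stack([0.5 * t**2, t, np.ones_like(t)])
a_hat, v0_hat, y0_hat = np.linalg.lstsq(A, y, rcond=None)[0]

print(a_hat, v0_hat, y0_hat)   # close to -9.81, 12.0, 1.5 from only four points
```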
Now, certainly, there are cases where we don't have a good or known model and machine learning is an extremely important tool for analyzing these cases. However, the process of making this determination and choosing what model to use is not solved by curve fitting or machine learning. This is a decision made by a person. Perhaps some day that will change, and that will be a major advance in intelligent systems, but we don't have that now and it's not clear to me how extending existing methods will lead us there.
Basically, I agree with the sentiment of the grandparent post. Machine learning is largely just curve fitting. How and when to apply a machine learning model vs another model is currently a decision left up to the user.
You're talking about the complexity of the model. If you take a purely input-output view of the world (which by the way, even classical Physics does), every problem _is_ curve fitting in a sufficiently high dimensional space. There is no _conceptual_ problem here. There is perhaps a complexity problem, but that's why I wrote that "I think it's more hierarchical than that."
I disagree. Many problem spaces are not continuous and can involve incomplete information that makes a continuous model like a curve useless.
For instance, a linguistic model that lacks definitions for some words, or that allows too much ambiguity, can leave sentences unparsable or uninterpretable. Disruptions to word order in sentences can destroy enough information that no curve fit can recover it. A curve has to capture sufficient information for fitting it to be useful. I think not all concepts or relations are amenable to N-dimensional Cartesian representation. (Though I'd like to see a reference confirming this.)
And hidden interdependence between dimensions can make any curve drawn in that coordinate space a misrepresentation of the actual info space, and any curve fit in it, dysfunctional.
Any mapping of info onto a Cartesian coordinate space presumes constraints that limit the utility of any function defined across that space. So no curve is guaranteed to be meaningful in "the real world" unless those assumptions are preserved upon reentry from the abstract world.
George Box's "All models are wrong, but some are useful" suggests that while fitting curves in wrong models may be possible, it well may be form without function.
>If you take a purely input-output view of the world (which by the way, even classical Physics does), every problem _is_ curve fitting in a sufficiently high dimensional space.
Not all spaces are Euclidean, and "purely input-output" still contains a lot of room for counterfactuals that ML models fail to capture.
Oh, I agree that neural networks are function approximators with respect to some geometry. When I say "counterfactuals", I'm talking about typical Bayes-net style counterfactuals, but as also used in cognitive psychology. We know that human minds evaluate counterfactual statements in order to test and infer causal structure. We thus know that neural networks are insufficient for "real" cognition.
You seem to have replied on a tangent: how is what you describe not just "curve fitting"?
Humans didn't magic that model up: you're ignoring the huge amount of human effort over thousands of years that it took to arrive at that model. If we gave a ML algorithm a similar amount of time and asked it to construct a simple model of the situation, it might very well hand back the formula you presented.
Your entire post basically begs the question: it supposes that humans are doing something that isn't "curve fitting", and then uses that to argue that they do more.
What, specifically, are you supposing can't be done by "curve fitting"?
I believe the process for deriving fundamental physical models differs from the techniques used in ML. For example, say we want to use the principle of least action to derive an expression for energy, similar to what Landau and Lifshitz derive in their book Mechanics. Here, we assume that the motion of a particle is defined by its position and velocity. We assume that the motion of the particle is governed by an optimization principle. We assume Galilean invariance. We assume that space and time are homogeneous and isotropic. Then, putting this all together, we can derive an expression for energy, `E = 0.5 m v^2`. At this point, we can validate our model with a series of experiments that curve fit this expression to the results.
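As a hedged sketch of that validation step (the mass, speeds, and noise below are invented, not from any real experiment): once the form `E = 0.5 m v^2` has been derived, the curve fit only has to confirm a coefficient and an exponent.

```python
# Sketch: validating E = 0.5*m*v^2 by fitting a power law E = c*v^p
# to simulated "measurements" and checking c ~ 0.5*m and p ~ 2.
import numpy as np

rng = np.random.default_rng(0)

m = 2.0                                      # assumed known mass, kg
v = np.linspace(1.0, 10.0, 20)               # measured speeds
E = 0.5 * m * v**2 * (1 + 0.01 * rng.standard_normal(v.size))  # noisy energies

# Linear fit in log space: log E = log c + p * log v.
p_hat, logc_hat = np.polyfit(np.log(v), np.log(E), 1)

print("exponent ~", p_hat)                   # expect ~2
print("coefficient ~", np.exp(logc_hat))     # expect ~0.5*m = 1.0
```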
Alternatively, we could just run a bunch of experiments and fit ML models to the data. Eventually, someone may have a wonderful idea and realize that we can just reduce the ML model to a parabola. Of course, that is due to intuition and not the ML model. Nevertheless, even though we end up at the same result, I contend the first result is different. It has a huge amount of information embedded in it about the assumptions we made about how the world works. When those assumptions are no longer satisfied, we have a rubric for constructing a fix. For example, if Galilean invariance no longer holds, we can fix the above model using the same sort of derivations to obtain relativistic expressions. Again, we could just throw more data at this new problem and fit an ML model to it, and perhaps someone would stare at this new model and realize that `E = m c^2`. However, I think that discounts the information embedded in deriving these models, and I don't think this information is present in ML models. ML models are generic. Our most powerful physical models are not.
Now, sure, once we have the models, we're just going to fit them to the data and it's all just curve fitting. Other fields call this parameter estimation, parameter identification, or a variety of other names. At that point it's all curve fitting. However, again, I contend the process for determining a new model is not.
Of course. "What do I fit this curve to" is a prerequisite to "what is the shape of this curve?"
You shouldn't feel the need to defend theory-based modeling against some imagined incursion from arrogant deep learning researchers. NNs work tremendously well in a few specific problem domains that we had no way to approach otherwise. Elsewhere, they're not much better than any other prediction algorithm. By the way XGBoost is curve-fitting, too.
I very much agree! Barring some kind of special intuition for the problem, I think ML models are a fantastic tool for building models from empirical data. Even with intuition, sometimes they work just as well. My core argument is that anthropomorphizing the algorithms has led to a great deal of confusion about when we should or should not use these models. I often do computational modeling work with engineers, and many of them are starting to eschew good, foundationally sound models for ML, not because ML works better (in fact, on many of these problems it works far, far worse) but because good computational modeling is hard and it sounds like all they have to do with ML is teach the algorithms how physics works and how to be an engineer. Since they're good teachers, they should be able to teach the algorithm, right? In reality, it's still dirty, grinding computational modeling work. If we just called these models what they really are, empirical models, I think there'd be far less confusion about when they should be used.
You haven't explained how the first case isn't "curve fitting": the agents performing the compilation of those facts into the new fact are just spitting out the "best" fitting string of symbols based on learned rules, etc etc. Something computers can (theoretically) do, and which fits the description "curve fitting" just fine. School (and other education) is training the model they're using to do that compilation, but it's still just "curve fitting" based on reward/punishment signals.
What part of that can't an ML agent learn to do?
From my perspective, you're just describing the "higher order" layers of the network and pretending that humans aren't actually running those functions embedded on deep networks, then proclaiming that deep networks can't do it.
Alright, so from my perspective, curve fitting consists of three things:
1. Definition of a model. ML models like multilayer perceptrons use a superposition of sigmoids, but newer models use superpositions of other functions and more nested hierarchies.
2. A metric to define misfit. Most of the time we use least squares because it's differentiable, but other metrics are possible.
3. An optimization algorithm to minimize the misfit. Backpropagation is an unglobalized steepest descent combined with an automatic-differentiation-like algorithm to obtain the derivatives. However, there is a small crowd that uses Newton methods.
Literally, this means curve fitting is something like the problem `min_p sum_i (f(x_i; p) - y_i)^2`, where `f` is the model, `p` its parameters, and `(x_i, y_i)` the data.
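Here's a toy sketch of my own (not any particular library's implementation) showing those three pieces side by side: a superposition of sigmoids as the model, least squares as the misfit, and a bare steepest descent as the optimizer. A finite-difference gradient stands in for backpropagation purely to keep the sketch self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1. Model: a superposition of sigmoids (a one-hidden-layer perceptron).
def model(params, x):
    w, b, c = np.split(params, 3)              # hidden weights, biases, output weights
    hidden = 1.0 / (1.0 + np.exp(-(np.outer(x, w) + b)))
    return hidden @ c

# 2. Misfit metric: least squares.
def misfit(params, x, y):
    return np.sum((model(params, x) - y) ** 2)

# 3. Optimizer: unglobalized steepest descent; the gradient is approximated
#    by central finite differences instead of backpropagation.
def grad(f, p, eps=1e-6):
    g = np.zeros_like(p)
    for i in range(p.size):
        dp = np.zeros_like(p)
        dp[i] = eps
        g[i] = (f(p + dp) - f(p - dp)) / (2 * eps)
    return g

# Toy data: noisy samples of a smooth curve.
x = np.linspace(-2, 2, 50)
y = np.tanh(2 * x) + 0.05 * rng.standard_normal(x.size)

params = 0.1 * rng.standard_normal(15)          # 5 sigmoids -> 15 parameters
for _ in range(2000):
    params -= 0.01 * grad(lambda p: misfit(p, x, y), params)

print("final misfit:", misfit(params, x, y))
```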
Of course, there are also a huge number of assumptions in this. First, optimization requires a metric space, since we typically want to verify that a point is lower than all the points surrounding it. That alone isn't all that helpful from an algorithmic point of view, so we really need a complete inner product space in order to derive optimality conditions like the gradient of the objective being zero. Alright, fine: if we want to do what you say, then we need to figure out how to compile these facts into a Hilbert space. Maybe that's possible, and it raises some interesting questions. For example, Hilbert spaces have the property that `alpha x + y` also lies in the vector space. If `x` is an assumption like Galilean invariance and `y` is the assumption that time and space are isotropic, I'm not sure what their linear combination would be, but perhaps it's interesting. Hilbert spaces also require a well-defined inner product, and I'm not sure what the inner product between these two assumptions is either. Of course, we don't technically need a Hilbert or Banach space to optimize. Certainly, we lose gradients and derivatives, but there may be something else we can do. That would involve creating an entirely new field of computational optimization theory that doesn't depend on derivatives and calculus, which would be amazing, but we don't currently have one.
From a philosophical point of view, there may be a reasonable argument that everything in life is mapping inputs to outputs. From a practical point of view, this is hard, and the foundation on which ML is cast rests on certain assumptions, like the three components above, which constrain the structures we can deal with. Until that changes, I continue to contend that, no, ML does not provide a mechanism for deriving new fundamental physical models.
Unless I'm missing something, and I likely am, the linked paper is still based on the fundamental assumptions behind curve fitting that I listed above. Namely, their optimization algorithms, metrics, and models are still based on Hilbert spaces, even though they've added stochastic elements and more sophisticated models.
I think you're reading way too far into my post. I was just pointing out that our amazing AI revolution is really just a new type of function approximation that has magical-seeming results.
> Eventually, it will work, but how much more data do we need?
For a model that small, with so little variance (assuming you measure accurately where the ball lands), just a few throws would be enough to fit the parameters.