I guess the unspoken assumption Babbage makes here is «if I put only the wrong figures into the machine». Then it is completely unreasonable to expect correct output. In the ML context, an LLM has been trained on a great deal of data, some «wrong» and some (hopefully more) «correct», which is why asking something incorrectly can still get you a correct answer.



For ML it goes deeper, but unfortunately discussions about it devolve into an approximation of the Brouwer–Hilbert controversy.

If you think about it through the lens of VC dimension, it can help to notice that, with respect to learnability and set shattering, what is really at work is a choice function.
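
To make that concrete, a minimal sketch (my own toy example, 1-D threshold classifiers, not anything from the thread): a class shatters a point set iff for every labelling you can choose a hypothesis realizing it.

    # Toy illustration: 1-D thresholds h_t(x) = [x >= t] shatter any single
    # point but cannot shatter two points, so their VC dimension is 1.
    from itertools import product

    def hypotheses(points):
        # Candidate thresholds below, between, and above the sample points;
        # labellings of the sample can only change at these positions.
        xs = sorted(points)
        cuts = [xs[0] - 1] + [(a + b) / 2 for a, b in zip(xs, xs[1:])] + [xs[-1] + 1]
        return [lambda x, t=t: int(x >= t) for t in cuts]

    def shatters(points):
        # Shattering = for every labelling there is a "choice" of hypothesis
        # that realizes it on the given points.
        hs = hypotheses(points)
        return all(
            any(tuple(h(x) for x in points) == labelling for h in hs)
            for labelling in product([0, 1], repeat=len(points))
        )

    print(shatters([3.0]))       # True  -> VC dimension >= 1
    print(shatters([1.0, 2.0]))  # False -> no threshold realizes (1, 0)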

Most of us have serious cognitive dissonance with dropping the principle of the excluded middle, as Aristotle and Plato's assumptions are baked into our minds.

You can look at why ZFC asserts that some sets are non-constructible, or at how type theory and category theory differ from classical logic.

But framing the difference between RE and coRE with left and right in place of true and false seems to work for many.

While we can build on that choice function, significantly improving our ability to approximate or our numerical stability, the limits of that original trinity of laws of thought still sit underneath.

The intersection of RE and coRE is the class of recursive sets, and that is where p or not p and not not p implies p hold.
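
A hypothetical sketch of that intersection (my own toy example): an RE set has a semi-decider that halts on members, a coRE set has one that halts on non-members, and dovetailing the two gives a total decider exactly when the set lies in both.

    from itertools import count

    def decide(x, semi_yes, semi_no):
        # Dovetail: run both semi-deciders for 1, 2, 3, ... steps until one accepts.
        for steps in count(1):
            if semi_yes(x, steps):
                return True
            if semi_no(x, steps):
                return False

    # Toy set: numbers expressible as a sum of two squares. The "yes" side
    # searches for a witness; the "no" side fires once the finite witness
    # space has been exhausted.
    def semi_yes(x, steps):
        return any(a * a + b * b == x for a in range(steps) for b in range(steps))

    def semi_no(x, steps):
        return steps > x and not semi_yes(x, x + 1)

    print(decide(5, semi_yes, semi_no))  # True  (1^2 + 2^2)
    print(decide(3, semi_yes, semi_no))  # False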

There is a reason constructivist logic, lambda calculus, and category theory are effectively the same thing (the Curry–Howard–Lambek correspondence).

But for most people it is a challenging path to figure out why.

As single-layer perceptrons depend on linearly separable sets, and multilayer perceptrons are not convex, I personally think the constructivist path is the best way to understand the intrinsic limits, despite the very real challenge of moving to a mindset that doesn't assume PEM and AC.
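
To make the linear-separability point concrete, a standard toy sketch (my own code, assuming a plain perceptron update rule): the rule converges on AND, which is linearly separable, and never on XOR, which is not.

    # The perceptron rule converges on linearly separable data (AND) but
    # never reaches zero errors on XOR, so we cap the number of epochs.
    def train_perceptron(data, epochs=100):
        w, b = [0.0, 0.0], 0.0
        for _ in range(epochs):
            errors = 0
            for (x1, x2), y in data:
                pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
                if pred != y:
                    errors += 1
                    w[0] += (y - pred) * x1
                    w[1] += (y - pred) * x2
                    b += (y - pred)
            if errors == 0:
                return True   # found a separating hyperplane
        return False          # never converged within the epoch budget

    AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
    XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
    print(train_perceptron(AND))  # True  -> linearly separable
    print(train_perceptron(XOR))  # False -> not linearly separable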

There are actually stronger forms of choice in that path, but they simply cannot be assumed.

Some more trivial examples, even with perfect training data:

An LLM will never be able to tell you unknowable unknowns like 'will it rain tomorrow', or answer underspecified questions like 'should I drive on the left side of the road'.

But with next-token prediction it also won't be able to reliably shatter sets for problems that aren't in R, especially problems that aren't even in RE, as coRE requires a 'for any' universal quantification on the right side.

An LLM will never be total, so the above question applies but isn't sufficient to capture the problem.

While we can arbitrarily assign tokens to natural numbers, that assignment is not unique and is a forgetful functor, which is why compression is considered equivalent to the set shattering I used above for learnability.
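
A toy illustration of that non-uniqueness and forgetfulness (a hypothetical three-entry vocabulary, not a real tokenizer): two different token sequences decode to the same string, so decoding forgets which segmentation was used.

    # Hypothetical toy vocabulary: the encoding of a string into token ids is
    # not unique, and decoding forgets the segmentation.
    vocab = {1: "a", 2: "b", 3: "ab"}

    def decode(ids):
        return "".join(vocab[i] for i in ids)

    print(decode([1, 2]))  # "ab"
    print(decode([3]))     # "ab" -> same string, different token sequences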

The above question's framing, with just addition and an assumption of finite precision, is why there is a disconnect for some people.



