
>In particular, I'm thinking of Dual Process Theory

Which has been at least partly debunked as psychology's replication crisis has unfolded, and has been called into question on neuroscientific grounds as well.




Partly, yes - especially with ego depletion on the ropes. I'm not sure that dual process theory needs to be thrown out along with ego depletion, though.


I can see three reasons to "throw it out":

1) Replication failure, plain and simple.

2) Overfitting. There are dozens to hundreds of "cognitive biases" on lists: https://en.wikipedia.org/wiki/List_of_cognitive_biases. When you have hundreds of individual points, you really ought to draw some principles, and the principle should not be, "The system generating all this is rigid and inflexible."

3) Imprecision! Again, dozens to hundreds of cognitive biases. What possible behavior or cognitive performance can't be assimilated into the heuristics and biases theory? What can falsify it overall, even after so many of its individual supporting experiments and predictions have fallen down?

It looks like a mere taxonomy of observations, not a substantive theory.


>1) Replication failure, plain and simple.

How many meta-analyses have been conducted as of 2017 showing one result or the other? I don't think ego depletion itself has been thoroughly "debunked" yet. If it is a real effect, it's probably quite small - but it hasn't been thrown in the bin just yet.

>2) Overfitting. There are dozens to hundreds of "cognitive biases" on lists: https://en.wikipedia.org/wiki/List_of_cognitive_biases. When you have hundreds of individual points, you really ought to draw some principles, and the principle should not be, "The system generating all this is rigid and inflexible."

>3) Imprecision! Again, dozens to hundreds of cognitive biases. What possible behavior or cognitive performance can't be assimilated into the heuristics and biases theory? What can falsify it overall, even after so many of its individual supporting experiments and predictions have fallen down?

Wait a second - has anyone ever tried to explain the "IKEA Effect" using Dual Process Theory? What does a laundry-list of supposed cognitive biases have to do with the theory? Is anyone really trying to explain/predict all this almanac-of-cognitive-failings with Dual Process?


>Is anyone really trying to explain/predict all this almanac-of-cognitive-failings with Dual Process?

To my understanding, yes. That's basically what Dual Process theories exist for: to separate the brain into heuristic/bias processing as one process, and computationally expensive, model-based cause-and-effect reasoning as another. Various known cognitive processes or results are then sorted onto one side of the line or the other.

When you apply Dual Process paradigms to specific corners of cognition, they can be useful. For example, I've seen papers purporting to show that measured uncertainty allows model-free and model-based reinforcement learning algorithms to trade off decision-making "authority". This is less elegant than an explicitly precision-measuring free-energy counterpart, but it's still a viable hypothesis about how the brain can implement a form of bounded rationality when bounded in both sample data and compute power.
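To make that tradeoff concrete, here's a toy sketch of the arbitration idea (my own simplification in Python, not any particular paper's model): each controller carries a value estimate plus an uncertainty, and the less uncertain one gets more "authority" via precision (inverse-variance) weighting.

    # Toy sketch of uncertainty-based arbitration between a model-free (habitual)
    # controller and a model-based (planning) controller. All names and numbers
    # here are illustrative assumptions, not any specific published model.

    class Controller:
        def __init__(self, value, variance):
            self.value = value        # estimated value of the candidate action
            self.variance = variance  # uncertainty about that estimate

    def arbitrate(mf, mb):
        # Precision-weighted mixture: each controller is weighted by 1/variance,
        # so the more certain controller dominates the combined estimate.
        w_mf, w_mb = 1.0 / mf.variance, 1.0 / mb.variance
        return (w_mf * mf.value + w_mb * mb.value) / (w_mf + w_mb)

    mf = Controller(value=0.8, variance=0.05)  # well-practiced habit: confident
    mb = Controller(value=0.3, variance=0.50)  # costly, noisy planning estimate

    print(arbitrate(mf, mb))  # ~0.75: the confident model-free estimate wins

The "precision-weighting" framing mentioned below is, roughly, the same move made more general: whichever process currently carries more precision gets more say.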

But when you scale Dual Process up into a whole-brain theory, it's just too good at describing anything: almost any phenomenon can be dichotomized into a "fast-and-frugal" form of processing and an expensive, reconstructive one. And besides the potentially unsound original evidence for Dual Process, we don't necessarily have reason to believe any true dichotomy exists at all, rather than a more continuous tradeoff between frugal heuristic processing and costly reconstructive processing. The precision-weighted model-selection theory actually makes much more sense here.


This is a fantastic answer - thank you, Eli. So what do you think of the original article?


>This is a fantastic answer - thank you, Eli.

Thanks! I've been doing a lot of amateur reading in cog-sci and theoretical neuroscience. I'm enthusiastic enough about the subject that I'm applying to PhD programs in it this coming season.

>So what do you think of the original article?

Thorough and accurate. I'll add a little expansion of my own. One thing taught in every theoretically-focused ML class is the No Free Lunch Theorem. In colloquial terms it says: "If you don't make some simplifying assumptions about the function you're trying to learn (and about the noise distribution on your data), you can't reliably learn."
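Here's a toy numerical illustration of the idea (my own construction, not the formal theorem): average off-training-set accuracy over every possible target function on a tiny domain, and every learner comes out the same.

    # Toy No-Free-Lunch demo: over ALL 2^4 labelings of a 4-point domain,
    # any learner's average accuracy on the unseen points is exactly 0.5.
    # The learners and the tiny domain are illustrative assumptions.
    from itertools import product

    domain = [0, 1, 2, 3]
    train_x, test_x = [0, 1], [2, 3]   # observed inputs vs. off-training-set inputs

    def majority_learner(train):
        # Predict the majority label seen in training (ties go to 0).
        ones = sum(train.values())
        return lambda x: 1 if ones > len(train) / 2 else 0

    def always_one_learner(train):
        # Ignore the data entirely and always predict 1.
        return lambda x: 1

    def avg_offtrain_accuracy(learner):
        accs = []
        for labels in product([0, 1], repeat=len(domain)):   # every possible target
            target = dict(zip(domain, labels))
            h = learner({x: target[x] for x in train_x})
            accs.append(sum(h(x) == target[x] for x in test_x) / len(test_x))
        return sum(accs) / len(accs)

    print(avg_offtrain_accuracy(majority_learner))    # 0.5
    print(avg_offtrain_accuracy(always_one_learner))  # 0.5

No amount of cleverness in the learner changes that average; only assumptions that shrink the space of plausible targets do.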

I think experts learn this, appreciate it as a point of theory, and then often forget to really bring it back up and rethink it where it's applicable. All statistical learning takes place subject to assumptions of "niceness". Which assumptions, though?

Seems to me like:

* If you make certain "niceness" assumptions about the functions in your hypothesis space, but few to none about the distribution, you're a Machine Learner.

* If you make niceness assumptions about your distribution, but don't quite care about the generating function itself, you're an Applied Statistician.

* If you make niceness assumptions about your data, that it was generated from some family of distributions on which you can make inferences, you're a fully frequentist or Bayesian statistician.

* If you want to make almost no assumptions about the generating process yielding the data, but still want just enough assumptions to make reasoning possible, you may be working in the vicinity of any of cognitive science, neuroscience, or artificial intelligence.

The key thing you always have to remind yourself is: you are making assumptions. The question is: which ones? The original article reminds us of a whole lot of the assumptions behind current deep learning:

* The "layers" we care about are compositions of a continuous nonlinear function with a linear transform.

* The functions we care about are compositions of "layers".

* The transforms we care about are probably convolutions or just linear-and-rectified, or just linear-and-sigmoid.

* Composing layers enables gradient information to "fan out" from the loss function to wider and wider places in the early layers.

* The data spaces we care about are usually Euclidean.
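
A minimal sketch of those first two bullets in plain NumPy (names and shapes are my own, purely illustrative):

    import numpy as np

    def relu(x):
        return np.maximum(0.0, x)

    def layer(W, b, x):
        # one "layer" = a continuous nonlinearity composed with a linear transform
        return relu(W @ x + b)

    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(16, 8)), np.zeros(16)
    W2, b2 = rng.normal(size=(4, 16)), np.zeros(4)

    x = rng.normal(size=8)
    y = layer(W2, b2, layer(W1, b1, x))  # the network is a composition of layers
    print(y.shape)                       # (4,)

Everything else on the list (convolutions, gradients "fanning out" back to the early layers, Euclidean data spaces) is a further assumption stacked on top of this basic shape.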

These are things every expert knows, but which most people only question when it's time to look at the limitations of current methods. The author of the original article appears well-versed in everything, and I'm really excited to see what they've got for the next part.



