OP here -- I'm new at this, but I don't think so. A neuron that outputs zero still contributes to the network's output and, through the error, to the weight updates. Taking a silly toy case, imagine a network with one input, one neuron, one output. You're training it to match data where, whatever the input is, it outputs 1 -- that is, your target state would have the weight set to zero and the bias set to 1. If it was initialised with the weight zero but the bias also zero, then when you pushed your test set through you'd get zero outputs, but the error would be non-zero and there would be adjustments to propagate back.
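To make that concrete, here's a tiny NumPy sketch of the toy case. I'm assuming an identity activation and squared error (neither of which I spelled out above), and the input values are just made up:

```python
import numpy as np

# One input, one neuron, one output; the target output is always 1.
# Assumed: identity activation, squared error.
w, b = 0.0, 0.0                 # initialised with weight AND bias at zero
x = np.array([0.5, -2.0, 3.0])  # arbitrary test inputs
t = np.ones_like(x)             # target is 1 for every input

o = w * x + b                   # outputs are all zero...
E = 0.5 * (o - t) ** 2          # ...but the error is not
dE_do = o - t                   # -1 for every sample
dE_db = dE_do.sum()             # non-zero gradient for the bias
dE_dw = (dE_do * x).sum()       # gradient for the weight

print(o)      # [0. 0. 0.]
print(E)      # [0.5 0.5 0.5]
print(dE_db)  # -3.0 -> gradient descent pushes the bias up toward 1
print(dE_dw)  # -1.5
```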
Hey, thanks for the reply and sorry for the belated answer.
That's a good example. I read up on backpropagation on Wikipedia again, and I think you're right there; I had some misunderstandings.
Eq. 4 of [1] says:
> However, if [neuron] j is in an arbitrary inner layer of the network, finding the derivative of [loss for one specific target value] E with respect to [output of j] o_j is less obvious. [...]
> Considering E as a function with the inputs being all neurons L = {u, v, …, w} receiving input from neuron j, [...] and taking the total derivative with respect to o_j, a recursive expression for the derivative is obtained:
> ∂E/∂o_j = Σ_{ℓ ∈ L} ( ∂E/∂o_ℓ · ∂o_ℓ/∂net_ℓ · w_{jℓ} ) [where w_{jℓ} is the weight from neuron j to neuron ℓ]
> Therefore, the derivative with respect to o_j can be calculated if all the derivatives with respect to the outputs o_ℓ of the next layer – the ones closer to the output neuron – are known.
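Just to make sure I follow the formula, here's a rough NumPy sketch of that recursion. The sigmoid activation and all the numbers are my own assumptions, not something from [1]:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dE_do_j(dE_do_L, net_L, w_jL):
    """dE/do_j for a hidden neuron j, given the next layer L.

    dE_do_L : dE/do_l for each neuron l in L (already known)
    net_L   : net input of each neuron l, so do_l/dnet_l = sigmoid'(net_l)
    w_jL    : weight w_{jl} from neuron j to each neuron l
    """
    do_dnet_L = sigmoid(net_L) * (1.0 - sigmoid(net_L))  # derivative of sigmoid
    return np.sum(dE_do_L * do_dnet_L * w_jL)

# Made-up values for a next layer of three neurons:
print(dE_do_j(dE_do_L=np.array([0.2, -0.5, 0.1]),
              net_L=np.array([0.3, 1.0, -0.7]),
              w_jL=np.array([0.4, -0.1, 0.9])))
```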
So if a neuron is disabled through dropout, this would affect all neurons in the layer "before" it (i.e. closer to the input layer).
I think you could also argue that a dropped-out neuron has its set L artificially set to empty, so the sum in the formula reduces to zero. But that would indeed be something different from setting the weight to zero.
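Here's a small sketch of what I mean by that distinction, again with made-up numbers and sticking with the notation from the quote:

```python
import numpy as np

# Hidden neuron j feeds the next-layer neurons L = {u, v}; values are made up.
dE_do_L   = np.array([0.3, -0.2])   # dE/do_l for l in L (already known)
do_dnet_L = np.array([0.25, 0.19])  # do_l/dnet_l for l in L
w_jL      = np.array([0.7, -0.4])   # weights from j to each l in L

# Case 1: j is active but happens to output zero (e.g. its own weights and bias
# are zero). The sum over L is untouched, so a gradient still reaches j and,
# through it, j's incoming weights.
dE_do_j_active = np.sum(dE_do_L * do_dnet_L * w_jL)

# Case 2: j is dropped out for this pass. Its connections into L are effectively
# gone, i.e. the sum over L has no terms, so nothing propagates back through j.
drop_mask = 0.0
dE_do_j_dropped = drop_mask * np.sum(dE_do_L * do_dnet_L * w_jL)

print(dE_do_j_active)   # non-zero: j's incoming weights still get updated
print(dE_do_j_dropped)  # 0.0: no update reaches j or, through it, the layer before
```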
I could well be misunderstanding you, though!