OP here -- I'm new at this, but I don't think so. A neuron that outputs zero still contributes to the network's output and, through the error, to the weight updates. Taking a silly toy case, imagine a network with one input, one neuron, one output. You're training it to match data where, whatever the input is, it outputs 1 -- that is, your target state would have the weight set to zero and the bias set to 1. If it was initialised with the weight zero but the bias also zero, then when you pushed your test set through you'd get zero outputs, but the error would be non-zero and there would be adjustments to propagate back.
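To make that concrete, here's a tiny NumPy sketch of the toy case. I'm assuming an identity activation and squared error (neither of which I spelled out above), and the input values are just made up:

```python
import numpy as np

# One input, one neuron, one output; the target output is always 1.
# Assumed: identity activation, squared error.
w, b = 0.0, 0.0                 # initialised with weight AND bias at zero
x = np.array([0.5, -2.0, 3.0])  # arbitrary test inputs
t = np.ones_like(x)             # target is 1 for every input

o = w * x + b                   # outputs are all zero...
E = 0.5 * (o - t) ** 2          # ...but the error is not
dE_do = o - t                   # -1 for every sample
dE_db = dE_do.sum()             # non-zero gradient for the bias
dE_dw = (dE_do * x).sum()       # gradient for the weight

print(o)      # [0. 0. 0.]
print(E)      # [0.5 0.5 0.5]
print(dE_db)  # -3.0 -> gradient descent pushes the bias up toward 1
print(dE_dw)  # -1.5
```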
Hey, thanks for the reply and sorry for the belated answer.
That's a good example. I read up on backpropagation on Wikipedia again, and I think you're right there; I had some misunderstandings.
Eq. 4 of [1] says:
> However, if [neuron] j is in an arbitrary inner layer of the network, finding the derivative of [loss for one specific target value] E with respect to [output of j] o_j is less obvious. [...]
> Considering E as a function with the inputs being all neurons L = {u, v, …, w} receiving input from neuron j, [...] and taking the total derivative with respect to o_j, a recursive expression for the derivative is obtained:
> ∂E/∂o_j = Σ_{ℓ ∈ L} ( ∂E/∂o_ℓ · ∂o_ℓ/∂net_ℓ · w_{jℓ} ) [where w_{jℓ} is the weight from neuron j to neuron ℓ]
> Therefore, the derivative with respect to o_j can be calculated if all the derivatives with respect to the outputs o_ℓ of the next layer – the ones closer to the output neuron – are known.
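Just to make sure I follow the formula, here's a rough NumPy sketch of that recursion. The sigmoid activation and all the numbers are my own assumptions, not something from [1]:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dE_do_j(dE_do_L, net_L, w_jL):
    """dE/do_j for a hidden neuron j, given the next layer L.

    dE_do_L : dE/do_l for each neuron l in L (already known)
    net_L   : net input of each neuron l, so do_l/dnet_l = sigmoid'(net_l)
    w_jL    : weight w_{jl} from neuron j to each neuron l
    """
    do_dnet_L = sigmoid(net_L) * (1.0 - sigmoid(net_L))  # derivative of sigmoid
    return np.sum(dE_do_L * do_dnet_L * w_jL)

# Made-up values for a next layer of three neurons:
print(dE_do_j(dE_do_L=np.array([0.2, -0.5, 0.1]),
              net_L=np.array([0.3, 1.0, -0.7]),
              w_jL=np.array([0.4, -0.1, 0.9])))
```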
So if a neuron is disabled through dropout, this would affect all neurons in the layer "before" it (i.e. closer to the input layer).
I think you could also argue that a dropped-out neuron has its set L artificially set to empty, so the sum in the formula reduces to zero. But that would indeed be something different from setting the weight to zero.
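Here's a small sketch of what I mean by that distinction, again with made-up numbers and sticking with the notation from the quote:

```python
import numpy as np

# Hidden neuron j feeds the next-layer neurons L = {u, v}; values are made up.
dE_do_L   = np.array([0.3, -0.2])   # dE/do_l for l in L (already known)
do_dnet_L = np.array([0.25, 0.19])  # do_l/dnet_l for l in L
w_jL      = np.array([0.7, -0.4])   # weights from j to each l in L

# Case 1: j is active but happens to output zero (e.g. its own weights and bias
# are zero). The sum over L is untouched, so a gradient still reaches j and,
# through it, j's incoming weights.
dE_do_j_active = np.sum(dE_do_L * do_dnet_L * w_jL)

# Case 2: j is dropped out for this pass. Its connections into L are effectively
# gone, i.e. the sum over L has no terms, so nothing propagates back through j.
drop_mask = 0.0
dE_do_j_dropped = drop_mask * np.sum(dE_do_L * do_dnet_L * w_jL)

print(dE_do_j_active)   # non-zero: j's incoming weights still get updated
print(dE_do_j_dropped)  # 0.0: no update reaches j or, through it, the layer before
```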
I could well be misunderstanding you, though!