As co-founder of Resolver Systems -- we tried, but ultimately failed, to take on Excel with a Python-enabled equivalent back in 2007 -- and a current employee at Anaconda (which provides Python in Excel), I really do hope you get this one to work. Excel is a mess, Python is better, and someone surely will eventually be able to fix the former with the latter. Let's hope it's you :-)
The amount of information transmitted from one generation to the next is potentially much more than the contents of DNA. DNA is not an encoding of every detail of a living body; it is a set of instructions for a living body to create an approximate copy of itself. You can't use DNA, as far as we know, to create a new organism from scratch without having the parent organism around to build it.
We do know for certain that many parts of a cell divide separately from the nucleus and have no relation to the DNA of the cell - the most well known being the mitochondria, which have their own DNA, but many organelles also just split off and migrate to the new cell quasi-independently. And this is just the simplest layer in some of the simplest organisms - we have no idea whatsoever how much other information is transmitted from the parent organism to the child in ways other than DNA.
In particular in mammals, we have no idea how actively the mother's body helps shape the child. Of course, there's no direct neuron to neuron contact, but that doesn't mean that the mother's body can't contribute to aspects of even the fetal brain development in other ways.
Interesting. As you say, that certainly makes sense for mammals. But I'd be interested in knowing what mechanisms you might conjecture for birds, where pretty much all foetal development happens inside the egg, separated from the mother -- or fish, or octopuses.
It does depend on where you enter the country, at least in my experience (UK citizen, have visited a lot both on visa waivers and prior to that on visas, since the 80s).
Every time I've flown into Austin, TX, they've been super-nice. DC likewise. NYC/Newark are brusque but not nasty. San Francisco are scary. Boston, the one time I flew there, was just horrendous, though that might have just been one agent who was having a bad day.
My intuition is very undeveloped on this, but it makes some kind of sense to me that dropout would make convergence slower, because you're ignoring a bunch of parameters in every batch. The goal seems to be to get a better, more general model by trading off some training time.
OP here -- I'm new at this, but I don't think so. A zero output from a neuron still contributes to the output. Taking a silly toy case, imagine a network with one input, one neuron, one output. You're training it to match data where, whatever the input is, it outputs 1 -- that is, your target state would have the weight set to zero and the bias set to 1. If it was initialised with the weight zero but the bias also zero, then when you pushed your test set through you'd get zero outputs, but the error would be non-zero and there would be adjustments to propagate back.
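To make that concrete, here's a minimal sketch of that toy case in plain Python (my own illustration, with made-up inputs and learning rate): a single linear neuron out = w*x + b, trained with squared-error loss towards a constant target of 1. Starting from w = 0 and b = 0, every output is zero, but the error is non-zero, so gradients still flow and the parameters converge to the target state of w = 0, b = 1.

    # One input, one linear neuron, one output; target is always 1.
    # The optimum is w = 0, b = 1; we initialise both to zero.
    w, b, lr = 0.0, 0.0, 0.1
    data = [(-1.0, 1.0), (0.5, 1.0), (2.0, 1.0)]  # (input, target) pairs

    for epoch in range(200):
        for x, t in data:
            out = w * x + b
            grad_out = out - t       # d(loss)/d(out) for loss = 0.5 * (out - t)**2
            w -= lr * grad_out * x   # d(out)/dw = x
            b -= lr * grad_out       # d(out)/db = 1

    print(round(w, 3), round(b, 3))  # ~0.0 ~1.0 -- the zero weight didn't stop learning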
Hey, thanks for the reply, and sorry for the belated answer.
That's a good example. I read up on backpropagation on Wikipedia again, and I think you're right there -- I had some misunderstandings.
Eq. 4 of [1] says:
> However, if [neuron] j is in an arbitrary inner layer of the network, finding the derivative of [loss for one specific target value] E with respect to [output of j] o_j is less obvious. [...]
> Considering E as a function with the inputs being all neurons L = { u, v, …, w } receiving input from neuron j, [...] and taking the total derivative with respect to o_j, a recursive expression for the derivative is obtained:
> ∂E/∂o_j = Σ_{ℓ ∈ L} (∂E/∂o_ℓ) (∂o_ℓ/∂net_ℓ) w_{jℓ}
> Therefore, the derivative with respect to o_j can be calculated if all the derivatives with respect to the outputs o_ℓ of the next layer – the ones closer to the output neuron – are known.

(Here w_{jℓ} is the weight connecting neuron j to neuron ℓ.)
So if a neuron is disabled through dropout, this would affect all neurons in the layer "before" it (i.e. closer to the input layer).
I think you could also argue that a dropped-out neuron has its set L artificially set to empty, so the sum in the formula would reduce to zero. But that would indeed be something different from setting the weight to zero.
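To see the difference in a runnable form, here's a rough sketch (using PyTorch, which is my choice here, not something from this thread) of a tiny 1-1-1 net whose hidden neuron has a zero outgoing weight, echoing the toy case above. With the weight merely at zero, no gradient flows back past it, but the weight itself still receives a gradient and can learn its way back into use. With the hidden neuron dropped, even that weight's gradient vanishes, because the neuron has been removed from the sum entirely.

    # Rough sketch: hidden h = w1*x + b1, output = w2*h + b2, with w2 = 0.
    import torch

    def grads(drop_hidden):
        w1 = torch.tensor(0.5, requires_grad=True)
        b1 = torch.tensor(0.1, requires_grad=True)
        w2 = torch.tensor(0.0, requires_grad=True)  # zero outgoing weight
        b2 = torch.tensor(0.0, requires_grad=True)
        h = w1 * 1.0 + b1                # input x = 1.0
        if drop_hidden:
            h = h * 0.0                  # dropout mask zeroing the hidden neuron
        out = w2 * h + b2
        loss = 0.5 * (out - 1.0) ** 2    # target = 1.0
        loss.backward()
        return w1.grad, w2.grad

    print(grads(False))  # w1.grad = 0 (blocked by the zero weight), w2.grad = -0.6: w2 still learns
    print(grads(True))   # neuron dropped: w2.grad = 0 too -- the whole path is frozen for this batch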
I've found it very useful with organisational problems too. I had a complex issue to work through recently and tried working with ChatGPT, Claude and Grok 3 to optimise letters to some of the people involved, trying to get things solved in the way I felt was fairest for everyone (anonymised, of course, with no memory on ChatGPT). One neat trick was to export the letter from one chat, then to start a fresh one, say that I am <role of recipient>, provide the letter, and ask what it thinks -- basically red-teaming it.
The process of doing that clarified my thoughts and arguments so much that I never wound up having to send them -- I'd already made the right points on Slack and in meetings, and a compromise was achieved.
I have 100% set myself a "no side quests" rule while going through the book so that I don't do that. I've had... patchy success with that, but I think I'm doing pretty well apart from the week I spent getting LaTeX rendering working on my blog so that I could do the pretty maths in that post.
What I'm doing is building up a list of things to dig into in depth once I've finished the book. Kind of like a treat to encourage me to push forward when I'm working through a bit that's tough to understand.