
AD still explodes for “interesting” derivatives: efficiently computing the adjoint of the Jacobian is NP-complete. And, naturally, the Jacobian is what you want when doing machine learning. There are papers from the mid-’90s discussing the difficulties of adding AD Jacobian operators to programming languages to support neural networks. This article is just rehashing 25-year-old problems.



Correction: finding the optimal algorithm (the minimal number of operations) for computing a Jacobian is NP-complete, but evaluating it at a dimension-dependent multiple of the cost of a forward evaluation is standard.

Also, many optimizers that are popular in ML only need gradients (in which case the Jacobian of the scalar loss is just the gradient vector). Second-order methods are important in applications with ill-conditioning (such as bundle adjustment or large-scale GPR), but those problems have a lot of exploitable structure and sparsity. The situation is not nearly as dire as you suggest.
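A minimal sketch of that cost structure in JAX (the function and shapes here are illustrative, not taken from the thread): forward mode costs roughly one pass per input dimension, reverse mode roughly one pass per output dimension, and a scalar loss's gradient comes out of reverse mode at a small constant multiple of a forward evaluation.

    import jax
    import jax.numpy as jnp

    def f(x):
        # A small vector-valued function, R^3 -> R^2 (purely illustrative).
        return jnp.array([x[0] * x[1], jnp.sin(x[2]) + x[0] ** 2])

    x = jnp.array([1.0, 2.0, 0.5])

    # Forward mode: roughly one pass per input dimension (3 here).
    J_fwd = jax.jacfwd(f)(x)

    # Reverse mode: roughly one pass per output dimension (2 here).
    J_rev = jax.jacrev(f)(x)

    # For a scalar loss, reverse mode gives the full gradient at a
    # constant multiple of the cost of evaluating the loss itself.
    def loss(x):
        return jnp.sum(f(x) ** 2)

    g = jax.grad(loss)(x)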


Yep; I was imprecise!


Around 2000 I accidentally invented a Bloom filter variant (to this day I don’t know how I missed the Google papers at the time) for doing a large set-intersection test between two machines.

Somehow, I ended up with a calculus equation for determining the right number of bits per entry, and how many rounds of winnowing to run over the lists, for any given pair of machines where machine A found n entries and machine B found m. But I couldn’t solve it. Then I discovered that even though I had done poorly in calculus, I still remembered more of it than anyone else on the team, and I couldn’t find help from any other engineer in the building either.

Eventually I located a QA person who used to TA calculus. She informed me that my equation probably could not be solved by hand. I gave it another day or so and then gave up; if I couldn’t do it by hand, I wasn’t going to be able to write a heuristic for it anyway.

For years, this would be the longest period in my programming career where I didn’t touch a computer. I just sat with pen and paper pounding away at it and getting nowhere. And that’s also the last time I knowingly touched calculus at work.

(Although you might argue some of my data-vis discussions amount to deciding whether we show either the sum or the rate of change of a trend line to explain it better. The S curve that shows up so often in project progress charts is just the integral of a normal distribution, after all.)
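For reference, the textbook single-filter Bloom sizing does have a closed form; this is a sketch of those standard formulas, not the multi-round winnowing equation described above, which is presumably what resisted a hand solution:

    import math

    def bloom_parameters(p):
        """Textbook Bloom filter sizing for a target false-positive rate p.

        Returns (bits per entry, optimal number of hash functions),
        using the standard results m/n = -ln(p) / (ln 2)^2 and
        k = (m/n) * ln 2.
        """
        bits_per_entry = -math.log(p) / (math.log(2) ** 2)
        k = bits_per_entry * math.log(2)
        return bits_per_entry, round(k)

    # A 1% false-positive rate costs ~9.6 bits per entry with 7 hash
    # functions, independent of how many entries each machine found.
    print(bloom_parameters(0.01))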


The Jacobian is used all the time, but where do you end up needing the adjoint?
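One concrete place it appears (an illustrative aside, not the parent's answer): the primitive operation of reverse-mode AD is exactly an adjoint-Jacobian product. A hedged sketch in JAX, reusing the toy function from above:

    import jax
    import jax.numpy as jnp

    def f(x):
        # Same illustrative R^3 -> R^2 function as earlier.
        return jnp.array([x[0] * x[1], jnp.sin(x[2]) + x[0] ** 2])

    x = jnp.array([1.0, 2.0, 0.5])
    y, f_vjp = jax.vjp(f, x)        # forward pass plus a closure for v -> J^T v
    v = jnp.array([1.0, 0.0])       # a cotangent vector on the output
    (adjoint_product,) = f_vjp(v)   # J^T v without ever materializing J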


Do you have a link to those papers?



