Hack the derivative

jacobolus · on Oct 3, 2015

Yay for functions which are analytic in some neighborhood. Here’s the original paper from the 60s about the idea in this post: http://www.math.fsu.edu/~okhanmoh/media/Lyness,%20Moler,%20S...

The latest work on this general topic is http://arxiv.org/pdf/1404.2463.pdf which manages to compute extremely accurate high-order derivatives (“...even the 100th derivative of an analytic function can be computed with near machine precision accuracy using standard floating point arithmetic”!!!), see also http://www.chebfun.org/examples/cheb/Turbo.html

* * *

Anyhow, for folks trying to take derivatives of (or do various other operations to) arbitrary continuous functions, I recommend checking out Chebfun – http://www.chebfun.org – a Matlab library created by a group of applied mathematicians at Oxford.

Instead of just looking at a couple points near some point of interest, Chebfun approximates the continuous function over some interval to near machine precision by a high-degree polynomial, and then operates on that polynomial.

Operations like numerical differentiation are much more accurate when you make the right “global” (i.e. not just at one single point) approximation.

Also check out Nick Trefethen’s book Approximation Theory and Approximation Practice, the first 6 chapters of which are available online: http://www.chebfun.org/ATAP/atap-first6chapters.pdf

gjm11 · on Oct 3, 2015

The paper by Lyness and Moler is nice, but if it's really "about the idea in this post" then the fact is somewhat hidden. It's mostly about using the Cauchy integral formula to turn derivatives into integrals, using the Poisson summation formula to relate those to finite sums, and using the Moebius inversion formula to get those relations into a form from which you can extract the derivatives.

Maybe taking a very short partial sum of one of the series in Theorem 2 gives you the result in the blog post here, but that seems rather sledgehammer-to-crack-a-nut-ish when (as the blog post says) all you need is the Cauchy-Riemann equations.

I think it's true that the Lyness-Moler idea of "numerical differentiation by numerical complex integration" was, historically, part of the chain of ideas that led to the very simple "complex-step approximation" discussed in the blog post. But that's a far cry from saying that their paper is "about the idea in this post"!

There's some discussion of the history here: http://blogs.mathworks.com/cleve/2013/10/14/complex-step-dif... where Moler (as in "Lyness and Moler") says "The complex step algorithm that I'm about to describe is much simpler than anything described in that 1967 paper. So, although we have some claim of precedence for the idea of using complex arithmetic on real functions, we certainly cannot claim to have invented today's algorithm."

And yes, Chebfun is very nice.

jacobolus · on Oct 3, 2015

Fair enough. :-)

jamessb · on Oct 3, 2015

The article comments: "In a lot of respects, it’s quite amazing how accurate many calculations can be made with floating numbers, like orbital mechanics and heat equations"

I'm not sure it is so surprising. In Nick Trefethen's Numerical Analysis entry in The Princeton Companion to Mathematics, he notes:

"Thus, on a computer, the interval [1 , 2] , for example, is approximated by about 10^16 numbers. It is interesting to compare the fineness of this discretization with that of the discretizations of physics. In a handful of solid or liquid or a balloonful of gas, the number of atoms or molecules in a line from one point to another is on the order of 10^8 (the cube root of Avogadro’s number). Such a system behaves enough like a continuum to justify our definitions of physical quantities such as density, pressure, stress, strain, and temperature. Computer arithmetic, however, is more than a million times finer than this. Another comparison with physics concerns the precision to which fundamental constants are known, such as (roughly) 4 digits for the gravitational constant G, 7 digits for Planck’s constant h and the elementary charge e, and 12 digits for the ratio μ_e/μ_B of the magnetic moment of the electron to the Bohr magneton. At present, almost nothing in physics is known to more than 12 or 13 digits of accuracy. Thus IEEE numbers are orders of magnitude more precise than any number in science. (Of course, purely mathematical quantities like π are another matter.)" http://people.maths.ox.ac.uk/trefethen/NAessay.pdf

eriktaubeneck · on Oct 3, 2015

That's an interesting point I had not considered. My view (mostly from the math side, my shallow dive into physical systems was just from a PDE class in grad school) is from the view of unstable differential equations where very small changes to initial conditions can cause massive changes to the output of the system. My uninformed instincts would tell me that even with orders of more digits than atoms in a physical system, changes smaller than the available accuracy to initial conditions could have significant impact on the eventual state of the system.

replicant · on Oct 3, 2015

Another point of view on why it is not striking that we can model heat conduction well. The heat equation is well-posed, which among other things, means that the solution depends continuously of the input data. And we observe this in many physical systems, that small changes in the cause produces small changes in the effect, therefore, we should expect from good mathematical models of reality to display this property.

Now, orbital mechanics do display unstable behaviour. I don't dare to adventure on how people work around this. https://en.wikipedia.org/wiki/Well-posed_problem

jjoonathan · on Oct 3, 2015

I don't see any mention of automatic differentiation which I thought was the dominant technique in this area. Is there an advantage to the method described in OP (other than mathematical cuteness)?

lloda · on Oct 3, 2015

No. Automatic differentiation is superior to the complex step method in every way.

The only reason to use the complex step method is if you're using legacy languages where it's difficult to implement dual numbers but you have good support for complex numbers. I don't think anybody should be using the complex step method in new applications.

Some reading ~ http://aero-comlab.stanford.edu/Papers/martins.aiaa.01-0921....

jordigh · on Oct 3, 2015

wtf

Automatic differentiation only works for the simplest functions for which you already know what the Taylor series looks like. For those cases, you might as well just hardcode derivative functions and the basic derivative rules (linearity and chain rule). It is not a general-purpose method.

For functions that you can't even express by a simple formula, you still have to rely on finite differencing.

Don't call "legacy" anything that you don't understand.

lloda · on Oct 3, 2015

The question was about automatic differentiation vs the method in the article (the complex step method) which is essentially a defective variant of automatic differentiation with a spurious numerical parameter.

Automatic differentiation works for any composition of those ‘simplest functions’ you mention, which is quite a lot of stuff, including whole programs.

Approximation methods have their place, sure. Sometimes they're good enough and sometimes it's all you can do. What does that have to do with anything?

jjoonathan · on Oct 3, 2015

Thanks for the link, btw. I'm in the process of digesting the paper.

jordigh · on Oct 3, 2015

I was objecting to the suggestion that the complex finite differencing scheme was worthless ("legacy") and automatic differentiation is all we should ever do from now on.

jjoonathan · on Oct 3, 2015

Can you comment on the space of functions which you know to be complex differentiable but for which you can't analytically evaluate/approximate the real derivative?

If so, I look forward to learning about it. If not, you were throwing stones in a glass house.

jordigh · on Oct 3, 2015

Sure, a simple example is the derivative of Riemann zeta for Re s < 1. Granted, usually you work with the logarithmic derivative which is a lot more tractable, but if for whatever reason you need the actual derivative, you're gonna have a lot of trouble with automatic differentiation, as the typical analytic continuation formulas involve some complicated improper integrals.

There are also functions that you only know by sampling (e.g. ocean temperatures) for which you assume smoothness. You need to pick an interpolation method, but sometimes you do not interpolate beyond the sampling points, because that's just making up numbers. When you're limited by your original sampling step size, you have little recurse but to compute derivatives by some finite differencing scheme.

jacobolus · on Oct 3, 2015

He did link to a book of papers largely about automatic differentiation, http://amzn.com/0898713854 (table of contents here http://bt.pa.msu.edu/cgi-bin/display.pl?name=sfbook)

jjoonathan · on Oct 3, 2015

I already know that a book on numerical analysis is likely to contain the answer I seek. The point of asking was to avoid a trip to the library since the working memory of someone who has been investigating differentiation techniques also likely contains the answer I seek.

jacobolus · on Oct 3, 2015

I don’t quite understand what your question is, so I’m going to assume you were trying to ask, “When would you not want to use automatic differentiation?” I think the main time you can’t get much out of automatic differentiation is when your function is a black box or approximation, e.g. it comes from measuring physical data, or is the output of some kind of simulation, etc. I’m not an expert though.

You can get a lot of hits if you google related keywords (e.g. a query with both terms "numerical differentiation" "automatic differentiation" in quotation marks). But here’s a start, http://alexey.radul.name/ideas/2013/introduction-to-automati...

You can probably get useful advice if you ask on http://scicomp.stackexchange.com or similar.

jjoonathan · on Oct 3, 2015

Someone who knows to bring automatic differentiation into a discussion about complex-step differentiation with the implication that they have already compared them and found one wanting (while implying that they may have missed something and are receptive to contrary opinions) already knows how CSD and AD work. This person also likely know how to use google, the library / amazon, and stack exchange (at the very least) but upon running a few simple queries has determined that the approximate effort to find an answer exceeds their interest in the subject. Unless someone else can use their own greater familiarity with the subject to provide an opinion or a pointer to a specific easily accessible resource (example: lloda's pdf link, counterexample: your book link) the line of inquiry will fizzle out due to the tyranny of cost/benefit analysis and having better things to do.

I appreciate that you were trying to be helpful, which is why I'm trying to be gentle (and explanatory) as I tell you that you weren't.

srean · on Oct 3, 2015

Had the following thought just to entertain myself, I don't expect it to have any utility in practice, but here goes:

Take a small interval around your smooth function that includes the point where you want to compute the derivative. Joint the right and left ends by just reflecting itself, so that you have a differentiable periodic function. Now sample that at a suitable rate, take its FFT, derive the FFT of the derivative by multiplying it with 2\pi I. Take the inverse FFT and evaluate it at the point of interest. May be useful if you want to compute the derivative at many points, but seems wasteful otherwise.

One of course has to dot the i's and cross the t's (of aliasing effects, Nyquist limits etc). Perhaps some other fast integral transform would work even better. I will be surprised if there isn't an old industry around it.

cabinpark · on Oct 3, 2015

This is called a spectral method and is widely used in numerical codes.

I highly recommend Spectral Methods in MATLAB by Trefethen (who someone mentions above) for a very good tutorial. You can freely ignore the MATLAB part and use whatever programming language you want, as long as you have an FFT routine.

srean · on Oct 3, 2015

Thanks for pointing to the right direction in case I want scratch that itch more. I am not surprised though that this has been done, expected the same. From a quick look, no one seems to suggest reflecting the original analytic function to get a periodic function, I guess some caveats lurk there.

Does other integral transforms work better than Fourier ?

cabinpark · on Oct 3, 2015

When you use this method you are implicitly making the function periodic. I can give you any function on some interval (sufficiently well behaved) and you can compute the Fourier series of it. Even though it's only defined on the interval, if you plotted the Fourier series you would still find it to be periodic. The same idea carries over to the spectral derivative. Even if the function isn't periodic, the method still works, it's just that the numerics near the discontinuity are going to be crap and can affect things further away. But if you are sufficiently far away, things are kosher. Note there are ways of dealing with this using Chebyshev interpolation (see http://math.mit.edu/~stevenj/fft-deriv.pdf section 6) which Trefethen's book also discusses.

And no FFTs are the only transformations I am aware of. I've never heard of anyone using other transformations in a general numerical context, outside of specialized problems.

srean · on Oct 3, 2015

Given the local nature of the problem and the potential for using O(n) operations (as opposed n log n) my first suspect was wavelet transforms and sure enough people have done that but that body of work seems more recent than the use of FFT.

kmill · on Oct 3, 2015

I was trying to figure out how this works algebraically, and I think I understand for polynomials (or power series) at least. Basically, for a really small number a, ai has the property that (ai)^2 is effectively zero, so one can think of this as working with the dual numbers themselves (i.e., throwing out the x^2 term and onward in a power series). The focus on analytic functions is then because they are representable by power series at each point. I think this is what lloda is referring to.

edoloughlin · on Oct 3, 2015

Lost me a bit at

  Im(f(x+ih))/h

near the end (that's a fancy I that android FF won't paste). Can anyone explain where 'm' came from?

Or is 'Im' just a fn returning the imaginary part of its argument?

EDIT: reading the code following, it's clear that Im is just that.

mbroshi · on Oct 3, 2015

I was also lost at that part, although for a different reason. The function f was assumed to be from R to R, so it does not make formal sense to plug in f(x+ih).

I guess the author has the unstated assumption that f is the restriction to R of a function on C.

eriktaubeneck · on Oct 3, 2015

Great point, and I apologize for the confusion. The function should be C to C, with the added restriction of R to R.

eriktaubeneck · on Oct 3, 2015

Thanks, I just realized that when I introduced the notation above, I just used I(f), but then when I used it below I wrote Im(f). I will clear that up.

czzarr · on Oct 3, 2015

Im is for imaginary part of its argument as you say

cheez · on Oct 3, 2015

This was so much fun to read.

eriktaubeneck · on Oct 3, 2015

Thanks! Glad you enjoyed it.

eccstartup · on Oct 3, 2015

I DO think Python libraries can do this already.

jamessb · on Oct 3, 2015

The scipy.misc.derivative() function uses a central difference formula, rather than exploiting the fact that a function is analytic: https://docs.scipy.org/doc/scipy/reference/generated/scipy.m...

Applying the Cauchy–Riemann equations to essentially take finite differences in the imaginary direction is an interesting trick I hadn't seen before, so I appreciated an article introducing it.

Edit: A separate python package, Numdifftools, does seem to support complex-step differentiation https://pypi.python.org/pypi/Numdifftools

eriktaubeneck · on Oct 3, 2015

Did not intend to make any claims that Python libraries don't do this, but rather to explore the subject as a way to better understand the implementation and limits of floating point numbers and numerical analysis based on them.