Notes on Taylor and Maclaurin Series (thegreenplace.net)



For some functions you can turn the Taylor series into a recurrence relation, which makes it blazing fast to calculate.

For instance, for the family of functions f(x0, x1, ...) = exp(poly(x0, x1, ...)), where poly is a multivariate polynomial of order m, you can compute the Taylor coefficients with a recurrence relation of order m (i.e., one that only needs to look back m steps). This shows up in quantum optics, for example.
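
To make that concrete in the single-variable case (a simplified sketch of my own, not the multivariate quantum-optics version): differentiating f = exp(g) gives f' = g'·f, and matching power-series coefficients turns that into a recurrence that only looks back deg(g) steps.

    import math

    def exp_poly_taylor(g, n_terms):
        """Taylor coefficients a_k of f(x) = exp(g(x)) about x = 0,
        where g is given by its coefficient list [g0, g1, ..., gm].
        From f' = g'*f:
          (n+1) a_{n+1} = sum_{k=0}^{min(n, m-1)} (k+1) g_{k+1} a_{n-k},
        a recurrence that only looks back m = deg(g) steps."""
        m = len(g) - 1
        a = [math.exp(g[0])]                      # a_0 = exp(g(0))
        for n in range(n_terms - 1):
            s = sum((k + 1) * g[k + 1] * a[n - k]
                    for k in range(min(n, m - 1) + 1))
            a.append(s / (n + 1))
        return a

    # Sanity check: g(x) = x gives exp(x), i.e. a_k = 1/k!
    print(exp_poly_taylor([0.0, 1.0], 6))         # [1.0, 1.0, 0.5, 0.1667, ...]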


All interest I had in Taylor/Maclaurin series was promptly beaten out of me by having to manually calculate hundreds of them in Calculus courses :).

The Remez algorithm linked at the end still inspires curiosity though. Any other "numerically more useful" approximation algorithms folks want to highlight? The Padé approximant looks like another interesting candidate to read about.


Yes! The workhorse numerical technique is Chebyshev approximation. Remez exchange usually starts with it, for fine-tuning with respect to a "maximum error" norm, but it also works quite well by itself, and can be computed efficiently (even with a high degree) even from a function you can only evaluate numerically.

A really good place to read up on it is the documentation for Chebfun.

https://www.chebfun.org/docs/guide/guide04.html
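
If you just want to play with the idea without installing Chebfun, here's a rough numpy sketch of my own (cheb_coeffs/cheb_eval are just my toy names, nothing to do with Chebfun's internals): sample the function at Chebyshev points and recover the coefficients of the degree-n interpolant, which for smooth functions is already very close to the minimax polynomial that Remez exchange would fine-tune.

    import numpy as np

    def cheb_coeffs(f, n, a=-1.0, b=1.0):
        """Coefficients c_0..c_n of the degree-n Chebyshev interpolant of f
        on [a, b], built purely from point evaluations of f."""
        k = np.arange(n + 1)
        t = np.cos(np.pi * (k + 0.5) / (n + 1))        # Chebyshev nodes on [-1, 1]
        x = 0.5 * (b - a) * t + 0.5 * (b + a)          # mapped to [a, b]
        y = f(x)
        c = np.array([(2.0 / (n + 1)) * np.sum(y * np.cos(j * np.pi * (k + 0.5) / (n + 1)))
                      for j in range(n + 1)])
        c[0] /= 2.0
        return c

    def cheb_eval(c, x, a=-1.0, b=1.0):
        t = (2.0 * x - (a + b)) / (b - a)              # map back to [-1, 1]
        return np.polynomial.chebyshev.chebval(t, c)

    c = cheb_coeffs(np.exp, 10)
    xs = np.linspace(-1, 1, 1001)
    print(np.max(np.abs(cheb_eval(c, xs) - np.exp(xs))))   # tiny, roughly 1e-10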

Also: be on the lookout for a blog post on using Chebyshev polynomials to efficiently compute error metrics for curves.


Chebyshev’s polynomials seem to have eclipsed his semi-iterative method for solving linear systems, which is too bad IMO.


Why is that bad? Are there any cases where Chebyshev iteration is the best option? It seems like acquiring the necessary information about the spectrum would be prohibitive in practice, although I have never tried it out.


I’ve only really played with it on a single-node machine, which isn’t where it ought to shine.

IMO it sits at a really interesting spot as a sort of “more robust” (hand-waves) iterative solver that doesn't require inner products. You need to know something about the spectrum, sure, but sneakily figuring out things about the spectrum is somewhere people can show off their expertise, I think.
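
For reference, here's a minimal unpreconditioned sketch of what I mean (my own toy code, using the textbook parameter recurrence), assuming you can hand it bounds [lmin, lmax] on the spectrum of a symmetric positive definite A. Note there isn't a single inner product of the iterates anywhere:

    import numpy as np

    def chebyshev_iteration(A, b, lmin, lmax, iters=200):
        """Chebyshev semi-iteration for A x = b, with A SPD and spectrum
        contained in [lmin, lmax]. Only matrix-vector products and the
        two eigenvalue bounds are needed; no dot products of iterates."""
        x = np.zeros_like(b)
        d = (lmax + lmin) / 2.0
        c = (lmax - lmin) / 2.0
        r = b - A @ x
        for i in range(iters):
            if i == 0:
                p = r.copy()
                alpha = 1.0 / d
            else:
                beta = 0.5 * (c * alpha) ** 2 if i == 1 else (c * alpha / 2.0) ** 2
                alpha = 1.0 / (d - beta / alpha)
                p = r + beta * p
            x = x + alpha * p
            r = r - alpha * (A @ p)
        return x

    # Toy check on a diagonal system whose spectrum we know exactly.
    lams = np.linspace(1.0, 10.0, 50)
    A, b = np.diag(lams), np.ones(50)
    x = chebyshev_iteration(A, b, 1.0, 10.0)
    print(np.linalg.norm(A @ x - b))    # should be near machine precision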


That seems plausible.

I actually spent a little time digging into this, and I'm not sure if this method is actually due to Chebyshev! This link has the most extensive references I found:

    https://encyclopediaofmath.org/wiki/Chebyshev_iteration_method
and from what I can tell it's actually due to Richardson.


I had the opposite reaction.

Math department: “oh, look, we are very clever, see these pretty plots the differential equations generate and all the beautiful closed-form solutions we can make”

Me: “Hmm, very nice, I am not smart enough for this.”

Engineering department: “Complicated equation bad, smash with Taylor series, little part go bye-bye, big part is smooth.”

Me: “Yes this equation will match my brain nicely.”


Weird. When I was doing boxing, having to run sprints every training session and do dozens of pushups, situps and what not did not beat the interest in boxing out of me :).


Not at all weird, because you were also boxing. But if your boxing sessions were only to be sprints, pushups, situps etc, and no boxing, you would surely lose interest in doing boxing at that place. Likewise, if your jiu jitsu training were only drills and no sparring, there would be no people doing jiu jitsu after a while.


My boxing training sessions were about 50-50 between exercises and the pad work/sparring session (in which there were big breaks).

In my undergrad, I was made to take 6 math courses, and about 18 physics courses which used that math [1]. Plus additional Engineering/CS electives. So over a 25-75 ratio of exercise to "real" stuff.

Much better ratio than boxing.

[1] If you decide that the math courses have nothing real/interesting in them. Which is not true.


The big difference is probably that I wasn't taking the math classes out of interest in being great at computations. Just as manually doing long division on the factorial denominator wouldn't have made me any better able to apply Taylor series, finding the 3rd derivative of a random trig function by hand dozens of times (for some reason a trig reference table was allowed but not a CAS) didn't make me any better able to use them either, but it did wear out my energy for the topic.

Of course this "interested in applying concepts to solve the problem, not calculating the values in said solution" lean is probably why I was in a computer program instead of a math program in the first place :).


Let me state my point explicitly: You are probably referring to real math as one of (1) building abstraction, (2) doing proofs of theorems, (3) solving applied math problems.

To build competence at these problems, you need to exercise the basic skills, of which computational skills are a big part. If you don't exercise your computational skills, you can't become good at mathematics in the sense of (1), (2), or (3). The same way you can't get to be good at boxing without doing a lot of pushups.

Now, not all teachers are good. Many certainly try to force students to do too much computational practice relative to their level. But both as a student and later as a prof, I discovered that there is value in doing really dumb and complicated calculations by hand sometimes during your education. The reason is that many times, when your computational algorithm fails, you have to do the dirty thing by hand to figure out what went wrong. And depending on what you do, correctly computing the 3rd derivative of a random trig function by hand, with all the nuances, is exactly what you need to do [1].

Well, not everyone. Many/most people do jobs that only require low levels of math, and they might never need to do any of that. But a math course taught to everyone has to take a cookie-cutter approach, and that usually requires setting a high enough bar that the people who do have to do the hard math at their jobs come away with those skills. This is an unfortunate fact of resource optimization.

[1] I am an industry researcher now, and have to do all kinds of these crappy calculations.


I didn't refer to "real math" but, regardless, I was rejecting your explicit point. Not all forms of practice are inherently useful to you even if it's in the same category/topic. E.g. just because you trained running thousands of miles for power marathons does not mean you are necessarily better positioned to box. It does probably mean you've put a bit of a damper on your interest in doing more exercise for boxing though. More directly, just because you calculated hundreds of Taylor Series manually does not inherently mean the practice you did made you better at something you'll be able to use with the topic you've now spent dozens of hours on already.

What I mean by randomly calculate in this case is: be given a random trig equation, be told to hand-calculate an nth-degree Taylor series, compute the numerical answer for a few values, and move on to the next problem until you've done hundreds in the course. I do NOT mean running through dozens of practice problems to get better at learning when/how to apply Taylor series (though that did come in a later course, it thankfully didn't need to be done by hand, just as nobody expected you to calculate sqrt(5.7) by hand to learn calculus either).

This no doubt makes me a great calculator of 5th derivatives of trig functions but, except for the first few perhaps, at the expense of dozens of hours that could have gone into getting good at actually applying tricks to what the Taylor series spits out. Like you say, the math course has to give cookie-cutter training that is sometimes not about the direct concept at hand, but that means you can be shit out of luck if that cookie-cutter training wasn't relevant for boxing, even though the topic itself can be.


One of the things that has always seemed rather magical to me is the Taylor series of the exponential function for very large negative values. We know that a number like e^-100 is extremely close to zero. Yet when you write out the Taylor series you see an alternating sum of increasingly large, seemingly arbitrary numbers (at least for ~100 terms):

e^-100 = 1 - 100 + 5000 - 166,666.6 + 4,166,666.6 - ...

If you were just given this sum and knew nothing about Taylor series or the exponential function you'd assume that its value was some extremely large number. Yet everything manages to cancel out just right so that the resulting sum is almost exactly zero, but not quite.
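
You can watch that cancellation fail in double precision, which makes the point even more vivid: the individual terms peak around 1e42, so the ~1e-16 relative rounding error on them completely buries the true value of about 3.7e-44.

    import math

    x = -100.0
    term, total = 1.0, 1.0
    for n in range(1, 500):
        term *= x / n          # next Taylor term: (-100)^n / n!
        total += term

    print(total)               # wildly wrong: rounding error on the ~1e42 terms dominates
    print(math.exp(-100))      # 3.72e-44, the true value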

I can't help but wonder if there's some parallel to parts of quantum field theory. If you expand out the interactions between two particles you also end up with a series that is apparently divergent. Yet we know experimentally that the first few terms work quite well as an approximation. It feels a bit like you're looking at the series of e^-100 without knowing about the exponential function.


The polynomial functions you get by truncating the exponential's Taylor series are also well-behaved in a way that's really surprising if you haven't thought about it before. For example, if you plot 1, 1 + x + x^2/2, 1 + x + x^2/2 + x^3/6 + x^4/24, and so on, you'll see that they actually don't oscillate at all as you might expect for negative inputs (for the reason you're saying), but rather they're very nice convex "bowl-shaped" positive functions, which hug zero more and more closely along the negative axis as you include more terms. One way to get some intuition about this is to notice that the derivatives of the next polynomial in the sequence are just the ones that came before (related to exp(x) being its own derivative). In particular, if all the even-degree ones so far have been convex, then so will the next one be.
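
Here's a quick numpy check of this (my own throwaway script, nothing rigorous): the even-degree truncations never dip below zero even far out on the negative axis, and their value at x = -5 settles toward exp(-5) ≈ 0.0067 as the degree grows.

    import math
    import numpy as np

    xs = np.linspace(-15.0, 2.0, 2001)
    for deg in (2, 4, 8, 16, 32):
        coeffs = [1.0 / math.factorial(k) for k in range(deg + 1)]
        vals = np.polynomial.polynomial.polyval(xs, coeffs)
        # minimum over the plotted range (stays positive), and the value at x = -5
        print(deg, vals.min(), np.polynomial.polynomial.polyval(-5.0, coeffs))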

The idea you mention at the end is not specific to something mathematically advanced like quantum field theory. Useful truncations of divergent series show up in many more elementary situations in calculus and applied math; here is a nice class taught by the excellent Steven Strogatz where he talks about a simple example in the first lecture:

https://www.youtube.com/watch?v=KZsk8B_z8pI&list=PL5EH0ZJ7V0...


I have been watching this class (to supplement a class on dynamical systems where we are learning perturbative methods). Strogatz has a good textbook, a good pop-sci book, a really good podcast and now really good technical lectures. One small positive result of Covid.


Why do you think a series which absolutely converges but whose first few terms are a poor approximation is related to a series which diverges but whose first few terms match the empirically expected value?

Not only is it a stretch to wonder about parallels between a basic Calc II concept and Quantum Field Theory, but it seems like the exponential function is the exact opposite of the example you provided.


I was being a bit loose in my language, but I was more specifically thinking of the connection to "resurgence theory" which aims at understanding more complicated kinds of apparently divergent infinite sums. See this article for a popular overview: https://www.quantamagazine.org/alien-calculus-could-save-par...


In complex analysis, all differentiable (in a region) functions are infinitely differentiable, and are the same as their Taylor series.

Real analysis is a zoo of weird exceptions, including the function equal to e^(-1/x^2) away from 0 and to 0 at 0. Its Maclaurin series is identically 0, which is clearly not the function we wrote down.

I can't explain why real analysis fit my brain and complex analysis doesn't. But to me complex analysis looks like, "We draw a path, then calculate this contour integral, and magic happens."


Complex analysis is far too nice. There are so many theorems where my first thought was "there's no way that's true".

Picard's great theorem is totally insane. As is even its little brother, and even Liouville's theorem.

The proofs aren't even that long. They just feel totally false.


Right. It's magic.

But once you have the machine...

-----

Fundamental Theorem of Algebra: Every nonconstant polynomial over the complex numbers has a complex root.

Proof: Suppose that p(z) is a polynomial over the complex numbers with no root.

Consider the function 1/p(z). Since p(z) has no roots, 1/p(z) is entire. We can bound it everywhere outside of a large circle because the leading term of p dominates the others there. And since the closed disk inside the circle is compact, we can bound |p(z)| away from 0 on it. Between the two, 1/p(z) is bounded, and so must be constant by Liouville's theorem.

But 1/p(z) is only constant if p(z) is constant. Therefore any complex polynomial with no complex roots must be constant.

-----

I'm convinced by the proof. But part of me still says that it is magic.


And the definite integrals you can get out of the Cauchy integral formula are awesome.

I think it is because being holomorphic is a much stronger condition even than real C^\infty (all derivatives exist and are continuous), let alone merely continuous or merely integrable.


Instead of thinking of real analysis as a zoo of weird exceptions, it's probably more accurate to think of complex-analytic functions as the exceptions. For example, when viewed as a two-dimensional mapping from the plane to itself, complex-analytic functions are conformal (angle-preserving), whereas most differentiable mappings from the plane to itself are not.


On the other hand, complex analysis explains things that aren't obvious in real analysis, like why the Taylor series of a perfectly smooth function on the real line might have a radius of convergence of only 1 (it turns out there are singularities off the real line in the complex plane, e.g. at ±i for 1/(1+x²)).


Something that seems to be frequently lacking in discussions of convergence in introductory texts on Taylor series is the possibility that the series DOES converge, but NOT to the approximated function. It's not sufficient to conclude that the derived Taylor series must converge to cos(x) because it always converges, since any of the infinitely many functions that match cosine's derivatives at x = 0 will have the same Taylor expansion. How do you know cos(x) is the one it will converge to?


I'm not quite sure whether you're asking for an explanation or whether you're simply making the observation that this is often omitted from the discussion. Either way I think it's an interesting point, so I'll elaborate a bit on it.

As you say, there's no guarantee that even a convergent Taylor series[0] converges to the correct value in any interval around the point of expansion. Though the series is of course trivially[1] convergent at the point of expansion itself, since only the constant term doesn't vanish.

The typical example is f(x) = exp(-1/x²) for x ≠ 0; f(0) = 0. The derivatives are mildly annoying to compute, but they must look like f⁽ⁿ⁾(x) = exp(-1/x²)pₙ(1/x) for some polynomials pₙ. Since exponential decay dominates all polynomial growth, it must be the case that f(0) = f'(0) = f"(0) = ··· = 0. In other words, the Taylor series is 0 everywhere, but clearly f(x) ≠ 0 for x ≠ 0. So the series converges to the right value only at x = 0. At all other points it predicts the wrong value for f.

The straightforward real-analytic approach to resolving this issue goes through the full formulation of Taylor's theorem with an explicit remainder term[2]:

f(x) = Σⁿf⁽ᵏ⁾(a)(x-a)ᵏ/k! + Rₙ(x),

where Rₙ is the remainder term. To clarify, this is a _truncated_ Taylor expansion containing terms k=0,...,n.

There are several explicit expressions for the remainder term, but one that's useful is

Rₙ(x) = f⁽ⁿ⁺¹⁾(ξ)(x-a)ⁿ⁺¹/(n+1)!,

where ξ is not (a priori) fully known but guaranteed to exist in [min(a,x), max(a,x)]. (I.e the closed interval between a and x.)

Let's consider f(x) = cos(x) as an easy example. All derivatives look like ±sin(x) or ±cos(x). This lets us conclude that |f⁽ⁿ⁺¹⁾(ξ)| ≤ 1 for all ξ∈(-∞, ∞). So |Rₙ(x)| ≤ |x-a|ⁿ⁺¹/(n+1)! for all n. Since factorial growth dominates exponential growth, it follows that |Rₙ(x)| → 0 as n → ∞ regardless of which value of a we choose. In other words, we've proved that f(x) - Σⁿf⁽ᵏ⁾(a)(x-a)ᵏ/k! = Rₙ(x) → 0 as n → ∞ for all choices of a. So this is a proof that the value of the Taylor series around any point is in fact cos(x).
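
If it helps, here's a quick numerical sanity check of that bound (cos_taylor is just my own throwaway helper): the error of the degree-n truncation of cos at x = 3 really does stay below |x|^(n+1)/(n+1)!, and both race to zero.

    import math

    def cos_taylor(x, n):
        # degree-n Maclaurin truncation of cos: only even powers survive
        return sum((-1) ** (k // 2) * x ** k / math.factorial(k)
                   for k in range(n + 1) if k % 2 == 0)

    x = 3.0
    for n in (2, 6, 10, 14, 18):
        err = abs(cos_taylor(x, n) - math.cos(x))
        bound = abs(x) ** (n + 1) / math.factorial(n + 1)
        print(n, err, bound, err <= bound)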

Similar proofs for sin(x), exp(x), etc are not much more difficult, and it's not hard to turn this into more general arguments for "good" cases. Trying to use the same machinery on the known counterexample exp(-1/x²) is obviously hopeless as we already know the Taylor series converges to the wrong value here, but it can be illustrative to try (it is an exercise in frustration).

A nicer, more intuitive setting for analysis of power series is complex analysis, which provides an easier and more general theory for when a function equals its Taylor series. This nicer setting is probably the reason the topic is mostly glossed over in introductory calculus/real analysis courses. However, it doesn't necessarily give detailed insight into real-analytic oddities like exp(-1/x²) [3].

[0]: For reference, the Taylor series of a function f around a is: Σf⁽ᵏ⁾(a)(x-a)ᵏ/k!. (I use lack of upper index to indicate an infinite series as opposed to a sum with finitely many terms.)

[1]: At x = a, the Taylor series expansion is f(a) = Σⁿf⁽ᵏ⁾(a)(a-a)ᵏ/k! = f(a) + f'(a)·0 + f"(a)·0² + ··· = f(a). All the terms containing (x-a) vanish.

[2]: https://en.wikipedia.org/wiki/Taylor%27s_theorem#Taylor's_th...

[3]: Something very funky goes on with this function as x → 0 in the complex plane, but this is "masked" in the real case. In the complex case, this function is said to have an essential singularity at x = 0.


One of the most useful things I learned in mathematics.

Gauss-Newton algorithm, GNSS (GPS) solution solving, financial calculus (Ito calculus), and more.


A coworker, once, had a cool idea to use Taylor Series to encode histograms for metrics collection. Basically a digest method akin to sketches or t-digests. We wound up using t-digests as Stripe (iirc) had a good OSS implementation at the time, but using Taylor Series has been lingering in my mind ever since.


Are you referring to generating functions?


Nitpick:

> Let's start with our Maclaurin series for cos(x):

> p(x) = 1 - x^2/2! + x^4/4! - x^6/6! + x^8/8! - ... = 1 + \sum_{n=1}^∞ ((-1)^n x^(2n)) / (2n)!

> Ignoring the constant term, we'll write out the ratio limit. [...]

No need to ignore the constant term: it fits the formula just fine! Evaluating ((-1)^n x^(2n)) / (2n)! at n=0 gives ((-1)^0 x^0) / 0! = (1 * 1) / 1 = 1, which is precisely the constant term.

Each of those three 1s is because the empty product, i.e. the product of an empty list of numbers, is 1. This is sensible because it means that (product of L1) * (product of L2) = product of (L1 append L2).


Filed under “shit I wish I had read 20 years ago”.

This is very much in the spirit of 3Blue1Brown. Very nice work.



