I enjoyed reading your article, but I want to note that the implementation is incorrect. The biggest error is that you are not propagating uncertainty forward in time, and therefore are underestimating the error. This shows up in your plots---the error bounds should cover the true state most of the time, but they do not in your results.
A couple more nits to pick:
"In essence, each of the 1000 passengers are doing thus: take the last known position (at the time before the present), add the velocity, and also, knowing that the wind and the water waves are going to slightly alter the course, add some random estimated fluctuations to it." The passengers don't add noise to their estimate. For the passengers, the expected value of the state at time k+1 given the state at time k will just be `position_k+1 = position_k + velocity * Delta_t`. The /true/ dynamics include noise, which is accounted for in the filter by adding to the estimated covariance. Your code doesn't break because you're taking a bunch of samples of the dynamics (by having 1000 passengers) and numerically taking the variance of the results. This is very different from what is usually done in practice.
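The difference is easy to check numerically. Here is a hedged sketch (all numbers made up) comparing the analytic predict step with the "sample the noisy dynamics many times" approach the article effectively uses:

```python
import random

# All numbers are made up for illustration.
velocity, dt = 2.0, 1.0
sigma_w = 0.5           # std-dev of the process noise w
x_hat, P = 10.0, 0.3    # current estimate and its variance

# Filter predict step: the noise is NOT added to the estimate;
# it only inflates the variance.
x_pred = x_hat + velocity * dt
P_pred = P + sigma_w ** 2

# "1000 passengers" approach: sample the noisy dynamics many times,
# then take the numerical mean and variance of the samples.
random.seed(0)
samples = [
    random.gauss(x_hat, P ** 0.5) + velocity * dt + random.gauss(0.0, sigma_w)
    for _ in range(100_000)
]
mean_mc = sum(samples) / len(samples)
var_mc = sum((s - mean_mc) ** 2 for s in samples) / len(samples)
# mean_mc approaches x_pred and var_mc approaches P_pred: sampling
# recovers the analytic answer, just much more expensively.
```

The Monte Carlo route works, but the filter gets the same mean and variance in two additions.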
It's a common misconception that GPS is affected by weather. It is not.
You are using a very non-standard definition of consistency. In estimation theory, having an estimator be consistent means that as you get more and more data, your estimator will converge to the true value.
Thank you for writing up the article and simplifying for a general audience, but there are a few misunderstandings here that seem to be causing problems. I am a graduate student in estimation theory and I would be happy to talk more about this if it would be helpful.
Kalman Filters track both an estimate x_hat and the variance of that estimate, P. There is also some true state, x_true. The definition of the variance of the estimate is E[(x_true - x_hat)^2], where E[.] is the expected value operator.
Using the example from the post, there is a known dynamics function that propagates the state one time step forward. That function is f(x, w) = x + velocity * Delta_t + w, where w is zero-mean Gaussian noise (meaning that E[w] = 0) with some known variance sigma_w^2 (meaning that E[w^2] = sigma_w^2). To update the true state, we do x_true_k+1 = f(x_true_k, w) = x_true_k + velocity * Delta_t + w.
So what should the filter's estimate of the state at time t_k+1 be? Well, if our estimator has been working up to this point, then x_hat_k = E[x_true_k] (this is just saying that our estimate is the "best guess" of the true state). We also want it to be true that x_hat_k+1 = E[x_true_k+1]. Plugging in the true dynamics from earlier, we get that x_hat_k+1 = E[x_true_k + velocity * Delta_t + w] = E[x_true_k] + E[velocity * Delta_t] + E[w] = x_hat_k + velocity * Delta_t + 0. Note that we are not adding noise to the estimate. The filter has no way of knowing the particular noise that enters the system; it will just be correct /on average/ over the noise that comes up.
It is also important to update the uncertainty in the estimate. State estimates are of little use without knowledge of the uncertainty. If the estimate is being fed into a self-driving car or something, "The car is here!" is pretty useless on its own. "The car is here, plus or minus 10 cm" means the car can drive as normal, but "The car is here, I think, or maybe in one of the surrounding states" means there's a problem.
We already know that the definition of the variance is P_k=E[(x_true_k - x_hat_k)^2]. Let's say that at time t_k, P_k = sigma^2. To get the variance at time t_k+1, we can just apply the definition. P_k+1 = E[(x_true_k+1 - x_hat_k+1)^2] = E[((x_true_k + velocity * Delta_t + w) - (x_hat_k + velocity * Delta_t))^2] = <skipping some steps> = E[(x_true_k - x_hat_k)^2] + E[w^2] = P_k + sigma_w^2 = sigma^2 + sigma_w^2. So the variance at the new time is equal to the old covariance sigma^2, plus some uncertainty accounting for the noise added in the dynamics sigma_w^2.
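In code, the two update rules derived above are just a couple of lines (the numbers here are arbitrary):

```python
def predict(x_hat, P, velocity, dt, sigma_w2):
    """One predict step of the scalar filter derived above: the mean moves
    by velocity * Delta_t with no noise added; the variance grows by
    sigma_w^2."""
    return x_hat + velocity * dt, P + sigma_w2

# Arbitrary illustrative numbers:
x_hat, P = 0.0, 1.0
for _ in range(5):
    x_hat, P = predict(x_hat, P, velocity=2.0, dt=1.0, sigma_w2=0.25)
# x_hat == 10.0 and P == 2.25: without measurement updates, the
# uncertainty grows by sigma_w^2 every step.
```

This growth in P is exactly what the original post is missing; measurement updates are what shrink it back down.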
Sorry about the poor math formatting, but I hope that answers your question!
Wow... you left me wondering why signal processing teachers in college don't teach Kalman Filters with this simplicity. I know mathematical concepts are best taught mathematically, but that does lead to information loss for those who don't have the background requisites.
I used to teach Discrete Cosine Transform and Wavelet Transform through images alone and would always find this teaching method of "intuition before rigor" work better than the other way around.
> why signal processing teachers in college don't teach Kalman Filters with this simplicity
> those who don't have the background requisites
Any college student studying signal processing should have the background prerequisites.
That said, it is easy to forget fundamentals. I have a couple theories for why professors don't use the intuition-before-rigor approach.
1) The professors themselves do not have great intuition but rather deep expertise in manipulating numbers and equations. Unlikely theory, but possible.
2) Professors do not generally get rewarded for their teaching prowess. And breaking mathematical concepts down to an intuitive level requires quite a lot of work and time better spent writing grant proposals or bugging your doctoral students. Cynical theory but probably true in many cases.
3) Once you understand the math, it is so much easier than the intuitive approach that the lazy human brain will not allow you to go back to "the hard way". I like to think this is the primary driver of skipping an intuitive teaching approach.
I would say #3 applies much more broadly than mathematical education. It is the difference between expertise and pedagogy. That is to say, being an expert in a thing is a completely different skill than teaching others to become competent at the thing. Say you want to improve your golf game with better drives - should you learn from the guy at the range who hits the ball the farthest? Probably not. You should find the guy who has improved his drive the most. E.g., the guy who started out driving 100 yards and now consistently hits 300 is better to learn from than the guy who is hitting 350. (Credit Tim Ferriss for driving this concept into my head.)
Academic here. In my experience, many experts in any given area have massive amounts of intuition about their own area. But keep in mind that we do have to end up teaching outside the area. So it's safer to teach technicalities rather than a (most probably incorrect) intuition.
Some other issues:
- in many subjects, it is dangerous to work with the wrong intuition - wrong intuitions not only fail to give a clear picture, they actually give you a false one.
- even though math gets a bad rep, I think the only reason we are able to work with high-dimensional matrices and vectors is because we let go of geometric imagery at some point. Most people, including myself, have a hard time visualizing even 4 dimensions.
Sometimes I start with a geometric idea in 2D and see if it generalizes.
Other times I think of a 2D caricature that encapsulates some important operation but doesn’t necessarily relate to the actual idea. It's more like a heuristic visual than anything. A simple example might be if I’m thinking about projecting onto a linear subspace. The subspace is some convex blob in the plane in my mind’s eye; even though that’s not a linear subspace, it’s easier for me to conjure, and the important part, the projection, remains intact. If the linear part were also important then I’d probably make it a line, but I can’t think of a time I’ve done that.
Some concepts only made sense to me when, as you said, I stopped trying to visualize them. The most notable one to me was quaternions, because those are 4D, but even complex numbers only made sense when I stopped looking for physical intuition and realized they’re just a description of certain operations for points in the plane.
Now, I can suffer through a lot of abstract nonsense, but the worst classes I took were heavy on the abstract nonsense and light on answers to “why do we need all this machinery at all?” If I couldn’t imagine an application I cared about, I had a hard time figuring out what were the important concepts and how to fit them together.
When I was a couple years into my actual math education (about midway through vector calculus) I had a crazy moment when I was really struggling to understand some problem that required more manipulation than the 3-dimensional problems I'd been used to up until that point. My roommate was a postdoc and he told me the simplest, dumbest thing, which was, roughly: "the math is a separate thing from the picture. You have to actually think about the math."
That one sentence blew my mind. Until then I had relied almost entirely on the geometric representation of a concept or a problem, and never knew to reach for the actual concept.
I think about 75% of people check out of math completely super early; of the 25% that remain, 75% of those never quite "get" that math is a separate thing from graphs and charts. The rest are usually successful scientists, engineers, and similarly technical folk. (I won't venture my opinion on the difference between mere number manipulators vs. actual provers of mathematics, but I think it's probably a similar ratio.)
Same here, but not with calculus and geometry. My problem was with statistics: I had always been able to solve high-school level problems by intuition alone, and never needed the theory. That is, until 3rd-year university, when the problems became too complex to grasp by intuition alone. I had a really hard time that year, because I basically had to re-learn 4 years of foundational theory.
So I'd say that teaching intuition first is a dangerous path: students may fail to fully understand the theory if they can get by on intuition. And intuition is not a solid foundation to build on.
When I study maths alone (even after a course), it's more like:
1/ Understand the technicalities of the maths (very tedious, because many teachers "leave the details out"), without bothering much about the subject.
2/ After that huge effort, I try to play with the math to get an intuition of what it does. An intuition == a way to explain what it does in layman's terms, with drawings, physical/real examples, etc.
3/ Revisit the maths, and especially their corner cases (what happens if this denominator goes to zero, what if I cannot invert that matrix?) and confront that with my intuition. The maths are the safest way to make sure you get your intuition right.
>3) Once you understand the math, it is so much easier than the intuitive approach that the lazy human brain will not allow you to go back to "the hard way". I like to think this is the primary driver of skipping an intuitive teaching approach.
I found that developing an intuitive understanding was required for me to remember something. I could then remember the math. The math gives a precise understanding, but I found the intuitive knowledge much more important.
I majored in mathematics in college and was frequently irritated by the "intuition before rigor" approach, so I have a personal point of view on this.
First, intuition is an unreliable indicator. While solving a problem, or searching for something new, you are guided by intuition, but it isn't infallible. Sometimes you get the feeling that something will work, get very excited, and then you try to make it rigorous and it falls apart. There is a whole category of cranks who think they have created a perpetual motion machine or a theory of quantum gravity and don't know enough math to figure out that they haven't, so they live perpetually in that initial feeling of excitement. So, intuition is exciting, but you develop the instinct of not letting intuition get too far ahead of rigor without checking in. And vice-versa, if you go too far with calculations and no longer have a feeling for where it's going, you start to feel like you should stop. Lectures have to respect this yoking of the two together. If you just give me a bunch of calculations, my brain will revolt because the calculations aren't motivated. Likewise, if you give me a bunch of intuition without rigor, my brain will revolt because the intuition isn't grounded.
Second, intuition develops through practice. If you want me to have an intuitive ability to use a mathematical idea, don't hide it from me! Like, if you want to teach a kid to play baseball, there's a lot of talking involved, but maybe hand them a ball first and let them hold it in their hands while you talk, feel the size and texture of it. Even better, just let them try to throw it, and work up to playing catch. The motivating, big-picture idea of baseball includes things like the thrill of winning, camaraderie, suspense, and post-game snow cones, but their eyes are going to glaze over if you start like that. You get a little kid hooked by putting a ball in their hands and letting them throw it. Some mathematical ideas are fun to build from the ground up, and you can discover the snow cones and the cheer of the crowd later.
The third factor is irrational, but deeply felt for some people. Even though we know and experience every day that rigor and intuition exist equally and symbiotically, each rather pointless without the other, we are expected to give pride of place to intuition, as a way of catering to people who didn't enjoy math in school or are insecure about their abilities. "The really important thing is the idea, the intuition... calculation closes the mind and dulls imagination... the real geniuses have creative breakthroughs, and we are never short of less imaginative people to clean up the unimportant details afterwards." That's how, as someone studying STEM, you learn to talk to people outside of STEM, and it's chronically irritating and demoralizing that you have to denigrate an important aspect of what you do. So you tend to be hypersensitive to that chauvinism creeping into a "safe space" like a class in a technical subject. Of course each STEM subject has its own approach to mathematical rigor (mathematicians and physicists notoriously don't see eye-to-eye) but we all appreciate the value of it.
Also I think your third point is important. If you only get the math, you can use it, and the intuition will come. If you only have the intuition, you're in a much harder position to get going and develop the math.
Totally agreed. One of my favorite professors in college would introduce a theorem, and instead of jumping into proving it, would first show us useful applications of the theorem, and then prove it afterwards.
I can _only_ learn top-down. Facts without context are much less interesting and much more difficult to internalize. The incentive to understand, inspired by said context, is a prerequisite to mustering the motivation needed to truly learn with sustained focus without losing interest. My ADHD brain will stop cooperating otherwise, and I’ll find myself daydreaming against my will!
This is exactly what I did as a grad student when I taught classes for my prof. She preferred bottom up and I preferred top down. I wanted to give people a reason for learning what they were getting ready to learn rather than "let's start from first principles" that almost all my professors wanted to do.
I prefer the exact same approach for learning to play board games too. Your 20 minutes of rules explanation will go completely over my head unless you start with "the goal/win condition of the game is to get more points/reach the end faster/collect all the macguffins".
Aw god, Yes, Yes, Yes! I understand applications, not abstractions, so show me its value and I'll understand it immediately, or give me a massive motivation to understand it if I don't. If they'd taken that approach at my schools and university I'd have been so much better off.
That paper contains an interesting reference to an exact generalization of the KF to a flat initial condition. I hadn’t seen that before, and it solves the numerical issues that come from the common practice of just picking a large initial covariance. In practice, I haven’t seen any notable numerical artifacts from doing that, but it’s quite an appealing fix!
Complementary to this, if you are looking for a thorough, mathematical introduction to Kalman filters and family, I highly recommend this book[1]
- Written by a software engineer who had to implement kalman filters for his job. How he motivates and conveys concepts may resonate with this audience.
- Written as interactive Jupyter notebooks. You can clone the repository and run through the lessons.
- Incremental improvements. Starting from a simple filter, incorporating Bayes' rule, and extending to probability distributions. It provides a gentle on-ramp to Kalman filters.
There is one aspect missing. When taking the weighted average of a prediction and measurement, the weights can be time varying in a Kalman filter. Otherwise I believe it goes by a different name.
A good example is a single sensor measuring a slowly changing quantity - a fuel gauge, for example. A good estimate from second to second is that there is no change, but a measurement may have noise (fuel sloshing around in the tank). A Kalman filter in this case will look like a first-order low-pass filter with an exponentially decaying gain. The cutoff frequency changes so you can find the starting level quickly (in seconds) but then ignore the noise with a very low frequency cutoff (0.01 Hz, say).
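That decaying-gain behavior is easy to see in a scalar sketch. The function name and all numbers below are made up for illustration:

```python
def scalar_kf_constant(readings, P0, R):
    """Scalar Kalman filter for a constant level with no process noise.
    P0 is the initial variance of the level, R the measurement variance;
    both are illustrative numbers, not from any particular system."""
    x, P = readings[0], P0
    gains = []
    for z in readings[1:]:
        K = P / (P + R)        # time-varying gain
        x = x + K * (z - x)    # blend the "no change" prediction with the measurement
        P = (1.0 - K) * P      # shrink the uncertainty
        gains.append(K)
    return x, gains

# Noisy readings around a true level of 50 (sloshing):
level, gains = scalar_kf_constant([52.0, 49.0, 51.0, 50.5, 49.5, 50.2],
                                  P0=100.0, R=4.0)
# The gain sequence decays monotonically: a large gain at first (find the
# level fast), then ever-smaller gains (a lower and lower effective cutoff).
```

With a large P0 relative to R, the first gain is near 1, so the filter locks on in one or two samples, then settles into heavy smoothing.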
My understanding is that linear Kalman filters are an optimal solution to linear problems. Linear Kalman filters are relatively easy to understand and implement.
Most applications I've found are non-linear. Extended (and unscented) Kalman filters are much tougher to grasp and implement. Resources and libraries are fewer and less useful.
For example, I have an AHRS/GNSS CAN device intended for small UAVs. Extended Kalman filters for these I've come across (like for PX4 and Ardupilot) are very complex, and have many parameters. I've found it more straightforward to take an ab-initio approach using quaternion fundamentals, and nudging the gyro solution towards the accelerometer "up", and the magnetometer inclination vector. If the accelerometer magnitude differs much from 1G or the mag vector differs much from the local earth mag field strength, downweight or skip the update from that sensor, letting the gyro coast.
Likely, an EKF would be the correct play, but I've given up trying to understand one, or make or adapt one that would work or be easier to tune and diagnose than the ab-initio approach.
For what it’s worth, the EKF isn’t much more complicated than the regular KF. You are basically just linearizing your dynamics / sensor model then doing the KF updates with that linear model.
However, for quadrotors specifically, there is a huge pain point: the rotations. A nuance of the linear models of the Kalman filter is that everything lives in Euclidean space. However, rotations place you on a manifold — for quaternions specifically this manifold is the set of unit quaternions. If you naively apply an EKF to estimate your quaternion, you will no longer have a unit quaternion and then your estimates will be garbage. There are well-known ways to handle this manifold constraint but they result in some of the ugliest equations I have ever turned into code.
As a simple example of the difficulty, consider a state (x, y) that, due to physics or whatever, we know will always be on the unit circle, i.e., if f(x, y) is your dynamics, it outputs a new point on the circle. However, if you linearize f(x, y), these approximate dynamics are not guaranteed to keep you on the unit circle, and you end up with a non-physical state (or state estimate, for the extended Kalman filter).
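This drift is easy to demonstrate numerically. The sketch below uses the circular flow p' = (-y, x), whose exact solutions rotate around the unit circle, and shows a naive Euler/linearized step creeping off it:

```python
import math

def euler_step(p, dt):
    """Linearized/Euler step of the circular flow p' = (-y, x), whose
    exact solutions stay on the unit circle."""
    x, y = p
    return (x - dt * y, y + dt * x)

p = (1.0, 0.0)
for _ in range(100):
    p = euler_step(p, 0.01)
norm = math.hypot(*p)
# norm > 1: every step multiplies the norm by sqrt(1 + dt^2), so the
# state creeps off the manifold. Renormalizing is the naive fix;
# error-state formulations are the principled one.
```

The quaternion case is exactly this, one dimension up: the estimate leaves the set of unit quaternions and stops representing a rotation.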
You just threw me back to the time I spent on EKF implementations on early iPads over a decade ago (as an exchange student), as part of an M.Sc. in Control Theory... before I had learnt about KF. The goal was to compensate for the drift observed in the position of those devices, which made them useless for the need (art exhibition stuff).
I don't think I even fully completed the project, because my understanding of both quaternions and KF was at that point pretty shaky. This could be a fun project to pick back up with some old hardware now.
A probabilistic programming language can make it much easier to implement a non-linear filter. Essentially, you implement the generative model, and you get inference compiled for you. Interesting options to look at include ForneyLab.jl, Infer.net, Gen.jl, and Pyro.
This can work if the time scale of inference is smaller than the time scale of your measurements. I’m skeptical that this would work on a quadrotor that requires a fast control loop. However, model predictive control (determining actions given a state and dynamic model; the Kalman filter is model predictive estimation) first found major use in chemical plants because they could crunch the numbers on a big computer that didn’t need to fit on a robot and the real time between each time step was large. For such a situation, you might be able to get MCMC to work.
The options I suggested above are not necessarily MCMC, they are mostly message passing algorithms. ForneyLab and Gen in particular are designed for online inference.
Good point! Still, a big reason why the KF is a staple is that it's really, really fast. When I've used tools like the ones that you mentioned, it was normally for inference outside a feedback loop. I'm less familiar with message passing methods than Monte Carlo methods, but I'm going to look into them now
Do you have a good reference? Preferably one that mentions a control application because I’m curious to see what the model assumptions and speed look like in that context
You can start with a review [1], and then BRML [2]. I think applications that are very focused on control tend to come from the military and thus not so well documented.
I'm nearsighted, and I have astigmatism. I've noticed that when I close one eye and look at something like a clock on the wall, each eye produces a differently distorted image. Each eye is a bit different. But when I look at the clock with both eyes, the image is much sharper, and better than either eye alone. The article's scenario of 1,000 ship passengers reporting their individual GPS coordinates reminded me of this phenomenon.
I bet that brains use clever algorithms like Kalman filters extensively.
If you're interested in the term, what you noticed with your eyesight is an example of hyperacuity. Coincidentally, I learned about it making the same observation with my own eyes. :)
Do I understand it correctly that Kalman filters are supposed to give a better estimate of a value from noisy observations than just averaging the observations?
Say I measured something 3 times and observed 7, 8, and 9. My guess would be that the actual value is 8. Would a Kalman filter come up with a different estimate?
> Do I understand it correctly that Kalman filters are supposed to give a better estimate of a value from noisy observations than just averaging the observations?
Yes, but with a big caveat. Your observations are related by something like the transition mechanism. You can't just apply a Kalman filter to every problem and expect to get better results.
A Kalman filter is traditionally used in estimating something that moves over time (but can be used for more than just this). Think a person moving in a video, or some other sort of random walk. By assuming some sort of relationship between two successive timepoints/measurements, like speed and current heading, we can blend the information from a model for motion with the model from noisy measurements to get a better estimate of the position/value or of the entire motion history.
If that motion model is meaningfully incorrect, then you don't get improved estimates.
A lot of the ways that people have extended them have to do with incorporating more sophisticated motion models, like dealing with wheel slippage in robotics, so your heading direction might have error now, etc.
Yes, because unlike a simple average, you have a model for both what you are trying to measure and the measurement.
For example, one basic model for your example could be: you are trying to measure a constant with some initial uncertainty, say Gaussian with some standard deviation, and noisy measurements, say also Gaussian with some standard deviation. You can tune the initial uncertainty (a number) around the constant you are trying to estimate, and the uncertainty in the measurements (another number that may or may not change).
In this example a Kalman filter won't behave as an average. If the measurements are good (low uncertainty) they will converge quickly, if they are bad the estimate will jump around and take longer to converge.
Anyway, I made a mess, I'm not good at explaining...
And by the way, it's not true what people are suggesting here that Kalman filters are for moving things. They are used to estimate constants _all the time_ but yes, they're more popular for moving things.
In addition to what others (and I) said elsewhere, where the Kalman filter beats the simple average is when your samples are not independent and identically distributed. If they were, then yes, you can just take an average, as justified by the law of large numbers.
The Kalman filter handles the case where your samples are correlated due to some (linear) dynamics. (Also, your measurements don't need to be directly of the quantity of interest; they can be a (linear) function of that quantity, plus Gaussian noise.) Thus, the probability of observing 8 for your second measurement changes given the knowledge that you observed a 7 as the first measurement. If you just take the sample average like you did above, that will not in general converge to the actual mean value.
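A tiny, deliberately noise-free toy makes the point. The state moves at a constant velocity of 1, so x_k = k, and the sensor is perfect; the gain value below is arbitrary:

```python
# Noise-free toy: x_k = k, perfect sensor.
measurements = [float(k) for k in range(11)]

avg = sum(measurements) / len(measurements)   # 5.0: stuck halfway up the ramp

x_hat = measurements[0]
for z in measurements[1:]:
    x_hat = x_hat + 1.0                 # predict with the transition model
    x_hat = x_hat + 0.5 * (z - x_hat)   # update (any gain in (0, 1] works here)
# x_hat == 10.0: using the dynamics, the estimate tracks the current state,
# while the plain average lags far behind it.
```

With noise added back in, the filter trades off the model and the measurements instead of trusting either one completely, but the averaging failure mode is the same.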
KF gives you the optimal estimate (in the minimum mean-squared error sense) for a (multivariate) Gaussian model.
What KF would give you in your example depends on what exactly the model is. But even in the simplest scenario imaginable you'd have to specify what expected value you start with, in the same way that you would in Bayes.
It would if you had some past history to work with. Let’s say that you measured the same object a while ago and you want to incorporate that prior information in a principled way, while also letting the new measurements dominate if they give a very different result.
If you encode that language into a linear Gaussian model, you get KF.
The way I understand it, in a linear system with discrete measurements where you move and measure your position, averaging the last 3 measurements leaves you effectively lagging 1 step behind your actual position.
What Kalman filter does is that it estimates your position and then averages measurement with that, in essence bringing the value closer to where you are at the moment.
Having a delay in a feedback loop may cause oscillations. If you react way slower than you measure, you might not need Kalman filter. Proposed GPS example is relevant here, because position updates come in slowly.
Kalman filters are used for changing entities. Say you have a moving object. You could measure (with some errors) its position every second and use that as an estimate of where it is. Or use your position measurements in combination with an estimation of the object's speed that you can get from previous measurements and estimate where the object is. The second method is the Kalman filter.
To clarify, you mean completely independent speed measurements, right? As in speeds not derived from the position measurements? Otherwise, that feels like getting something from nothing.
I meant speeds derived from previous position measurements. Using those speeds is not getting something from nothing. It means using previous measurements to estimate where the object might be now then combine that estimate with current measurement to have the "best" estimate for its current position, for some definition of best.
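That blending is what a constant-velocity Kalman filter does. The sketch below (illustrative numbers throughout, not from any real system) measures only positions yet produces a velocity estimate, because the model ties successive positions together:

```python
import numpy as np

# Constant-velocity Kalman filter; only positions are measured.
dt = 1.0
F = np.array([[1.0, dt], [0.0, 1.0]])   # x_{k+1} = x_k + v_k * dt, v constant
H = np.array([[1.0, 0.0]])              # we observe position only
Q = 1e-4 * np.eye(2)                    # small process noise
R = np.array([[0.25]])                  # measurement noise variance

x = np.array([[0.0], [0.0]])            # initial guess: at rest
P = 10.0 * np.eye(2)                    # large initial uncertainty

for z in [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]:   # object moving 2 units per step
    x, P = F @ x, F @ P @ F.T + Q                    # predict
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)     # gain
    x = x + K @ (np.array([[z]]) - H @ x)            # update
    P = (np.eye(2) - K @ H) @ P

velocity_estimate = float(x[1, 0])
# velocity_estimate ends up close to 2.0, though speed was never measured.
```

The velocity component of the state is exactly the "speed derived from previous position measurements" being discussed; the filter extracts it as a by-product of tracking position.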
I believe part of the cleverness of the Kalman Filter is that it works out the degree to which your measurements are correlated for you. I haven’t looked at it in a while, though.
Not your measurements. That correlation must be specified. It works out the correlations of your state (the thing you are estimating).
In the above example, their measurements are noisy mechanical states (position and momentum). However your measurements can be any (linear plus noise) function of the state, but you need the covariance of your sensor noise.
Here's another way to understand Kalman filters that doesn't require statistics, but does require some knowledge of feedback controllers. Consider a model of a system of the form
x'=Ax+Bu
y=Cx
Here, we have a linear system with a state variable `x`, system dynamics `A`, control `u`, control dynamics `B`, and observation `y`. This states that we have linear dynamics, but we can only observe some of the state variables. For example, perhaps we can observe position, but not velocity or acceleration. At the same time, we want those other variables because we need them for a feedback control or some kind of observation. In order to do this, we use machinery similar to a feedback controller. Define an observer system:
xx' = Axx + Bu + L(y-Cxx)
xx(0) = xx0
Here, we have a new observer variable `xx` that we want to converge to the true state variable `x`. To do this, we have introduced a new matrix of gains called `L`, which we call the observer gain for a Luenberger observer. The reason that this system is useful is that if we consider the error `e=x-xx`, and substitute this into the above equations, we get:
e' = x' - xx'
= ...
= (A-LC) e
From ODE theory, we know that `e` will converge to 0 if the real parts of the eigenvalues of `A-LC` are negative. Hence, to facilitate this, we focus on trying to find an appropriate `L` that forces this condition. In order to do this, we note that the eigenvalues of `A-LC` are the same as those of its transpose, `At - Ct Lt`. This leads to an optimization formulation:
min 0.5 <Qee,ee> + 0.5 <Ruu,uu> st ee' = At ee + Ct uu, ee(0) = ee0
The notation `<x,y>` means inner product and is simply `xt y`. Here, we're essentially looking at the adjoint equation with a new kind of control `uu` and a new adjoint state variable `ee`. This is a linear quadratic control on the adjoint of the error generated by the Luenberger observer. There is a mostly open choice for `Q` and `R` that we discuss below. Through a long sequence of derivations that is nearly identical to that of linear quadratic control, we find that we eventually solve for a matrix P in the system:
PAt + AP - PCt inv(R) C P = -Q
And then set the observer gain to `L = P Ct inv(R)`
Given this observer gain, we go back to our system above with `xx`, plug it in, and then solve. That system is constantly updated with observations of the original system in `y` and it produces an `xx` that converges very rapidly to the original state variable `x`. This gives us a way to view all of the state variables in the original equation even though we can't observe them all directly.
Now, to get to a full Kalman filter, there's a question of how to choose the matrices Q and R above. They could be almost anything. If you have an estimate of what you believe the covariances of the errors in the state and observations of the original equations to be, you get to a Kalman filter. Really, the derivation above is continuous, so it would be a Kalman-Bucy filter. Honestly, you don't need any statistics, though. Normally, just normalize the weights so that all of the state variables are around the same size, and then maybe adjust it based on what you think is more important. It generally works just fine.
In short, a Kalman filter is a little machine that uses the same machinery as a feedback controller that rapidly converges to the state of a system of interest. This is useful because it's unlikely that all of those states can be observed directly. Due to how feedback controllers work, it is robust to errors and can handle variation or error in the observations.
So my very simple mental model of a Kalman filter is: you feed the model some data (measurements that are noisy and unstable) and some noise model (usually Gaussian). The Kalman filter actually kind of reduces precision by adding this noise, but you gain stability and fewer fluctuations, and that's what makes it beautiful.
The Kalman filter does not require adding noise. The noise is a product of whatever phenomenon you are estimating. Normally we use noise in a model to cover up behavior that is far too complicated to include in a more rigorous way. A classic example is the origin of Brownian motion. This probabilistic model came about to describe the motion of a large particle in a fluid of smaller particles (e.g. a speck of dust in water). You could do the physics and model every particle in the fluid and its collisions, but that's not tractable. The insight was to model just the large particle and turn the effects of collisions with the smaller ones into random disturbances.
This is a great post. Another aha! moment I had in understanding the Kalman filter came from a short note by John D. Cook (link: https://www.johndcook.com/blog/applied-kalman-filtering/). A traditional calculus/differential-equations-based modelling of systems takes the view that there is no uncertainty in the data and everything is captured by the system model. OTOH, a data-driven estimator will say that "it is all in the data", completely ignoring the model of the physical process that is generating the data. The beauty of Kalman filters is that they combine these two approaches.
My first real programming job was implementing a Kalman filter for GPS car applications in the 90s. GPS still had crippled accuracy for civilians back then (about 100m), which made it useless for turn-by-turn navigation in your car. But assuming that you have an accurate ___location at the start of the journey, and that the car stays on roads (and that the road maps are accurate), then we could filter it down to a decent reading.
I wonder how Kalman filters handled the intentional noise in GPS (Selective Availability). The point was to keep other forces from having accurate positioning for combat purposes. But it seems like even a mediocre inertial navigation system, speedometer, or airspeed sensor combined with a Kalman filter and GPS would be very accurate.
Calculating position from velocity is very accurate, but suffers from drift. GPS does not. I wonder how accurate you can get with 1% variance velocity measurement and 100m variance GPS measurement on something doing 500km/h.
You can actually compute that! Model your agent's dynamics (even just pick a double integrator, i.e., an agent where the acceleration is controlled, or something) and pick a value for the GPS covariance. You can compute the infinite-time limit of the variance of the Kalman filter exactly (your favorite numerical package has a function for solving algebraic Riccati equations, which is all you need to do this). No simulated observations are necessary, just the dynamics and the GPS noise.
My guess is that you’ll find the filter converges to a distribution that has a decently large covariance if the military didn’t want other forces using it to make good decisions.
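A rough sketch of that computation (all numbers here are my own assumptions: a 1 Hz constant-velocity model, the "1%" velocity error and the "100m" GPS error both read as 1-sigma values):

```python
import numpy as np
from scipy.linalg import solve_discrete_are

dt = 1.0                        # 1 Hz updates (assumed)
v = 500 / 3.6                   # 500 km/h in m/s
sigma_v = 0.01 * v              # "1%" velocity error as a 1-sigma value
sigma_gps = 100.0               # GPS error as a 100 m 1-sigma value

A = np.array([[1.0, dt],
              [0.0, 1.0]])      # constant-velocity model
C = np.array([[1.0, 0.0]])      # GPS measures position only
Q = np.diag([1e-6, sigma_v**2]) # velocity noise drives the drift
                                # (tiny position term for numerical robustness)
R = np.array([[sigma_gps**2]])

# Duality trick: solve_discrete_are(A.T, C.T, Q, R) returns the
# steady-state *prior* covariance of the Kalman filter.
P_prior = solve_discrete_are(A.T, C.T, Q, R)

# One measurement update gives the steady-state posterior covariance.
S = C @ P_prior @ C.T + R
K = P_prior @ C.T @ np.linalg.inv(S)
P_post = (np.eye(2) - K @ C) @ P_prior

print("steady-state position sigma:", np.sqrt(P_post[0, 0]), "m")
```

The steady-state position uncertainty comes out well below the raw 100 m GPS figure, since the velocity information keeps suppressing the GPS noise between fixes.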
Thank you very much for the suggestion. There are enough words in it I don’t know that I’d need to spend a “new math ___domain to learn” token, and I’m out of those right now. Need to ship something to earn more. I will remain in a state of wonder for the time being.
This is great! Thanks for posting this. I'm going to have to definitely read through it more carefully later.
Also, this reminds of a tutorial that covered PID controllers. There were very good, animated JavaScript diagrams and plots that showed individually why each of the P, I, and D were needed in such a control system with the animations supporting the argument. I have tried finding it before, but I can't find it. It was posted on Hacker News. Does anyone know of one like this?
(1) Just to mention: while some norms of math writing are not always made fully clear, in writing with math expressions it is important at least to define the symbols, and also good to define the terms, motivate the discussion, give intuitive views, and give examples. In particular, (a) in physics it is standard to have some globally defined, understood, and accepted symbols, e.g., in F = ma: F for force, m for mass, and a for acceleration. (b) Essentially that is not done in math; instead, each paper, discussion, text chapter, etc. really should start even from, say,
R -- the set of real numbers
n -- a positive whole number
e -- the base of the natural logarithm
etc. The paper that is the subject of this thread had some math notation without such definitions.
(2) My standard, quite clear, go-to source on Kalman filters is the section in (with TeX syntax)
David G.\ Luenberger,
{\it Optimization by Vector Space Methods,\/}
John Wiley and Sons, Inc., New York, 1969.\ \
That's a really good example of some really good math and writing of math.
Don't know anything formal in this field but this is a really great explainer. It's worth mentioning that if devices are miscalibrated such that the distribution is not centered around ground truth, this approach breaks down. Maybe there are more sophisticated ways to solve this that work well with the approach described. Not sure, but curious.
That doesn't pose an issue: the "observation model" of a Kalman filter allows for affine transformations (e.g. scale, rotation, and offset) between the sensor data and the "true" latent state that you wish to estimate. Even better, it can appropriately handle correlational structure between your sensors.
What the Kalman filter equations don't tell you, is how to estimate the parameters of such an observation model. You either have to write it down from first principles, estimate it from "fully observed" data (if you have such a luxury), estimate it (up to a rotation) using expectation maximization (EM), or guess.
My experience with Kalman filters is that it is important to consider whether what you are measuring (the output) actually has one true value, rather than a range of values.
The position of something has one true value (unless you go into quantum mechanics), while many things do not, such as the hardness of a piece of wood, the color of a car, or the temperature in a lake. All of these may vary depending on where you measure.
You may be fooled into thinking that since the Kalman filter outputs a value and an uncertainty, it actually incorporates the uncertainty in the output parameter. But it is not the same.
An example: if you have a measurement between 0 and 1 that is 0.2, and then get another measurement that is 0.8, the filter will be more certain that the output is 0.5 than it was with only the single measurement of 0.2, instead of widening the uncertainty as intuition might tell you.
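This behavior can be checked with the scalar form of the Kalman measurement update, which fuses two Gaussian measurements by precision-weighting (the equal variances here are made up for illustration):

```python
# Scalar measurement fusion, the core of the Kalman update:
def fuse(m1, v1, m2, v2):
    """Combine two independent Gaussian measurements (mean, variance)."""
    v = 1.0 / (1.0 / v1 + 1.0 / v2)  # fused variance always shrinks
    m = v * (m1 / v1 + m2 / v2)      # precision-weighted mean
    return m, v

m, v = fuse(0.2, 0.1, 0.8, 0.1)
print(m, v)  # → 0.5 0.05: more certain about 0.5 than either reading alone
```

The fused variance is strictly smaller than either input variance, no matter how far apart the two measurements are, which is exactly the counterintuitive point above.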
More people should read this. A few weeks back some commenter on another thread was saying it makes perfect sense to break a sensor if you had more than one in order to get more certainty. Facepalm.
I've been wanting to make a deeper study of Kalman filters and related sensor fusion techniques for a while now, this looks like a decent resource but perhaps the hacker news horde has some even better suggestions for articles/textbooks/videos. If there is a particular piece of media that has really helped you grok this sort of thing please leave a link below!
For a robotics oriented, part theory part hands-on learning material on Kalman Filtering, I would suggest The Robot Mapping course from Cyrill Stachniss:
Unlike the OP article, it does make use of the math formalism for Kalman filters, but it is a relatively gentle introduction that does a very good job visualizing and explaining the intuition of each term. I have gotten positive feedback (no pun intended!) from interns or junior hires using this resource to familiarize themselves with the topic.
If you are making a deeper study and are ready to dive into a textbook that more thoroughly explores theory and application, there is a book by Gibbs[1] that I have used in the past and is well-regarded in some segments of industry that rely on these techniques for state estimation and GNC.
Nice typo but 100% agree, and if you dare tell anyone some freshman college math is necessary to understand some moderate-advanced subject, you're "gatekeeping" knowledge.
This is the thing with math and programming. The anti-math programmers are right: you don’t need much math to have a career as a programmer because lots of applications (e.g. web pages) don’t need much math.
However the more math you know, the more tools you have available to tackle problems and the more diverse the set of projects you can tackle are. You’ll spend less time reinventing the wheel and you don’t need to come up with shoddy heuristic solutions to problems because you can formally specify the problem and apply the right tool to solve it.
We're talking about a specific subset of people working in Computer Science.
No one in their right mind would claim that knowing 'only' HS math is insufficient for a successful career in Front-End, Back-End, Ops, Cloud, and many other positions pertaining to web technologies. I have no idea what people in this sector do, but I'd highly doubt advanced math knowledge would even benefit their career, assuming they remain there.
However, the fact that there are people on this very platform saying that calling Linear Algebra mastery a prerequisite is "gatekeeping" aspiring Machine Learning engineers demonstrates, at the very least, poor critical thinking about the inefficiency of self-learning math only through 'pleasant' means such as:
I will only watch this cool video with animations, lying on my bed, about this hard math concept, instead of grabbing some paper and a pencil and carefully studying some high-quality textbook. I need to build intuition ASAP!!
It also shows that various "AI Bootcamp: Become an ML expert in only two weeks!!" types of cash-grab scammers are sadly growing exponentially in popularity.
There seems to be a lot of confusion about what Kalman filters are for in this thread. Perhaps that’s what happens when you seek a non-mathematical introduction to a mathematical topic but nevertheless I’m going to try and clear this up.
Specifically, the Kalman filter is a recursive way to estimate the state of a dynamical system. That is, the thing you want to estimate varies as a function of time. It doesn't matter whether that thing is the position and momentum of a robot or a stock price. What does matter are the following:
1. The dynamics are linear with additive Gaussian noise. That is, the next state is a linear function of the current state plus a sample from a Gaussian distribution. Optionally, if your system is controlled (i.e., there is a variable at each moment in time you can set exactly or at least with very high accuracy), the dynamics can include a linear function of that term as well.
2. The sensor feeding you data at each time step is a linear function of the state plus a second Gaussian noise variable independent of the first.
3. You know the dynamics and sensor specification. That is, you know the matrices specifying the linear functions as well as the mean and covariances of the noise models. For a mechanical system, this knowledge could be acquired using some combination of physics, controlled experimentation in a lab, reading data sheets, and good old fashioned tuning. For other systems, you apply a similarly appropriate methodology or guess.
4. The initial distribution of your state when you start running the filter is Gaussian, and you know its mean and covariance. (If you don't know those, you can guess, because if the filter runs long enough they become irrelevant.)
The Kalman filter takes in the parameters of the model described in (3) and gives you a new linear dynamical system that incorporates a new measurement at each time step and outputs the distribution of the current state. Since we assumed everything is linear and Gaussian, this will be a Gaussian distribution.
From a Bayesian perspective, the state estimate is the posterior distribution given your sensor data, model, and initial condition. From a frequentist / decision theory perspective, you get the least squares estimate of the state subject to the constraints imposed by your dynamics.
If your dynamics and sensor are not linear, you either need to linearize them, which produces the “extended Kalman filter” that gives you a “local” estimate of the state or you need another method. A common choice is a particle filter.
If your model is garbage, then model based methods like the Kalman filter will give you garbage. If your sensor is garbage (someone mentioned the case where it outputs a fixed number), the Kalman filter will just propagate your uncertainty about the initial condition through the dynamics.
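Assumptions 1-4 translate into a very short predict/update loop. A minimal sketch (the constant-velocity model and all noise values here are my own illustrative choices):

```python
import numpy as np

def kf_step(x, P, z, A, C, Q, R):
    """One Kalman filter step: predict through the dynamics, then update."""
    # Predict: push mean and covariance through the linear dynamics (1).
    x = A @ x
    P = A @ P @ A.T + Q
    # Update: fold in the linear-Gaussian measurement z (2).
    S = C @ P @ C.T + R               # innovation covariance
    K = P @ C.T @ np.linalg.inv(S)    # Kalman gain
    x = x + K @ (z - C @ x)
    P = (np.eye(len(x)) - K @ C) @ P
    return x, P

# Known model parameters (3): constant-velocity dynamics, position sensor.
dt = 1.0
A = np.array([[1.0, dt], [0.0, 1.0]])
C = np.array([[1.0, 0.0]])
Q = 0.01 * np.eye(2)
R = np.array([[1.0]])

# Guessed Gaussian initial condition (4); it washes out over time.
x = np.array([0.0, 0.0])
P = 10.0 * np.eye(2)

rng = np.random.default_rng(0)
true_x = np.array([0.0, 1.0])         # true state: position 0, velocity 1
for _ in range(50):
    true_x = A @ true_x + rng.multivariate_normal([0.0, 0.0], Q)
    z = C @ true_x + rng.normal(0.0, 1.0, size=1)
    x, P = kf_step(x, P, z, A, C, Q, R)

print(x, np.diag(P))  # estimate near true_x, covariance far below its start
```

Note the covariance recursion never looks at the data: the measurements only move the mean, while `P` shrinks according to the model alone, which is why garbage model parameters give garbage uncertainty estimates.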
If I'm understanding the article, then an aspect of variance is how much a sensor deviates from the wisdom of the crowds. So the 6.8 is harmless if it happens to be close to the right answer, and downweighted if not.
Which, if correct, means that Kalman filters are useful to detect malicious input to crowdsourced data like surveys, reviews, etc.
The kalman filter depends on having an accurate estimate of the variance to work, if you incorrectly weight an input as being far more trustworthy than it actually is then it won't work very well.
If your only sensor is broken and tells you nothing, then your system is unobservable with such a sensor and the only thing you can do is propagate your initial uncertainty through the dynamics. The Kalman filter will do exactly that in this case.
not if you incorrectly estimate that the broken sensor has zero variance. In that case the kalman filter will discard all the other information and trust it completely. I think that's the point the poster is making (the article uses a somewhat naive estimate of variance)
Yeah, actually working out the variance of your input is often the tricky bit (in a lot of cases it's just estimated once and then hardcoded, but more sophisticated approaches to try to estimate it dynamically are possible)
I've always wanted to write a numerical Kalman filter code where you just plug in a function and then get its derivative. It would be slow, but it would be easier to use.
That's like asking if moving averages are useful. Of course they are, because they are components of basic signal processing. What you do with the data however is entirely on you. Most of these techniques are to improve the signal-to-noise ratio, but they won't help if you don't have an idea of what you are trying to look for in the first place.
More broadly, financial markets (in certain cases and depending on asset) can generally be modeled by a random walk, and that's something Kalman filters can help with as mentioned in another comment.