Kalman filters (and how they relate to HMMs)

Kalman filters are insanely popular in many engineering fields, especially those involve sensors and motion tracking. Consider how to design a radar system to track military aircrafts (or warships, submarines, … for that matter), how to track people or vehicles in a video stream, how to predict location of a vehicle carrying a GPS sensor… In all these cases, some (advanced) variation of Kalman filter is probably what you would need.

Learning and teaching Kalman filters is therefore quite challenging, not only because of the mere complexity of the algorithms, but also because there are many variations of them.

With a Computer Science background, I encountered Kalman filters several years ago, but never managed to put them into the global context of the field. I had chances to look at them again recently, and rediscovered yet another way to present and explain Kalman filters. It made a lot of sense to me, and hopefully it does to you too.

Note that there are a lot of details missing from this post (if you are building a radar system to track military aircrafts, look somewhere else!). I was just unhappy to see many introductory material on Kalman filters are either too engineering or too simplified. I want something more Machine Learning-friendly, so this is my attempt.

Let’s say you want to track an enemy’s aircraft. All you have is a lame radar (bought from Russia probably) which, when oriented properly, will give you a 3-tuple of range, angle and angular velocity Image may be NSFW.
Clik here to view. $[r \;\phi\;\dot{\phi}]^{T}$ of the aircraft. This vector is called the observation Image may be NSFW.
Clik here to view. $\mathbf{z}_k$ (subscript Image may be NSFW.
Clik here to view. because it depends on time). The actual position of the aircraft, though, is a vector in cartesian coordinates Image may be NSFW.
Clik here to view. $\mathbf{x}_k = [x_1\;x_2\;x_3]^{T}$ . Since it is an enemy’s aircraft, you can only observe Image may be NSFW.
Clik here to view. $\mathbf{z}_k$ , and you want to track the state vector Image may be NSFW.
Clik here to view. $\mathbf{x}_k$ over time, every time you receive a measurement Image may be NSFW.
Clik here to view. $\mathbf{z}_k$ from the radar.

Visualised as a Bayesian network, it looks like this:

Image may be NSFW.
Clik here to view. Untitled Diagram (1)

With all the Markov properties hold, i.e. Image may be NSFW.
Clik here to view. $\mathbf{x}_k$ only depends on Image may be NSFW.
Clik here to view. $\mathbf{x}_{k-1}$ and Image may be NSFW.
Clik here to view. $\mathbf{z}_k$ only depends on Image may be NSFW.
Clik here to view. $\mathbf{x}_k$ , does this look familiar?

Yes, indeed. This looks a lot like the beloved Hidden Markov Models, where Image may be NSFW.
Clik here to view. $p\left(\mathbf{x}_k\vert\mathbf{x}_{k-1}\right)$ is called the transition model, and Image may be NSFW.
Clik here to view. $p\left(\mathbf{z}_k\vert\mathbf{x}_k\right)$ is called the observation model.

However, in HMMs, the states Image may be NSFW.
Clik here to view. $\mathbf{x}_k$ are discrete variables whose values can only be taken from a fixed set of states. In our case, the state vector Image may be NSFW.
Clik here to view. $\mathbf{x}_k$ is continuous, therefore HMM algorithms cannot be used in this setting.

1. The tracking problem

Before we see how we can solve this problem, let’s first formally define it. Let Image may be NSFW.
Clik here to view. $\mathbf{Z}_{1:k} = \left\{\mathbf{z}_1, ..., \mathbf{z}_k\right\}$ be the set of observations up to timestep Image may be NSFW.
Clik here to view. t_k .

The whole point of tracking is to infer the value of the state at the current timestep Image may be NSFW.
Clik here to view. $\mathbf{x}_k$ , given the sequence of observation Image may be NSFW.
Clik here to view. $\mathbf{Z}_{1:k}$ . In other words, being a good Bayesian, that is to keep track the conditional probability distribution Image may be NSFW.
Clik here to view. $p\left(\mathbf{x}_k\vert\mathbf{Z}_{1:k}\right)$ .

Let’s assume somehow we know the distribution Image may be NSFW.
Clik here to view. $p\left(\mathbf{x}_{k-1}\vert\mathbf{Z}_{1:k-1}\right)$ . Now at timestep Image may be NSFW.
Clik here to view. t_k , we receive the observation Image may be NSFW.
Clik here to view. $\mathbf{z}_k$ , and want to infer the distribution Image may be NSFW.
Clik here to view. $p\left(\mathbf{x}_k\vert\mathbf{Z}_{1:k}\right)$ . How do we go about this?

Inspired by some similar idea in the field (e.g. the EM algorithm), we can do this in 2 step. The first step is to predict the value of Image may be NSFW.
Clik here to view. $\mathbf{x}_k$ given the previous history Image may be NSFW.
Clik here to view. $\mathbf{Z}_{1:k-1}$ . Using the law of total probability for the conditional distribution of continuous variables, we can write:

Image may be NSFW.
Clik here to view. $\begin{array}{rl} p\left(\mathbf{x}_k\vert \mathbf{Z}_{1:k-1}\right)& = \int p\left(\mathbf{x}_k\vert\mathbf{x}_{k-1}, \mathbf{Z}_{1:k-1}\right)p\left(\mathbf{x}_{k-1}\vert\mathbf{Z}_{1:k-1}\right)d\mathbf{x}_{k-1} \\ & = \int p\left(\mathbf{x}_k\vert\mathbf{x}_{k-1}\right)p\left(\mathbf{x}_{k-1}\vert\mathbf{Z}_{1:k-1}\right)d\mathbf{x}_{k-1}\end{array}$

The last equation was based on the Markov assumption that Image may be NSFW.
Clik here to view. $\mathbf{x}_k$ is independent of everything else given Image may be NSFW.
Clik here to view. $\mathbf{x}_{k-1}$ . Now we have all the distribution on the right (which are the transition model and the previous posterior at timestep Image may be NSFW.
Clik here to view. $t_{k-1}$ ), we can just integrate over all possible values of Image may be NSFW.
Clik here to view. $\mathbf{x}_{k-1}$ to get the distribution on the left!

Assume that is doable (although maybe not trivial), we now have the prior distribution Image may be NSFW.
Clik here to view. $p\left(\mathbf{x}_k\vert \mathbf{Z}_{1:k-1}\right)$ . With the new observation Image may be NSFW.
Clik here to view. $\mathbf{z}_k$ , using the Bayes rule, we can update our estimation as follow:

Image may be NSFW.
Clik here to view. $\begin{array}{rl} p\left(\mathbf{x}_k\vert \mathbf{Z}_{1:k}\right) & = \displaystyle \frac{ p\left(\mathbf{z}_k\vert\mathbf{x}_k\right)p\left(\mathbf{x}_k\vert \mathbf{Z}_{1:k-1}\right)}{p\left(\mathbf{z}_k\vert \mathbf{Z}_{1:k-1}\right)} \\ & = \displaystyle \frac{p\left(\mathbf{z}_k\vert\mathbf{x}_k\right)p\left(\mathbf{x}_k\vert \mathbf{Z}_{1:k-1}\right)}{\int p\left(\mathbf{z}_k\vert\mathbf{x'}_k\right)p\left(\mathbf{x'}_k\vert \mathbf{Z}_{1:k-1}\right)d\mathbf{x'}_k}\end{array}$

Here we use the observation model and the prior computed in the previous step to estimate the posterior Image may be NSFW.
Clik here to view. $p\left(\mathbf{x}_k\vert \mathbf{Z}_{1:k}\right)$ at time Image may be NSFW.
Clik here to view. t_k . With this, we are ready to process the next observation.

Unfortunately, both the predict and update equations are intractable in the general case. We therefore need to find some ways to approximate them.

2. Kalman filter

Enter Kalman filter. In Kalman filter, we make the following assumption:

The prior distribution is Gaussian: Image may be NSFW.
Clik here to view. $p\left(\mathbf{x}_{k-1}\vert \mathbf{Z}_{1:k-1}\right)=\mathcal{N}\left(\hat{\mathbf{x}}_{k-1},\mathbf{P}_{k-1}\right)$ , parameterized by the state mean vector Image may be NSFW.
Clik here to view. $\hat{\mathbf{x}}_{k-1}$ and covariance matrix Image may be NSFW.
Clik here to view. $\mathbf{P}_{k-1}$ .
Predict and update the states now mean we need to keep track the value of Image may be NSFW.
Clik here to view. $\hat{\mathbf{x}}_{k-1}$ and Image may be NSFW.
Clik here to view. $\mathbf{P}_{k-1}$ over time.
The transition model is linear: Image may be NSFW.
Clik here to view. $\mathbf{x}_k = f\left(\mathbf{x}_{k-1}\right) = \mathbf{F}_k\mathbf{x}_{k-1} + \mathbf{B}_k\mathbf{u}_k + \mathbf{q}_k$
where Image may be NSFW.
Clik here to view. $\mathbf{u}_k$ is the control vector (zero for many application), Image may be NSFW.
Clik here to view. $\mathbf{q}_k \sim \mathcal{N}\left(0, \mathbf{Q}_k\right)$ is the process noise vector drawn from a zero-mean Gaussian distribution with covariance Image may be NSFW.
Clik here to view. $\mathbf{Q}_k$ .
Practically, this means the transition distribution is Gaussian: Image may be NSFW.
Clik here to view. $p\left(\mathbf{x}_k\vert\mathbf{x}_{k-1}\right)=\mathcal{N}\left(\mathbf{F}_k\mathbf{x}_k,\mathbf{Q}_k\right)$
Note that although Image may be NSFW.
Clik here to view. $\mathbf{F}_k, \mathbf{B}_k, \mathbf{Q}_k$ are all subscripted t, but they don’t need to be kept track over the course of tracking. Often they only depend on the time difference between the latest timestep and the previous one.
The observation model is linear: Image may be NSFW.
Clik here to view. $\mathbf{z}_k = h\left(\mathbf{x}_k\right) = \mathbf{H}_k\mathbf{x}_k + \mathbf{r}_k$
where Image may be NSFW.
Clik here to view. $\mathbf{r}_k \sim \mathcal{N}\left(0, \mathbf{R}_k\right)$ is the observation noise.
Similarly, this means the observation distribution is Gaussian: Image may be NSFW.
Clik here to view. $p\left(\mathbf{z}_k\vert\mathbf{x}_k\right) = \mathcal{N}\left(\mathbf{H}_k\mathbf{x}_k,\mathbf{R}_k\right)$

Those assumption are based on the fact that a linear transformation of a Gaussian is also a Gaussian.

Now since everything is Gaussian, the equations become tractable and have closed-form solutions. The derivation of Kalman filter can be found everywhere on the Internet, and only involves a few matrix operations.

In fact this is why it is so popular. At the end of the day, if the state vector is low-dimensional, it can be implemented very efficiently on low-power devices, while giving good solutions once the assumptions above are satisfied.

3. Extended Kalman Filter and Unscented Kalman Filter

Unfortunately, life is not linear. For instance, the location of an aircraft is almost certainly not a linear function of its location in previous timestep (unless it is flying at a constant velocity, which is the best way to be shot down). For those applications, the assumption of Kalman filter doesn’t hold anymore.

There are a few way to deal with this scenario, where the transition function Image may be NSFW.
Clik here to view. $f\left(\mathbf{x}_{k-1}\right)$ and the observation function Image may be NSFW.
Clik here to view. $h\left(\mathbf{x}_k\right)$ are non-linear.

First, Extended Kalman Filter uses the Taylor Series expansion to approximate the function Image may be NSFW.
Clik here to view. $\mathbf{z}_t = h\left(\mathbf{x}_k\right) \approx h\left(\mathbf{x}_{k-1}\right) + \frac{\partial h}{\partial x}\left(\mathbf{x}_{k-1}\right)\left(\mathbf{x}_{k}-\mathbf{x}_{k-1}\right)$ , and this is a linear function again. We then need to compute the Jacobian matrix at every timestep, then apply the vanilla Kalman Filter as usual.

However, computing the Jacobian matrix is messy, and the Taylor Series expansion might not very well-behaved for complicated process. In those case, Unscented Kalman Filter (UKF) might be used.

UKF is based on the observation that we only need to track the mean vector Image may be NSFW.
Clik here to view. $\hat{\mathbf{x}_{k}}$ and the covariance matrix Image may be NSFW.
Clik here to view. $\mathbf{P}_k$ over time. So for every non-linear transformation, we can just take a sample of Image may be NSFW.
Clik here to view. 2n + 1 points (called the sigma points, where Image may be NSFW.
Clik here to view. is the dimensionality of the domain) and transform them by our function. We then can estimate the new mean and covariance based on the transformed sigma points.

It looks like this:

Image may be NSFW.
Clik here to view. Untitled Diagram (2)

The details on how to choose the sigma points and how to compute the transformed mean and variance can be found on the Internet. We are not going to repeat them here because they are boring, but the idea now gets closer to sampling methods.

We will have another post for Particle Filtering, with all the sampling methods. This is where things get serious.

Image may be NSFW.
Clik here to view.

Kalman filters (and how they relate to HMMs)

1. The tracking problem

2. Kalman filter

3. Extended Kalman Filter and Unscented Kalman Filter

Trending Articles

Scuffham Amps - S-GEAR 2.6.0 VST, AAX, STANDALONE x86 x64 (R2R NO iLok2, +NO...

Practice Sheet of Right form of verbs for HSC Students

VHSE First (1st) Allotment 2025 - vhscap.kerala.gov.in

UNIVERSE LEAGUE – UNIVERSE LEAGUE – WAR (We Are Ready) – EP [iTunes Plus M4A]

City Hunter Teledrama – Episode 18 – 07th May 2016

Comment on Proposed Criteria for Identifying Predatory Conferences by Luke...

Bureau of Internal Revenue: Regional Offices (Directory)

Kendrick Lamar – Not Like Us (2024) [24Bit-88.2kHz] [PMEDIA] ⭐️

Inception 2010 Hindi Dual Audio 650MB BRRip 720p ESubs HEVC

East Hull MD admits sexual assaults after another victim comes forward

Download: FK ft Shenky – Nakuyewa ”Prod by: Shenky”

R. v. Sargeant, 2023 ONSC 6406 (CanLII)

Rajasthan Board 10th Result 2016 Roll No wise & Name Wise

Who’s been sentenced at Northampton Magistrates’ Court

मतलबी दोस्त स्टेट्स | Matlabi Dost Status in Hindi – Selfish Friends Status

Family cries out as traditional ruler allegedly abducts brother, extorts N2.5m

Long-Running Conflict In Springfield (MA) Gangland Sphere Has Manzi Family &...

Wondershare Filmora X v10.1.20.16 x64

Man arrested after fracas in flat

Man charged in ongoing Sexual Assault Investigation Derek Nyilas, 46, Faces...