Variational Autoencoders 2: Maths
Last time we saw the probability distribution of $X$ with a latent variable $z$ as follows:

$$P(X) = \int P(X \vert z) P(z) \, dz \tag{1}$$

and we said the key idea behind VAEs is to not sample $z$ from the whole distribution $P(z)$, but actually from a simpler distribution $Q(z \vert X)$. The reason is that most $z$ sampled from $P(z)$ will likely give $P(X \vert z)$ close to zero, and therefore make little contribution to the estimation of $P(X)$. Now if we sample $z \sim Q(z \vert X)$, those values of $z$ will be more likely to generate $X$ in the training set. Moreover, we hope that $Q(z \vert X)$ has fewer modes than $P(z)$, and is therefore easier to sample from. The intuition is that the locations of the modes of $Q(z \vert X)$ depend on $X$, and this flexibility compensates for the fact that $Q(z \vert X)$ is simpler than $P(z)$.
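To get a feel for how wasteful sampling from $P(z)$ can be, here is a small, made-up NumPy toy (the model, the numbers and the hand-picked $Q$ below are purely illustrative and not part of the derivation):

```python
import numpy as np

rng = np.random.default_rng(0)

# A made-up 1-D toy model, just for illustration:
#   prior    P(z)     = N(0, 1)
#   decoder  P(X | z) = N(X; 10 * z, 0.1^2)
# and a single observed data point X = 25, so only z near 2.5 matters.
sigma_x = 0.1
X = 25.0

def likelihood(z):
    """P(X | z) under the toy Gaussian decoder."""
    return np.exp(-0.5 * ((X - 10.0 * z) / sigma_x) ** 2) / (sigma_x * np.sqrt(2 * np.pi))

# Sampling z from the prior P(z): almost every sample gives P(X | z) ~ 0.
z_prior = rng.normal(0.0, 1.0, size=100_000)
print("useful samples from P(z):   ", np.mean(likelihood(z_prior) > 1e-6))  # roughly 0.2%

# Sampling z from an X-dependent Q(z | X) = N(2.5, 0.01^2), chosen by hand here:
# nearly every sample now gives a non-negligible P(X | z).
z_q = rng.normal(2.5, 0.01, size=100_000)
print("useful samples from Q(z|X): ", np.mean(likelihood(z_q) > 1e-6))      # roughly 100%
```

The point is only that an $X$-dependent $Q(z \vert X)$ concentrates its samples where $P(X \vert z)$ is non-negligible; how to obtain such a $Q$ in practice is what the rest of the maths is about.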
But how can $Q(z \vert X)$ help with modelling $P(X)$? If $z$ is sampled from $Q(z \vert X)$, then using $P(X \vert z)$ we will get $E_{z \sim Q}\left[P(X \vert z)\right]$. We will then need to show the relationship between this quantity and $P(X)$, which is the actual quantity we want to estimate. The relationship between $E_{z \sim Q}\left[P(X \vert z)\right]$ and $P(X)$ is the backbone of VAEs.
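To see why such a relationship is needed at all: the marginal likelihood is an expectation under the prior, so simply swapping the sampling distribution changes the quantity being estimated,

$$P(X) = \int P(X \vert z) P(z) \, dz = E_{z \sim P(z)}\left[P(X \vert z)\right] \;\neq\; E_{z \sim Q(z \vert X)}\left[P(X \vert z)\right] \text{ in general.}$$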
We start with the KL divergence of $Q(z \vert X)$ and $P(z \vert X)$:

$$\mathcal{D}\left[Q(z \vert X) \,\|\, P(z \vert X)\right] = E_{z \sim Q}\left[\log Q(z \vert X) - \log P(z \vert X)\right]$$

The unknown quantity in this equation is $P(z \vert X)$, but at least we can use Bayes rule for it:

$$\log P(z \vert X) = \log P(X \vert z) + \log P(z) - \log P(X)$$

Rearranging things a bit, and applying the definition of the KL divergence between $Q(z \vert X)$ and $P(z)$, we have:

$$\log P(X) - \mathcal{D}\left[Q(z \vert X) \,\|\, P(z \vert X)\right] = E_{z \sim Q}\left[\log P(X \vert z)\right] - \mathcal{D}\left[Q(z \vert X) \,\|\, P(z)\right] \tag{2}$$
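For completeness, here is the "rearranging" step spelled out; the only facts used are Bayes rule above and that $\log P(X)$ does not depend on $z$, so it moves out of the expectation:

$$\begin{aligned}
\mathcal{D}\left[Q(z \vert X) \,\|\, P(z \vert X)\right]
  &= E_{z \sim Q}\left[\log Q(z \vert X) - \log P(X \vert z) - \log P(z)\right] + \log P(X) \\
\log P(X) - \mathcal{D}\left[Q(z \vert X) \,\|\, P(z \vert X)\right]
  &= E_{z \sim Q}\left[\log P(X \vert z)\right] - E_{z \sim Q}\left[\log Q(z \vert X) - \log P(z)\right] \\
  &= E_{z \sim Q}\left[\log P(X \vert z)\right] - \mathcal{D}\left[Q(z \vert X) \,\|\, P(z)\right]
\end{aligned}$$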
If you forget everything, this formula is the thing you should remember. It is therefore important to understand what it means:
- The left-hand side is exactly what we want to optimize, $\log P(X)$, plus an error term $-\mathcal{D}\left[Q(z \vert X) \,\|\, P(z \vert X)\right]$. The smaller this error term is, the better we are at maximizing $\log P(X)$. In other words, the left-hand side is a lower bound of what we want to optimize (the KL divergence is never negative), hence the name variational (Bayesian).
- If $Q(z \vert X)$ happens to be a differentiable function, the right-hand side is something we can optimize with gradient descent (we will see how to do it later; a small numerical sketch of this right-hand side is given right after this list). Note that the right-hand side happens to take the form of an encoder and a decoder, where $Q(z \vert X)$ encodes $X$ into $z$, and then $P(X \vert z)$ decodes $z$ to reconstruct $X$, hence the name “Autoencoder”. However, VAEs don’t really belong to the family of Denoising and Sparse Autoencoders, although there are indeed some connections.
- Note that $P(z \vert X)$ on the left-hand side is intractable. However, by maximizing the left-hand side, we simultaneously minimize $\mathcal{D}\left[Q(z \vert X) \,\|\, P(z \vert X)\right]$, and therefore pull $Q(z \vert X)$ closer to $P(z \vert X)$. If we use a flexible model for $Q(z \vert X)$, then we can use $Q(z \vert X)$ as an approximation for $P(z \vert X)$. This is a nice side effect of the whole framework.
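To make the encoder/decoder reading of the right-hand side concrete, here is a minimal NumPy sketch of the right-hand side of (2). It assumes (an assumption of the sketch, not something derived above) a Gaussian $Q(z \vert X) = \mathcal{N}(\mu, \mathrm{diag}(\sigma^2))$, a standard normal prior $P(z)$, and a Gaussian decoder with fixed noise $\sigma_x$; the linear "encoder" and "decoder" are placeholders just to make it self-contained, not trained networks:

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_kl(mu, sigma):
    """Closed-form D[ N(mu, diag(sigma^2)) || N(0, I) ], summed over latent dimensions."""
    return 0.5 * np.sum(mu**2 + sigma**2 - 2.0 * np.log(sigma) - 1.0)

def elbo(X, encoder, decoder, sigma_x=0.1, n_samples=64):
    """Monte Carlo estimate of the right-hand side of (2):
    E_{z ~ Q}[log P(X | z)] - D[Q(z | X) || P(z)]."""
    mu, sigma = encoder(X)                          # Q(z | X) = N(mu, diag(sigma^2))
    z = mu + sigma * rng.standard_normal((n_samples, mu.size))
    X_hat = decoder(z)                              # mean of P(X | z), one row per z sample
    log_p_x_given_z = (-0.5 * np.sum(((X - X_hat) / sigma_x) ** 2, axis=1)
                       - X.size * np.log(sigma_x * np.sqrt(2.0 * np.pi)))
    return log_p_x_given_z.mean() - gaussian_kl(mu, sigma)

# Hypothetical linear "encoder" and "decoder", just to make the sketch runnable:
W = rng.standard_normal((2, 5))                     # 2-D latent, 5-D data
encoder = lambda X: (W @ X, np.full(2, 0.5))        # mu = W X, fixed sigma = 0.5
decoder = lambda z: z @ W                           # X_hat for each row of z

X = rng.standard_normal(5)
print("estimate of the right-hand side of (2):", elbo(X, encoder, decoder))
```

With real networks in place of the placeholders, this is the quantity one maximizes with gradient descent, which is exactly what we will set up next time.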
Actually the above maths existed way before VAEs. However, the trick was to use a feedforward network for $Q(z \vert X)$, which gave rise to VAEs several years ago.
Next time, we will see how to do that, and hopefully conclude this series. Then we can move on to something more interesting.
