Vector Quantized Variational Autoencoders
IN PROGRESS
Discretization of the gradient? The big catch with a VQ-VAE is that it performs *just as well* as it's continuous counterparts.
Vector quantisation is essentially finding an optimal $\phi$ which best minimizes the squared loss between some vector $x$ you want to predict and $e_{i}$ template vectors. Our "codebook" is the set of all these template vectors $e_{i}$. However, this does not scale well if we desire accuracy as this requires large codebooks. This is where our variational autoencoder comes along. It's all about compressing the input to a lower-dimensional latent space. But, of course, we are compressing latent representations (not the input itself).
As output, the encoder gives us a continuous latent distribution $z_{e}(x)$ which we need to discretize. Usually, in vanilla VAEs, we would just sample from the encoder's outputs. These distributions are usually multivariate Gaussian with diagonal covariances. Our new approach, however, utilizes our $K$ embedding vectors which are learned during training, and stored within the codebook. We seek to find the embedding vector $k$ that is closest to $z_{e}(x)$ by minimizing the L2 loss, or:
\[k^{*}=\text{argmin}_{j}||z_{e}(x)-e_{j}||_{2}\]Suppose we have a latent embedding space $e\in R^{K\times D}$ where $K$ is the size of our latent space (also the number of our $e_{j}$ vectors!) and $D$ is the latent dimension of each embedding vector $e_{i}$. Then, we have that