Building a Gaussian mixture with Relaxed Bernoulli/Categorical

Aha, that’s a tricky question:

The section “C.2 WHAT YOU MIGHT RELAX AND WHY” (page 15) of the Concrete paper https://arxiv.org/pdf/1611.00712.pdf actually discusses the different choices of model/prior (relaxed or not). Their final choice is to use the relaxed Bernoulli/Categorical on both the model side and the guide side. (Meanwhile, they use a trick — working with the log-transformed relaxed variable — to get a numerically stable evaluation of the KL term in the ELBO.)
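For concreteness, here is a minimal Pyro sketch of that symmetric choice for the Gaussian mixture in the title: the assignment variable `z` is a `RelaxedOneHotCategorical` in both the model and the guide, so the ELBO only ever scores relaxed samples under relaxed densities. This is my own toy construction, not the paper's code — the component count `K`, the temperature `TEMP`, and all site/param names are placeholders, and the paper's log-space stability trick is omitted:

```python
import torch
import pyro
import pyro.distributions as dist
from pyro.infer import SVI, Trace_ELBO
from pyro.optim import Adam

K = 3                      # number of mixture components (assumed)
TEMP = torch.tensor(0.5)   # relaxation temperature (assumed hyperparameter)

def model(data):
    # point-estimated component means, kept simple for the sketch
    locs = pyro.param("locs", torch.randn(K))
    with pyro.plate("data", len(data)):
        # relaxed prior over assignments (uniform logits)
        z = pyro.sample("z", dist.RelaxedOneHotCategorical(
            TEMP, logits=torch.zeros(K)))
        # z is a soft one-hot vector, so the observation mean is a
        # convex combination of the component means
        pyro.sample("obs", dist.Normal(z @ locs, 1.0), obs=data)

def guide(data):
    # relaxed posterior over assignments, one logit vector per data point
    logits = pyro.param("assign_logits", torch.zeros(len(data), K))
    with pyro.plate("data", len(data)):
        pyro.sample("z", dist.RelaxedOneHotCategorical(TEMP, logits=logits))

# toy data from three well-separated clusters
data = torch.cat([torch.randn(50) - 4.0, torch.randn(50) + 4.0, torch.randn(50)])
svi = SVI(model, guide, Adam({"lr": 0.05}), loss=Trace_ELBO())
for step in range(1000):
    svi.step(data)
```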

In the Gumbel-softmax paper (https://arxiv.org/pdf/1611.01144.pdf), they instead use an un-relaxed prior and a relaxed posterior.
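In plain PyTorch, that asymmetric choice looks roughly like the following sketch (again my own toy construction, not the paper's code): the reconstruction term is driven by a reparameterized relaxed sample, while the KL term is computed analytically between un-relaxed Categorical distributions:

```python
import torch
from torch.distributions import Categorical, Normal, RelaxedOneHotCategorical
from torch.distributions.kl import kl_divergence

K = 3                       # number of categories (assumed)
temp = torch.tensor(0.5)    # relaxation temperature (assumed)
logits = torch.zeros(K, requires_grad=True)  # variational posterior logits

# relaxed, reparameterized sample of z, used downstream in the likelihood
z_soft = RelaxedOneHotCategorical(temp, logits=logits).rsample()

# KL term evaluated between UN-relaxed distributions: the Categorical
# posterior q(z) against a uniform Categorical prior p(z)
kl = kl_divergence(Categorical(logits=logits),
                   Categorical(logits=torch.zeros(K)))

# toy Gaussian likelihood whose mean is the soft mixture of component
# means (locs and x are made-up numbers, just to complete the ELBO)
locs = torch.tensor([-4.0, 0.0, 4.0])
x = torch.tensor(3.5)
recon = Normal(z_soft @ locs, 1.0).log_prob(x)

loss = -(recon - kl)  # negative ELBO
loss.backward()       # gradients reach `logits` through the relaxed sample
```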

This does not seem to be a big deal when training a VAE with a discrete latent space; the network will converge either way (and a VAE has no ground truth to recover anyway). However, when it comes to non-amortized SVI with Pyro, it seems the choice has to be made carefully, as the sketch below illustrates.
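Here is a toy sketch of the pitfall (all names assumed, and assuming Pyro's default sample validation is enabled): if the model keeps an un-relaxed `OneHotCategorical` prior while the guide samples from a `RelaxedOneHotCategorical`, `Trace_ELBO` has to score the guide's soft sample under the discrete prior, which should fail the support check:

```python
import torch
import pyro
import pyro.distributions as dist
from pyro.infer import SVI, Trace_ELBO
from pyro.optim import Adam

K = 3

def model():
    # un-relaxed (discrete) prior over z
    pyro.sample("z", dist.OneHotCategorical(logits=torch.zeros(K)))

def guide():
    # relaxed posterior over z
    logits = pyro.param("q_logits", torch.zeros(K))
    pyro.sample("z", dist.RelaxedOneHotCategorical(torch.tensor(0.5),
                                                   logits=logits))

svi = SVI(model, guide, Adam({"lr": 0.01}), loss=Trace_ELBO())
try:
    svi.step()
except ValueError as e:
    # the guide's soft sample lies on the simplex but is not one-hot,
    # so the model's OneHotCategorical rejects it during scoring
    print("support mismatch:", e)

# One Pyro-native way around this mismatch is
# dist.RelaxedOneHotCategoricalStraightThrough: its forward samples are
# hard one-hot (so a discrete model prior could score them) while
# gradients still flow through the underlying relaxation.
```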