When are the distributions of the latent variables in the model used?

SVI tutorial part 1 says that the distributions for a latent variable that is matched by name in the model and the guide may be different. For example,

import torch
import pyro
import pyro.distributions as dist

def model():
    pyro.sample("z_1", dist.Beta(torch.tensor(10.0), torch.tensor(10.0)))  # distribution 1

def guide():
    pyro.sample("z_1", dist.Beta(torch.tensor(15.0), torch.tensor(15.0)))  # distribution 2

My understanding is that distribution 2, dist.Beta(torch.tensor(15.0), torch.tensor(15.0)), is the initial distribution for q(z), which will be iteratively tuned during the SVI optimization process. What is distribution 1, dist.Beta(torch.tensor(10.0), torch.tensor(10.0)), used for? Thanks.

SVI optimizes the ELBO objective. The first distribution enters the p(x, z) term of the ELBO (here as the prior p(z)), while the second one corresponds to q(z); both are needed to construct the objective. Put another way, the model is what you are really interested in, but since doing exact inference to compute the posterior p(z|x) isn’t feasible, you use a variational family q(z) to approximate this posterior and optimize the variational parameters of q to bring it close to p(z|x), i.e. minimize the KL divergence KL(q(z) || p(z|x)).
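To make the roles of the two distributions concrete, here is a minimal plain-Python sketch of a Monte Carlo ELBO estimate, ELBO = E_q[log p(x, z) - log q(z)]. It is not Pyro's implementation, only an illustration of the idea. The observed coin flips in `data`, the Bernoulli likelihood, and the helper names `beta_log_pdf`, `bernoulli_log_pmf`, and `elbo_estimate` are assumptions added for illustration; the snippet in the question has no observed data.

```python
import math
import random


def beta_log_pdf(z, a, b):
    """Log density of Beta(a, b) evaluated at z in (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1.0) * math.log(z) + (b - 1.0) * math.log(1.0 - z)


def bernoulli_log_pmf(x, z):
    """Log probability of a single coin flip x (0 or 1) with success probability z."""
    return math.log(z) if x == 1 else math.log(1.0 - z)


def elbo_estimate(data, prior_params=(10.0, 10.0), guide_params=(15.0, 15.0),
                  num_samples=2000, seed=0):
    """Monte Carlo estimate of E_q[log p(x, z) - log q(z)]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        # Sample the latent variable from the guide q(z) (distribution 2).
        z = rng.betavariate(*guide_params)
        # Guard against log(0) in the unlikely case of a boundary sample.
        z = min(max(z, 1e-12), 1.0 - 1e-12)
        # Model term: log p(x, z) = log p(z) + log p(x | z).
        # The prior log p(z) is where distribution 1 is used.
        log_p = beta_log_pdf(z, *prior_params)
        log_p += sum(bernoulli_log_pmf(x, z) for x in data)
        # Guide term: log q(z), scored at the same sample.
        log_q = beta_log_pdf(z, *guide_params)
        total += log_p - log_q
    return total / num_samples


# Hypothetical observations: 6 heads and 4 tails.
data = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
print(elbo_estimate(data))                             # guide from the question
print(elbo_estimate(data, guide_params=(16.0, 14.0)))  # guide equal to the exact posterior
```

For this conjugate example the exact posterior is Beta(16, 14). With that guide, every sample of log p(x, z) - log q(z) equals log p(x), so the ELBO reaches its maximum; for any other guide the ELBO is lower by KL(q(z) || p(z|x)). Changing `prior_params` changes the target posterior, which is why the model's distribution matters even though samples are drawn from the guide.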

I am not sure Pyro’s tutorials are the best place to get started with Variational Inference, but you’ll find plenty of other good tutorials for that (e.g. Eric Jang: A Beginner's Guide to Variational Methods: Mean-Field Approximation). I would suggest going through them to develop a good understanding and intuition for VI, and then coming back to the introductory tutorials to understand how it works in Pyro.
