Part 1 of the SVI tutorial says that the distributions for a latent variable that is aligned between the model and the guide may be different. For example:
import torch
import pyro
import pyro.distributions as dist

def model():
    pyro.sample("z_1", dist.Beta(torch.tensor(10.0), torch.tensor(10.0)))  # distribution 1
def guide():
    pyro.sample("z_1", dist.Beta(torch.tensor(15.0), torch.tensor(15.0)))  # distribution 2
My understanding is that distribution 2, dist.Beta(torch.tensor(15.0), torch.tensor(15.0)), is the initial distribution for q(z), which will be iteratively tuned during the SVI optimization process. What is the use of distribution 1, dist.Beta(torch.tensor(10.0), torch.tensor(10.0))? Thanks.
SVI optimizes the ELBO objective. The first distribution corresponds to the p(x, z) term in the ELBO, while the second one corresponds to q(z); both are needed to construct the objective. Put another way, the model is what you are really interested in, but since doing exact inference to compute the posterior p(z|x) isn't feasible, you use a variational family q(z) to approximate this posterior and optimize the variational parameters of q so that it gets close to p(z|x) (i.e. you minimize the KL divergence KL(q(z) || p(z|x))).
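To make the role of each distribution concrete, here is a minimal sketch in plain Python. It is not how Pyro computes the ELBO internally, and the helper names beta_log_pdf and estimate_elbo are made up for illustration. It assumes there is no observed data, as in the snippet from the question, so p(x, z) reduces to p(z) and the ELBO E_q[log p(z) - log q(z)] equals -KL(q || p). Samples come from the guide, while the model only contributes its log-density.

```python
import math
import random


def beta_log_pdf(z, a, b):
    """Log-density of Beta(a, b) at z, for 0 < z < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1.0) * math.log(z) + (b - 1.0) * math.log(1.0 - z)


def estimate_elbo(model_ab, guide_ab, num_samples=10_000, seed=0):
    """Monte Carlo estimate of E_q[log p(z) - log q(z)].

    Assumes no observed data, so the model term is just the prior p(z).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        z = rng.betavariate(*guide_ab)       # draw z from the guide q
        total += beta_log_pdf(z, *model_ab)  # model term: log p(z)
        total -= beta_log_pdf(z, *guide_ab)  # guide term: log q(z)
    return total / num_samples


# Distribution 1 (model) vs. distribution 2 (guide) from the question.
elbo_mismatched = estimate_elbo(model_ab=(10.0, 10.0), guide_ab=(15.0, 15.0))
# Same model, but with a guide that already equals the model.
elbo_matched = estimate_elbo(model_ab=(10.0, 10.0), guide_ab=(10.0, 10.0))

print("ELBO, guide Beta(15, 15):", elbo_mismatched)
print("ELBO, guide Beta(10, 10):", elbo_matched)
```

The first estimate should come out slightly negative and the second is zero, because the two log-density terms cancel for every sample. In other words, distribution 1 defines the target, and the optimizer would push the guide's parameters toward it if they were learnable.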
I am not sure Pyro's tutorials are the best place to get started with variational inference, but you'll find plenty of other good tutorials (e.g. Eric Jang: A Beginner's Guide to Variational Methods: Mean-Field Approximation). I would suggest going through them to develop a good understanding of and intuition for VI, and then coming back to the introductory tutorials to see how it works in Pyro.