KL divergence in approximate inference


I am new to Pyro and was checking the documentation for examples. I was going over approximate inference techniques, more precisely stochastic variational inference using the reparameterization trick. I have three questions:

  • How do we specify the prior distribution of the latent variable in the graphical model?

  • How does Pyro know whether the KL divergence term that appears in the ELBO can be computed in closed form? From the documentation on the ELBO variants, I cannot tell how Pyro computes the ELBO, e.g. stochastic optimization of the joint distribution plus the entropy of the variational distribution, versus stochastic optimization of the conditional likelihood plus the KL divergence between the variational and prior distributions. Is there any information on how one can specify these things?

  • If we have a hierarchical latent variable model, say z_2 -> z_1 -> x, how do we specify this hierarchy and the topological order? What if the joint model is p(x|z_1,z_2)p(z_1|z_2)p(z_2)? I understand that the network parametrizing p(x|z_1,z_2) takes z_1, z_2 as inputs, but how does Pyro track these dependencies? Do I need to provide any further information, or does Pyro handle everything for me?

Thank you.