Hi,
I’ve been reading some of the SVI literature to get a better intuition for best practices in SVI model implementation, and I have some (simple?) questions that I’m hoping someone here can answer.
1. The literature seems to suggest that whenever the joint variational distribution has even one discrete (i.e., non-reparameterizable) latent variable, we have to use the high-variance REINFORCE (score-function) ELBO gradient for everything, and can no longer use the reparameterization trick even on the Gaussian latents in the joint. Or am I misreading that? Doesn’t the chain rule let us at least reparameterize just the continuous latents?
2. Rao-Blackwellization seems to essentially take the general ELBO gradient formula and go through each element of the gradient in turn, keeping only the subset of latents that depend on that element. If so, a fully mean-field factorization of the joint would seemingly take full advantage of this, since no latent would depend on any other latent (so you could remove the maximum number of terms). Is that correct?
3. Related to #2: Pyro’s Trace_ELBO (and possibly TraceGraph_ELBO?) seems to require pyro.plate in order to use Rao-Blackwellization, but surely the latents can be completely mean-field factorized/independent without using plate, right? For example, two regression coefficients sampled in the guide from separate univariate Gaussians (or possibly even from a multivariate Gaussian with diagonal covariance) are independent, and that isn’t expressed with a pyro.plate. So would TraceGraph_ELBO be able to take advantage of that independence structure in the latents without pyro.plate, or does it still need pyro.plate? Or am I misunderstanding how Rao-Blackwellization works? (entirely possible, lol)
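To make the first two questions concrete, here is a minimal toy sketch of the kind of estimator they describe. It is not Pyro's implementation and does not show what Trace_ELBO or TraceGraph_ELBO do internally. Everything in it is an assumption made for illustration: the two cost terms `cost_z` and `cost_x`, the mean-field guide q(z) q(x) with one Bernoulli and one unit-variance Gaussian latent, and the function names. The continuous latent gets a reparameterized (pathwise) gradient, the discrete latent gets a score-function gradient, and the "Rao-Blackwellized" variant simply drops the cost term that does not involve the discrete latent.

```python
import math
import random
import statistics


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def cost_z(z):
    # Hypothetical cost term that depends only on the discrete latent.
    return 3.0 * (z - 1.0) ** 2


def cost_x(x):
    # Hypothetical cost term that depends only on the continuous latent.
    return (x - 1.0) ** 2


def gradient_samples(mu, theta, n, seed=0):
    """Draw n single-sample gradient estimates of E_q[cost_z(z) + cost_x(x)].

    Assumed guide: q(z) = Bernoulli(sigmoid(theta)), q(x) = Normal(mu, 1),
    with z and x independent (mean-field).
    """
    rng = random.Random(seed)
    p = sigmoid(theta)
    g_mu, g_theta_naive, g_theta_rb = [], [], []
    for _ in range(n):
        z = 1.0 if rng.random() < p else 0.0
        x = mu + rng.gauss(0.0, 1.0)  # reparameterized draw: x = mu + eps
        score = z - p                 # d/dtheta of log Bernoulli(z; sigmoid(theta))
        # Pathwise gradient for mu: d cost_x/dx * dx/dmu.
        g_mu.append(2.0 * (x - 1.0))
        # Score-function gradient for theta using the full cost.
        g_theta_naive.append((cost_z(z) + cost_x(x)) * score)
        # Same estimator with the z-independent term dropped.
        g_theta_rb.append(cost_z(z) * score)
    return g_mu, g_theta_naive, g_theta_rb


def analytic_gradients(mu, theta):
    # E[cost_x] = (mu - 1)^2 + 1 and E[cost_z] = 3 * (1 - p) for this toy setup.
    p = sigmoid(theta)
    return 2.0 * (mu - 1.0), -3.0 * p * (1.0 - p)


g_mu, g_naive, g_rb = gradient_samples(mu=0.5, theta=0.3, n=50_000)
print("mean grads:", statistics.fmean(g_mu), statistics.fmean(g_naive), statistics.fmean(g_rb))
print("theta-grad variance (full cost vs. reduced cost):",
      statistics.variance(g_naive), statistics.variance(g_rb))
print("analytic:", analytic_gradients(0.5, 0.3))
```

In this toy setting both theta estimators target the same analytic gradient, because the dropped term is multiplied by a score with zero mean under an independent q(z). Whether Pyro applies the same reduction without pyro.plate is exactly what question 3 asks.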
Thanks for any insight/help anyone can provide. I’m not the most adept at math and some of the literature I was reading was very math heavy, so I’m not sure whether I’ve been misreading/misunderstanding some things.