Simple REINFORCE and Rao-Blackwellization Question


I’ve been reading some of the SVI literature to get a better intuition of best practices in SVI model implementation, and have some (simple?) questions. Not sure if anyone knows the answer to these?

  1. The literature seems to suggest that whenever the joint variational distribution of latents has even one latent variable that is discrete (i.e., not reparameterizable), then we have to use the REINFORCE/high variance ELBO gradient and are no longer using the reparameterization trick on even the Gaussian latents in the joint? Or am I misreading that? The chain rule doesn’t let us at least reparameterize just the continuous latents?

  2. Rao-Blackwellization seems to take the general ELBO gradient formula and, for each element of the gradient, keep only the terms involving latents that depend on that element. A fully mean-field factorization of the joint would then seemingly take full advantage of this, since no latent would depend on any other latent (so you could remove the maximum number of terms). Is that correct?

  3. Related to #2. However, Pyro’s Trace_ELBO (and possibly TraceGraph_ELBO?) seems to require pyro.plate to use Rao-Blackwellization… but surely the latents can be completely mean-field factorized/independent without using plate, right? For example, two regression coefficients in the guide drawn from separate univariate Gaussians (or even possibly a multivariate Gaussian with diagonal covariance) are independent, and that isn’t expressed with a pyro.plate. So would TraceGraph_ELBO be able to take advantage of that independence structure in the latents without pyro.plate (or does it still need pyro.plate)? Or am I misunderstanding how Rao-Blackwellization works? (entirely possible, lol)

Thanks for any insight/help anyone can provide. I’m not the most adept at math and some of the literature I was reading was very math heavy, so not sure if I’ve been misreading/misunderstanding some things.

Regarding 1: yes, REINFORCE and reparameterization can be freely mixed, and Pyro does this wherever possible.
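To make the mixing concrete, here is a minimal standard-library sketch (not Pyro's implementation) with one discrete latent `b ~ Bernoulli(p)` and one Gaussian latent `z ~ Normal(mu, 1)`. The objective `f`, the function names, and the parameter values are all made up for illustration. The gradient with respect to `mu` uses the reparameterized (pathwise) estimator, while the gradient with respect to `p` uses the score-function (REINFORCE) estimator, and both come from the same samples.

```python
import random


def mixed_gradient_estimate(mu, p, num_samples=200_000, seed=0):
    """Monte Carlo gradient of E_q[f(b, z)] with respect to (mu, p).

    q(b) = Bernoulli(p)   -> discrete, so it gets a score-function term
    q(z) = Normal(mu, 1)  -> reparameterized as z = mu + eps
    f is a hypothetical toy objective, not an actual ELBO.
    """
    rng = random.Random(seed)
    grad_mu = 0.0
    grad_p = 0.0
    for _ in range(num_samples):
        b = 1.0 if rng.random() < p else 0.0
        eps = rng.gauss(0.0, 1.0)
        z = mu + eps  # reparameterization: the noise does not depend on mu
        f = (z - 3.0) ** 2 + 2.0 * b * z

        # Pathwise term for mu: df/dz * dz/dmu, with dz/dmu = 1.
        grad_mu += 2.0 * (z - 3.0) + 2.0 * b

        # Score-function term for p: f * d/dp log q(b).
        score = b / p - (1.0 - b) / (1.0 - p)
        grad_p += f * score
    return grad_mu / num_samples, grad_p / num_samples


def exact_gradient(mu, p):
    """Closed form, using E_q[f] = (mu - 3)^2 + 1 + 2 * p * mu."""
    return 2.0 * (mu - 3.0) + 2.0 * p, 2.0 * mu


if __name__ == "__main__":
    print("estimate:", mixed_gradient_estimate(1.0, 0.3))
    print("exact:   ", exact_gradient(1.0, 0.3))
```

The `mu` estimate should typically be much less noisy than the `p` estimate, which is the usual argument for reparameterizing every site that allows it and falling back to REINFORCE only for the sites that do not.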

I don’t really understand 2/3, but note that Rao-Blackwellization in the context of ELBO gradient estimation depends on the graphical structure of the guide AND that of the model.
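A toy illustration of that point, using only the standard library and assuming a hypothetical cost that splits as `f1(b1) + f2(b2)` for two independent Bernoulli latents. The split of the cost plays the role of the model's structure, and the independence of `b1` and `b2` plays the role of the guide's. The naive score-function estimator for `p1` multiplies the score of `b1` by the full cost, while the Rao-Blackwellized one drops `f2`, which does not depend on `b1`. All names and numbers here are assumptions for the sketch, not anything Pyro exposes.

```python
import random


def score_gradient_samples(p1, p2, num_samples=200_000, seed=0):
    """Per-sample score-function estimates of d/dp1 E[f1(b1) + f2(b2)]."""
    rng = random.Random(seed)
    naive, rao_blackwellized = [], []
    for _ in range(num_samples):
        b1 = 1.0 if rng.random() < p1 else 0.0
        b2 = 1.0 if rng.random() < p2 else 0.0
        f1 = 3.0 * b1          # cost term that depends on b1 only
        f2 = 10.0 * b2 + 5.0   # cost term that depends on b2 only
        score1 = b1 / p1 - (1.0 - b1) / (1.0 - p1)  # d/dp1 log q(b1)

        # Naive: full cost times the score. Unbiased but noisy, because
        # f2 * score1 has zero mean and only adds variance.
        naive.append((f1 + f2) * score1)

        # Rao-Blackwellized: keep only the cost terms downstream of b1.
        rao_blackwellized.append(f1 * score1)
    return naive, rao_blackwellized


def mean_and_variance(xs):
    """Sample mean and unbiased sample variance."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return m, v


if __name__ == "__main__":
    naive, rb = score_gradient_samples(0.4, 0.6)
    print("naive (mean, var):", mean_and_variance(naive))
    print("RB    (mean, var):", mean_and_variance(rb))
```

Both estimators target the same gradient (here the exact value is 3), but the second one has far lower variance. If the cost did not decompose this way, for example if a single likelihood term depended on both `b1` and `b2`, nothing could be dropped even though the guide is fully mean-field.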

Thanks for the clarification on #1. When I was looking through the source code, it did seem like the terms in the ELBO (and its surrogate for the gradient) were being constructed on a per-sample site basis that was applying reinforce to just the sample sites that needed it. However, I was getting confused from the literature (and a Pyro tutorial) because it always seemed to refer to a reinforce gradient vs. reparameterized gradient… and since the word ‘gradient’ usually implies the entire vector (i.e., for variational parameters of all distributions), it made me think that it was an all or nothing thing.

For #2 and #3, I think I got some more clarity after reading some of the tutorials more and doing some testing. I think the pyro.plate statement I read was referring to the model (essentially, that independencies in the latents don’t matter if observations in the model itself aren’t i.i.d., which seems true).

Also, it sounds like Trace_ELBO only implements partial Rao-Blackwellization, which relies on pyro.plate in the guide… whereas TraceGraph_ELBO can identify more latent independencies in the guide (and so remove more terms from the corresponding element of the ELBO gradient) without necessarily requiring guide plates. I may be misunderstanding the differences between the two, but the latter does seem to perform noticeably better on some model/guide test runs I did (so I’d think it has to be identifying more latent independencies than the former).