Gradient estimator questions

Hi, I’m new to all things Bayesian and I’ve been reading through the SVI tutorials. I had a few questions regarding gradient estimators:

  1. It says that removing non-downstream variables is Rao-Blackwellization. Doesn’t the Rao-Blackwell theorem state that an estimator conditioned on a sufficient statistic is at least as good as the unconditioned estimator? How does removing or integrating out variables correspond to this? I can’t connect these two concepts in my head.

  2. In the part about score function estimators, if q is non-reparameterizable, won’t taking grad(log(q(z))) be equally problematic, since you can’t differentiate q(z) with respect to its parameters? In the last equation in this section, you still differentiate f with respect to phi, but I thought the point was that f was non-differentiable.

Thanks for your help.

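  1. See the discussion and references in Section 3.1 of the Black Box Variational Inference paper cited in the tutorial. In brief, the term is used in a broader sense there: replacing an estimator with its conditional expectation given some of the variables keeps the mean and never increases the variance, and integrating variables out is exactly such a conditional expectation.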
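  2. We always assume that the density q is differentiable with respect to phi. Reparameterization is a separate matter: it is concerned with breaking a sampler for q into a stochastic part that does not depend on phi and a differentiable deterministic function of that part that does depend on phi. The score function estimator needs only grad(log(q(z))) with respect to phi, so it still applies when no such split exists, for example when z is discrete. Any explicit dependence of f on phi is likewise assumed differentiable; what is not required is differentiating f through the sample z.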
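
To make the distinction in point 2 concrete, here is a minimal sketch in plain Python (standard library only, not Pyro code). Everything in it is an illustrative assumption rather than something from the tutorial: the function names, the test functions, and the two distributions. The first estimator uses a Bernoulli(p) variable, which is discrete and has no reparameterized sampler, yet log q is differentiable in p. The second uses a Normal(mu, 1) variable written as z = mu + eps.

```python
import random


def score_function_grad(f, p, num_samples, rng):
    """Estimate d/dp E_{z ~ Bernoulli(p)}[f(z)] with the score function identity."""
    total = 0.0
    for _ in range(num_samples):
        z = 1 if rng.random() < p else 0  # discrete sample, no reparameterization
        # d/dp log q(z; p), where log q = z*log(p) + (1 - z)*log(1 - p)
        score = z / p - (1 - z) / (1 - p)
        total += f(z) * score  # f is only evaluated, never differentiated
    return total / num_samples


def reparam_grad(df, mu, num_samples, rng):
    """Estimate d/dmu E_{z ~ Normal(mu, 1)}[f(z)] using z = mu + eps."""
    total = 0.0
    for _ in range(num_samples):
        eps = rng.gauss(0.0, 1.0)  # stochastic part, independent of mu
        z = mu + eps               # deterministic and differentiable in mu
        total += df(z)             # chain rule: f'(z) * dz/dmu, with dz/dmu = 1
    return total / num_samples


rng = random.Random(0)

# Score function case: f(z) = (z - 0.3)^2 with z in {0, 1}.
f = lambda z: (z - 0.3) ** 2
sf_estimate = score_function_grad(f, p=0.6, num_samples=200_000, rng=rng)
sf_exact = f(1) - f(0)  # d/dp [p*f(1) + (1 - p)*f(0)]

# Reparameterized case: f(z) = z^2, so E[f] = mu^2 + 1 and f'(z) = 2z.
df = lambda z: 2.0 * z
rp_estimate = reparam_grad(df, mu=1.5, num_samples=200_000, rng=rng)
rp_exact = 2.0 * 1.5  # d/dmu (mu^2 + 1)

print("score function:", sf_estimate, "exact:", sf_exact)
print("reparameterized:", rp_estimate, "exact:", rp_exact)
```

In the first case nothing is ever differentiated with respect to z, which is why the estimator still works for a discrete variable. In the second case the gradient flows through the sample itself, which is only possible because the sampler splits into noise that is independent of mu plus a differentiable transformation.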