Gradient estimator questions


Hi, I’m new to all things Bayesian and I’ve been reading through the SVI tutorials. I had a few questions regarding gradient estimators:

  1. The tutorial says that removing non-downstream variables is Rao-Blackwellization. Doesn’t the Rao-Blackwell theorem state that an estimator conditioned on a sufficient statistic is at least as good as the unconditioned estimator? How does removing or integrating out variables correspond to this? I can’t connect these two concepts in my head.

  2. In the part about score function estimators, if q is non-reparameterizable, won’t taking grad(log(q(z))) be equally problematic, since you can’t differentiate q(z) with respect to its parameters? Also, in the last equation of that section, you still differentiate f with respect to phi, but I thought the whole point was that f is non-differentiable.
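
To make the second question concrete, here is a minimal NumPy sketch of how I currently read that last equation, which may well be wrong. Everything in it is my own assumption for illustration and not from the tutorial: q is a Bernoulli with probability sigmoid(phi), the cost is f(z, phi) = (z - phi)^2, and the function names are made up. The sample z is discrete, so there is no reparameterized path from phi to z, yet log q(z) and f both seem to be ordinary differentiable functions of phi once z is held fixed.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def score_function_grad(phi, n_samples=200_000, seed=0):
    """Monte Carlo estimate of d/dphi E_{z ~ q_phi}[f(z, phi)].

    Assumed toy setup (not from the tutorial):
      q_phi(z) = Bernoulli(sigmoid(phi)),  f(z, phi) = (z - phi)**2
    Estimator: mean of  f * grad_phi log q_phi(z) + grad_phi f.
    """
    rng = np.random.default_rng(seed)
    p = sigmoid(phi)
    # Discrete samples: z is not a differentiable function of phi.
    z = (rng.random(n_samples) < p).astype(float)
    f = (z - phi) ** 2
    # For a Bernoulli with logit phi, d/dphi log q_phi(z) = z - p.
    score = z - p
    # Direct dependence of f on phi, with z held fixed.
    df_dphi = -2.0 * (z - phi)
    return np.mean(f * score + df_dphi)


def exact_grad(phi):
    """Closed-form gradient of p*(1 - phi)**2 + (1 - p)*phi**2."""
    p = sigmoid(phi)
    dp_dphi = p * (1.0 - p)
    return dp_dphi * ((1.0 - phi) ** 2 - phi ** 2) - 2.0 * p * (1.0 - phi) + 2.0 * (1.0 - p) * phi


if __name__ == "__main__":
    print("estimate:", score_function_grad(0.5))
    print("exact:   ", exact_grad(0.5))
```

If this sketch is right, the two printed numbers should agree closely, and the only thing that is not differentiable is the sampling step itself. Is that the correct way to think about it?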

Thanks for your help.