Hi, I’m new to all things Bayesian and I’ve been reading through the SVI tutorials. I had a few questions regarding gradient estimators:
It says that removing non-downstream variables is Rao-Blackwellization. Doesn't the Rao-Blackwell theorem state that an estimator conditioned on a sufficient statistic is at least as good as the unconditioned estimator? How does removing or integrating out variables correspond to this? I can't connect these two concepts in my head.
In the part about the score function estimators, if `q` is non-reparameterizable, won't taking `grad(log(q(z)))` be equally problematic, since you can't differentiate `q(z)` with respect to its parameters? In the last equation in this section, you still differentiate with respect to `phi`, but I thought the point was that `f` was non-differentiable.
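To make the second question concrete, here is a minimal sketch of how I currently read the score function estimator. The Bernoulli `q`, the step-function `f`, and all function names are made up for illustration and are not taken from the tutorial, so this may not match what the tutorial intends.

```python
import numpy as np

def f(z):
    # Hypothetical cost: a step function of the discrete sample z.
    # Nothing here is differentiated with respect to z.
    return np.where(z == 1, 3.0, 1.0)

def grad_log_q(z, phi):
    # d/dphi of log Bernoulli(z; phi) = z*log(phi) + (1 - z)*log(1 - phi).
    # z is discrete, but log q is still a smooth function of phi.
    return z / phi - (1 - z) / (1 - phi)

def exact_score_gradient(phi):
    # Sum over both outcomes of q(z) * f(z) * grad_phi log q(z).
    z = np.array([0, 1])
    q = np.array([1 - phi, phi])
    return float(np.sum(q * f(z) * grad_log_q(z, phi)))

def mc_score_gradient(phi, n, seed=0):
    # Monte Carlo version: sample z ~ q, then average f(z) * grad_phi log q(z).
    rng = np.random.default_rng(seed)
    z = rng.binomial(1, phi, size=n)
    return float(np.mean(f(z) * grad_log_q(z, phi)))

# Here E_q[f(z)] = 1 + 2*phi, so the true gradient with respect to phi is 2.
print(exact_score_gradient(0.3))
print(mc_score_gradient(0.3, 200_000))
```

In this toy case only `log q` is differentiated with respect to `phi`, and `f` is just evaluated at the samples. Is that the right way to read the estimator?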
Thanks for your help.