Black-box variational inference does not require the model to be differentiable, since the gradient of the ELBO only requires the gradient of log q(z). I think it uses log(p(z,x)/q(z)) [the instantaneous ELBO] as a kind of reward to weight the gradients. However, the objective I see in the Pyro SVI Part III tutorial involves a gradient of the instantaneous ELBO as well. I checked the proof in the BBVI paper and found that the expectation of the gradient of this instantaneous ELBO with respect to the guide is zero. Hence the gradient does not need this term at all. Is this property not used in Pyro's SVI? Am I missing something?
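For what it's worth, that zero-expectation identity is easy to check numerically. Here is a minimal sketch (plain NumPy; the Gaussian guide with unit variance is just my own choice of example, not anything from the tutorial) showing that the score function ∇_μ log q(z; μ) averages to zero under q itself:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = 1.5
# draw samples from the guide q(z) = N(mu, 1)
z = rng.normal(mu, 1.0, size=1_000_000)
# score function of N(mu, 1) w.r.t. mu: d/dmu log q(z) = z - mu
score = z - mu
# Monte Carlo estimate of E_q[d/dmu log q(z)] -- should be ~0
print(score.mean())
```

The same cancellation is what lets you subtract arbitrary (z-independent) baselines from the score-function estimator without biasing it.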
Edit: Is it that the loss Pyro implements covers both the differentiable and non-differentiable model cases? If the model is non-differentiable, does that term automatically default to zero? How does differentiability affect the variance of the gradients / performance?
if by black-box variational inference you mean the paper of the same name, then yes, they do not require differentiable models. but that is because they assume the model has no parameters to be learned, only latent random variables. they also use the score-function gradient estimator. however, the ELBO in pyro supports/constructs various gradient estimators (score-function but also pathwise gradient estimators) depending on the model/guide at hand; pyro also allows the model to depend on parameters (like in a VAE, where the model parameters are the neural network parameters of the decoder). so the ELBO implementation in pyro is more general than ‘bbvi’. re: performance, empirically people find that pathwise (a.k.a. reparameterized) gradients have much lower variance than the score-function alternative.
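the variance gap is easy to see on a toy problem. a sketch (my own example, plain NumPy rather than pyro) estimating ∇_μ E_{N(μ,1)}[z²] = 2μ with both estimators:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, n = 1.0, 100_000
eps = rng.normal(size=n)
z = mu + eps  # reparameterization: z = mu + eps, eps ~ N(0, 1)

# score-function (REINFORCE) estimator: f(z) * d/dmu log q(z) = z**2 * (z - mu)
score_est = z**2 * (z - mu)
# pathwise (reparameterized) estimator: d/dmu f(mu + eps) = 2 * z
path_est = 2 * z

print(score_est.mean(), path_est.mean())  # both estimate the true gradient 2*mu = 2
print(score_est.var(), path_est.var())    # score-function variance is much larger
```

both are unbiased for the same gradient, but the pathwise estimator exploits the differentiability of f through the sample, which is exactly what the score-function estimator (and hence bbvi) cannot assume.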
Thanks! Just to be sure, can I safely say this: if I'm not learning any parameters in the model, the SVI objective degenerates to the BBVI objective in the paper?
no, that’s not quite right. BBVI assumes the score-function gradient estimator. when available, pyro will instead use the pathwise gradient estimator (since it has lower variance).