Inference algorithm

Hi all,

I have a question that relates to, and adds to, this post.

My question is: what inference algorithm does Pyro use for Trace_ELBO()?

The documentation page for Trace_ELBO says that the estimator is based on [1] and [2]. However, [4] says that the primary inference algorithm Pyro implements is the one in [3]. The SVI tutorial also has a section that references the reparameterisation trick used in [3].

From my understanding of [3], the gradient of the ELBO is taken with respect to both the model parameters \theta and the variational parameters \phi, and at each iteration the ELBO is increased by updating both \theta and \phi. In [1] and [2], by contrast, the gradient of the ELBO is taken with respect to the variational parameters \phi only: the gradient with respect to the model parameters \theta is not computed, and \theta is not adjusted at each iteration.
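
To spell out the distinction I have in mind (my own notation, for observed data x and latent z, with model p_\theta(x, z) and guide q_\phi(z)):

Score function estimator, as in [1] and [2]:

\nabla_{\phi}\text{ELBO} = \mathbb{E}_{q_{\phi}(z)}\left[\nabla_{\phi}\log q_{\phi}(z)\,\big(\log p_{\theta}(x, z) - \log q_{\phi}(z)\big)\right]

Reparameterized estimator, as in [3], with z = g_{\phi}(\epsilon) and \epsilon \sim p(\epsilon):

\nabla_{\theta,\phi}\text{ELBO} = \mathbb{E}_{p(\epsilon)}\left[\nabla_{\theta,\phi}\big(\log p_{\theta}(x, g_{\phi}(\epsilon)) - \log q_{\phi}(g_{\phi}(\epsilon))\big)\right]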

Under the hood, what is the inference algorithm for Trace_ELBO()? Does using Trace_ELBO() involve estimating/computing \nabla_{\theta, \phi}\text{ELBO} or \nabla_{\phi}\text{ELBO}?

References:

[1] Automated Variational Inference in Probabilistic Programming,
David Wingate, Theo Weber

[2] Black Box Variational Inference,
Rajesh Ranganath, Sean Gerrish, David M. Blei

[3] Auto-Encoding Variational Bayes,
Diederik P Kingma, Max Welling

[4] Pyro: Deep Universal Probabilistic Programming,
Eli Bingham, Jonathan P. Chen, Martin Jankowiak, Fritz Obermeyer, Neeraj Pradhan, Theofanis Karaletsos, Rohit Singh, Paul Szerlip, Paul Horsfall, Noah D. Goodman

this is explained in this tutorial. basically pyro uses reparameterized gradients (like in [3]) when it can and falls back to score function gradients (like in [2]) when it can’t. so it’s always computing stochastic estimates of ∇_{θ,ϕ} ELBO, but how it does so depends on whether the guide has discrete latent variables etc.
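
here’s a tiny toy example (my own sketch, not from the docs or the tutorial) of what that means in practice:

```python
import torch
import pyro
import pyro.distributions as dist
from pyro.distributions import constraints
from pyro.infer import SVI, Trace_ELBO
from pyro.optim import Adam

def model(data):
    # theta is a model parameter; the same SVI step updates it
    theta = pyro.param("theta", torch.tensor(0.0))
    z = pyro.sample("z", dist.Normal(theta, 1.0))
    with pyro.plate("data", len(data)):
        pyro.sample("obs", dist.Normal(z, 1.0), obs=data)

def guide(data):
    # phi = (loc, scale) are the variational parameters
    loc = pyro.param("loc", torch.tensor(0.0))
    scale = pyro.param("scale", torch.tensor(1.0), constraint=constraints.positive)
    # Normal has rsample(), so Trace_ELBO uses reparameterized (pathwise) gradients
    # for this site; if "z" were discrete (e.g. Categorical), Trace_ELBO would use
    # score function terms for it instead
    pyro.sample("z", dist.Normal(loc, scale))

data = torch.randn(20) + 3.0
svi = SVI(model, guide, Adam({"lr": 0.01}), loss=Trace_ELBO())
for step in range(1000):
    svi.step(data)  # each step uses one stochastic estimate of ∇_{θ,ϕ} ELBO
```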

Sounds clear now, thanks @martinjankowiak.

Hi @martinjankowiak, I have a followup question.

I’m running a GP model whose guide has only continuous latent variables. Since there are no discrete latent variables, would the inference algorithm only be using the reparameterized gradients from [3] and not the score function gradients from [2]?

@mcao probably. but hard to say for sure since you’re providing essentially zero details.

For instance, I’m following the sparse variational GP regression tutorial, which uses the VariationalSparseGP class from the source code together with an RBF kernel. The guide in the source code has no discrete variables. The instantiated parameters for the sparse variational GP are self.u_loc and self.u_scale_tril, plus self.lengthscale for the RBF kernel. A condensed version of the setup is sketched below. Given that no discrete variables are involved, would the inference algorithm be the one from [3] only?
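
Here is roughly what I’m running (a condensed sketch of the tutorial code; the data, inducing points, and optimizer settings are just placeholders):

```python
import torch
import pyro.contrib.gp as gp
from pyro.infer import Trace_ELBO

# placeholder toy data standing in for the tutorial's dataset
X = torch.linspace(0.0, 5.0, 100)
y = torch.sin(X) + 0.1 * torch.randn(100)
Xu = torch.linspace(0.0, 5.0, 10)  # inducing inputs

kernel = gp.kernels.RBF(input_dim=1)      # carries the lengthscale parameter
likelihood = gp.likelihoods.Gaussian()
vsgp = gp.models.VariationalSparseGP(X, y, kernel, Xu=Xu, likelihood=likelihood)

# the guide puts a reparameterizable multivariate normal over the inducing
# outputs u, parameterized by u_loc and u_scale_tril
loss_fn = Trace_ELBO().differentiable_loss
optimizer = torch.optim.Adam(vsgp.parameters(), lr=0.01)
for step in range(1000):
    optimizer.zero_grad()
    loss = loss_fn(vsgp.model, vsgp.guide)
    loss.backward()
    optimizer.step()
```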