Understanding the VariationalSparseGP class

Hi all,

I have been using Pyro to write a custom Gaussian process model, and I have also been reading the existing code for guidance. There is one question about the `VariationalSparseGP` class that I would like to ask in order to understand it better.

In that model, the posterior p(u, f | y) ∝ p(y | f) p(f | u) p(u) is approximated by q(u, f) = p(f | u) q(u). With this choice, the evidence lower bound contains only an expectation of log p(y | f) with respect to q(f) and an expectation of log(q(u) / p(u)) with respect to q(u) (i.e. the KL divergence between q(u) and p(u)), because the p(f | u) terms cancel inside the logarithm.

Looking at the code, `guide()` samples `u` from q(u), and `model()` defines the random variables `u` and `y`. However, `model()` does not define a random variable `f`; instead, it directly computes the mean and variance of the marginal distribution q(f), plugs the mean into the likelihood p(y | f), and adds the diagonal elements of the covariance matrix of q(f) to the existing variance of the likelihood.

Is this approach equivalent to sampling `f` from q(f) and then evaluating the gradient of the log-likelihood? And if `f` is never sampled, and only the diagonal of its covariance matrix is used, could this lose some information (about the covariance between individual points in `f`)? I have tried to spell out below what I mean. I am quite new to variational inference and would be grateful to hear your thoughts!
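
The cancellation I am referring to is:

$$
\mathrm{ELBO} = \mathbb{E}_{q(u,f)}\!\left[\log \frac{p(y \mid f)\, p(f \mid u)\, p(u)}{p(f \mid u)\, q(u)}\right] = \mathbb{E}_{q(f)}\big[\log p(y \mid f)\big] - \mathrm{KL}\big(q(u)\,\|\,p(u)\big)
$$

And here is a minimal sketch of what I understand `model()` to be doing for a Gaussian likelihood. The names here are my own placeholders (in particular `compute_qf_moments`), not the actual Pyro code:

```python
import pyro
import pyro.distributions as dist

def model_sketch(X, y, noise):
    # compute_qf_moments is a hypothetical placeholder for the analytic
    # computation of the mean and diagonal (marginal) variance of q(f)
    # from q(u) and the GP conditional p(f | u).
    f_loc, f_var = compute_qf_moments(X)
    with pyro.plate("data", X.shape[0]):
        # No sample site for f: its marginal variance is folded directly
        # into the variance of the Gaussian likelihood.
        pyro.sample("y", dist.Normal(f_loc, (f_var + noise).sqrt()), obs=y)
```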

Many thanks,
Andres

You could sample u ~ q(u) and then sample f ~ p(f|u), and the ELBO and its gradients would be the same in expectation, but your ELBO estimator (and its gradients) would have higher variance. You could still solve the optimization problem, but everything would be more stochastic, and you'd probably end up at a somewhat worse approximate optimum.
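
For intuition, here is roughly what the two estimators of the expected log-likelihood term look like for a Gaussian likelihood. This is a sketch with made-up names, not the actual Pyro code:

```python
import torch
import pyro.distributions as dist

def analytic_ell(y, f_loc, f_var, noise):
    # Closed form of E_{f ~ N(f_loc, f_var)}[log N(y | f, noise)]:
    # log N(y | f_loc, noise) - f_var / (2 * noise), summed over points.
    return (dist.Normal(f_loc, noise.sqrt()).log_prob(y)
            - 0.5 * f_var / noise).sum()

def sampled_ell(y, f_loc, f_var, noise):
    # Unbiased Monte Carlo estimate of the same quantity: draw f from
    # the marginal q(f) (equivalently u ~ q(u) followed by f ~ p(f|u))
    # and score the likelihood. Same in expectation, noisier per draw.
    f = dist.Normal(f_loc, f_var.sqrt()).rsample()
    return dist.Normal(f, noise.sqrt()).log_prob(y).sum()

# The sampled estimator scatters around the analytic value:
y = torch.randn(100)
f_loc, f_var = torch.zeros(100), torch.ones(100)
noise = torch.tensor(0.1)
mc = torch.stack([sampled_ell(y, f_loc, f_var, noise) for _ in range(1000)])
print(analytic_ell(y, f_loc, f_var, noise).item(), mc.mean().item(), mc.std().item())
```

The printout illustrates the point: the Monte Carlo mean agrees with the analytic value, but individual draws scatter around it, and that noise propagates into the gradient estimates.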
