Guide for the Gaussian process model

Hello all, there is something that I do not understand about the Gaussian process model from the tutorial.
if

 gpr = gp.models.GPRegression(X, y, kernel, noise=torch.tensor(1.))

and

svi = SVI(gpr.model, gpr.guide, optim, loss=Trace_ELBO())

what actually is gpr.guide? From what I understand, a guide is a distribution that approximates the true posterior. In this context, what distribution is gpr.guide? Can I change the guide to another distribution?

Here is the complete code:

import torch

import pyro
import pyro.contrib.gp as gp
import pyro.distributions as dist
from pyro.infer import SVI, Trace_ELBO
from pyro.optim import Adam

smoke_test = False  # from the tutorial setup; set True for a quick test run

# X, y as in the tutorial; toy data shown here so the snippet runs
X = torch.linspace(0., 5., 20)
y = 0.5 * torch.sin(3 * X) + torch.randn(X.shape) * 0.2

kernel = gp.kernels.RBF(input_dim=1, variance=torch.tensor(5.), lengthscale=torch.tensor(10.))
gpr = gp.models.GPRegression(X, y, kernel, noise=torch.tensor(1.))

optim = Adam({"lr": 0.005})
svi = SVI(gpr.model, gpr.guide, optim, loss=Trace_ELBO())
losses = []
num_steps = 2 if smoke_test else 2500
for i in range(num_steps):
    losses.append(svi.step())

Hi @yusri_dh, if you use the GPRegression model, then its guide is indeed the AutoDelta guide, which is used for MAP inference. If you don't set priors on your kernel/model parameters, then this guide does nothing (because there is no latent variable in the model). If you set a prior on a parameter, then this guide will help you learn the MAP point of that parameter (approximating the posterior with a Delta distribution).
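
For example (a minimal sketch with toy data, using the set_prior API from this era of the GP module; newer Pyro versions attach priors via pyro.nn.PyroSample instead):

import torch
import pyro.contrib.gp as gp
import pyro.distributions as dist

# toy data, just to construct the model
X = torch.linspace(0., 5., 20)
y = torch.sin(X)

kernel = gp.kernels.RBF(input_dim=1)
gpr = gp.models.GPRegression(X, y, kernel, noise=torch.tensor(1.))

# without this line, gpr.guide does nothing (there are no latent variables);
# with it, the guide learns a MAP point for the lengthscale
gpr.kernel.set_prior("lengthscale", dist.LogNormal(torch.tensor(0.), torch.tensor(1.)))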

In variational models, there are latent variables by default. In those cases, the guide will give you the posterior distributions of those latent variables.
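
For instance, gp.models.VariationalGP keeps the latent function values f as latent variables, so its guide approximates their posterior rather than returning a point estimate. A minimal sketch with toy data:

import torch
import pyro.contrib.gp as gp

# toy data, just to construct the model
X = torch.linspace(0., 5., 20)
y = torch.sin(X)

kernel = gp.kernels.RBF(input_dim=1)
likelihood = gp.likelihoods.Gaussian()
# this model has a latent f by default; vgp.guide approximates its posterior
vgp = gp.models.VariationalGP(X, y, kernel, likelihood)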

If you don't use SVI, then there is no need for a guide. Just do inference with your gpr.model. :wink:
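
For example, you could sample the parameter posteriors with NUTS directly on the model (a minimal sketch, written against the current Pyro API rather than the version used in this thread):

import torch
import pyro.contrib.gp as gp
import pyro.distributions as dist
from pyro.infer import MCMC, NUTS
from pyro.nn import PyroSample

X = torch.linspace(0., 5., 20)
y = torch.sin(X)
gpr = gp.models.GPRegression(X, y, gp.kernels.RBF(input_dim=1))
# NUTS needs at least one latent site, so put a prior on the lengthscale
gpr.kernel.lengthscale = PyroSample(dist.LogNormal(0.0, 1.0))

# no guide involved: run NUTS directly on gpr.model
mcmc = MCMC(NUTS(gpr.model), num_samples=200, warmup_steps=100)
mcmc.run()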


Thank you for the reply @fehiepsi, but could you elaborate in more detail? When I check the source code:

def guide(self):
    self.set_mode("guide")
    noise = self.get_param("noise")
    return noise

But usually a guide looks something like this:

import torch
import pyro
import pyro.distributions as dist
from torch.distributions import constraints

def guide(data):
    # register a variational parameter with Pyro
    alpha_q = pyro.param("alpha_q", torch.tensor(15.0),
                         constraint=constraints.positive)
    # sample latent_fairness from the distribution Exponential(alpha_q)
    pyro.sample("latent_fairness", dist.Exponential(alpha_q))

Why does the source code return noise (whose distribution I do not know) instead of calling pyro.sample(…dist.xxx)?

@yusri_dh All models in the GP module are subclasses of the Parameterized class. Basically, it is an nn.Module with the ability to set priors on its parameters. Behind the scenes, self.get_param("noise") will call pyro.sample("noise", dist.Delta(noise_MAP)) if you set a prior on noise. Otherwise, it will call pyro.param("noise").
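
Roughly, that dispatch works like the following (a simplified, hypothetical sketch of the idea, not the actual Pyro source; the real Parameterized class also handles constraints and parameter-name prefixing):

import pyro
import pyro.distributions as dist

class ParameterizedSketch:
    def __init__(self):
        self._priors = {}     # param name -> prior distribution
        self.mode = "model"   # switched by set_mode("model") / set_mode("guide")

    def set_prior(self, name, prior):
        self._priors[name] = prior

    def get_param(self, name):
        value = getattr(self, name)
        if name not in self._priors:
            # no prior set: an ordinary learnable parameter
            return pyro.param(name, value)
        if self.mode == "model":
            # in the model, sample from the user-specified prior
            return pyro.sample(name, self._priors[name])
        # in the guide, learn a MAP point via a Delta distribution
        map_point = pyro.param("{}_MAP".format(name), value)
        return pyro.sample(name, dist.Delta(map_point))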

Usually, there are many parameters in a GP model (especially when the kernel is a combination of many subkernels, each of which has its own lengthscale and variance parameters). We don't know a priori whether a user wants to put priors on some parameters or not. The Parameterized class handles this complication for us automatically.


Thank you so much, now I understand it clearly :smile:

Doesn't this obscure things for the user? Everything is hidden behind the scenes, which makes it harder to hack around the code (the learning curve is steeper).

I was expecting the fixed parameters and other things to be handled by Pyro itself. Otherwise we run into issues when we want to combine GPs with other models placed before and after the GP in a chain.

Is there anything in the GP module that cannot be handled by

  • mark_params_active or mark_params_inactive
  • tag_params or untag_params

@SanketK Sorry for not making things clear enough in the tutorials and docs. Starting next week, I will have time to write more about them.

About hiding things behind the scenes: in my opinion, pretty much all the core parts of a GP model lie in the class definitions. I implemented what I learned from the literature, so I don't think there is much hidden. In addition, priors and guides for kernel parameters are not topics of GP papers, so I think we should not focus much on them.

To combine with other models, we have the set_data method to set the input; and to get a latent output, just call set_data with y=None. The sv-dkl example and the docs of set_data might be good references for combining things.
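
A minimal sketch of that pattern (toy data; which model class fits best depends on your setup, as in the sv-dkl example):

import torch
import pyro.contrib.gp as gp

X = torch.randn(20, 1)
y = torch.randn(20)

kernel = gp.kernels.RBF(input_dim=1)
gpr = gp.models.GPRegression(X, y, kernel)

# later in a chain: feed in new inputs and treat the GP output as latent
X_new = torch.randn(30, 1)
gpr.set_data(X_new, None)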

To fix parameters, we have the fix_param method. To unfix one, you can raise an issue or make a pull request to implement that functionality in the Parameterized class (I implemented it initially but ended up not using it). :wink: Something like this:

def unfix_param(self, param):
    # remove the param from the fixed set so it becomes learnable again
    if param in self._fixed_params:
        del self._fixed_params[param]

As for tag_params and untag_params, I don't use them, so I don't know how they work. In any case, I believe they were removed in Pyro 0.2.

Do you have other ideas on how to make a guide for an arbitrary GP model? I would love to hear them so I can improve the current code.

My question was a more basic one: do we need the Parameterized class at all?

I am also working on adding multi-output kernels to these models. Would this be of interest? I can raise a PR for it once I am done.

Another part I can contribute is uncertain conditionals, similar to the ones in GPflow.

Is there a way we can plan this out in detail, and then I can start pushing small parts into the ecosystem?


I find the Parameterized class convenient for the time being. @eb8680 has some ideas about utilizing the AutoGuide module, but I have not caught up with the current state yet.

About multi-output kernels and uncertain conditionals, there is no plan for them right now. Pyro is developed in a community-oriented way, so please feel free to make PRs on topics you think will be helpful for other users :wink:

I just wanted to check whether you have had any success with the uncertain conditional, like the implementation in GPflow?

I'd be curious to see how well it works in this setting, where there are no explicit kernel expectations anywhere in the code and the reparameterization trick is used instead.

Hi, may I ask: do we actually do variational inference here in the Gaussian process? If so, we should get some uncertainty estimate for each hyperparameter, shouldn't we?

Yes, the loc and scale of a parameter foo would be foo_loc and foo_scale. See e.g. Gaussian Process Latent Variable Model — Pyro Tutorials 1.9.0 documentation.
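
For example (a minimal sketch using the current Pyro API, where a prior is attached with pyro.nn.PyroSample and autoguide chooses the guide family):

import torch
import pyro.contrib.gp as gp
import pyro.distributions as dist
from pyro.nn import PyroSample

# toy data, just to construct the model
X = torch.linspace(0., 5., 20)
y = torch.sin(X)

kernel = gp.kernels.RBF(input_dim=1)
gpr = gp.models.GPRegression(X, y, kernel)

# prior on the lengthscale, with a Normal guide over it
gpr.kernel.lengthscale = PyroSample(dist.LogNormal(0.0, 1.0))
gpr.kernel.autoguide("lengthscale", dist.Normal)

# after SVI training, lengthscale_loc and lengthscale_scale on the kernel
# hold the variational location and scale (their full names in the param
# store are prefixed by the module path)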