Sampling From the GP Posterior Predictive in a FBGPR Model Using pyro.contrib.gp

Sunny · April 4, 2023, 11:24am

I have looked through Gaussian Processes — Pyro Tutorials and Gaussian Processes — Pyro documentation.

The toy example for understanding how to sample from the GP posterior predictive (if it’s even possible) is as follows. I want to fit a FBGPR model to data generated by an underlying function f: \mathbb{R} \to \mathbb{R}, f(x)=\sin (20 x)+2 \cos (14 x)-2 \sin (6 x), x \in [-1,1]. The data \mathcal{D} = \left\{x_{i},y_{i}\right\}_{i=1}^{n} generated is given by x_{i} = -1,-0.5,0,0.5, 1 and y_{i} = f(x_{i}).

The fully Bayesian Gaussian process regression model is:

\sigma_{l}^{2} \sim \mathcal{LN}\left(-1,1\right), \sigma_{v}^{2} \sim \mathcal{LN}\left(0,2\right)

k\left(x,x^{\prime}\right) = \sigma_{v}^{2} \exp\left(-\frac{1}{2\sigma_{l}^{2}}\left\|x-x^{\prime}\right\|^{2}\right)

f \sim \mathcal{GP}\left(0,K\right).

Let us say that I now have a new set of points X^{*} given by torch.linspace(-1,1,200). Is it possible to sample a function observed on X^{*} from the posterior predictive of the GP using the data set using the pyro.contrib.gp module?

fehiepsi · April 4, 2023, 4:28pm

I’m not sure what you’re looking for, probably this code in the tutorial is what you need?

        Xtest = torch.linspace(-0.5, 5.5, n_test)  # test inputs
        # compute predictive mean and variance
        with torch.no_grad():
            if type(model) == gp.models.VariationalSparseGP:
                mean, cov = model(Xtest, full_cov=True)
            else:
                mean, cov = model(Xtest, full_cov=True, noiseless=False)

Sunny · April 4, 2023, 8:28pm

I just want to sample from the GP predictive posterior.
Mathematically, I want to sample from p\left(f^* \mid X^*, \mathcal{D} \right) = \int p\left(f^*,\theta \mid X^*, \mathcal{D} \right) \; \mathrm{d} \theta, where \theta = \left(\sigma_{l}^{2},\sigma_{v}^{2}\right)^{\top}. In other words: I want to sample a function at a new set of points X^* given by torch.linspace(-1,1,200).

The code in your reply is for Spare GP and it only gives the expected mean and covariance.

Let me know if it’s still unclear.

Best regards,
Sunny

Sunny · April 4, 2023, 9:59pm

I could of course use torch.distribution.MultrivariateNormal. But the problem is that it perceives the covariance matrix to not be positive semi-definite.

Also, qua the documentation it’s not clear how one would actually sample from the GP posterior using Pyro’s GP module. I’m quite confused by it.

fehiepsi · April 6, 2023, 10:18am

I see. So here you’re assuming to have a collection/distribution of kernel’s hyperparameters that you want to marginalize out. Could you let me know what form the posterior is? Are they a collection of samples, or a transformed diagonal normal distribution etc.?

Either way, I think that you can just simply define a new predictive model and use Predictive, like

def new_model(X_new):
    # The following 4 lines could be optional, depending on how you define GP or perform Bayesian inference
    param1 = pyro.sample(...)
    param2 = pyro.sample(...)
    kernel = Kernel(param1, param2)
    gp = GP(kernel, X, y)
    loc, scale = gp(X_new, full_cov=False)  # set False to avoid singular issues
    return pyro.sample("y_new", dist.Normal(loc, scale))

Sunny · April 8, 2023, 7:52pm

I have tried to work on it for a couple of days without look.

“So here you’re assuming to have a collection/distribution of kernel’s hyperparameters that you want to marginalize out.”
Yes, I want to marginalize out the kernel’s hyperparameter.
I have a collected of posterior samples for the hyperparameters \theta = \left(\sigma_{l}^{2},\sigma_{v}^{2}\right)^{\top} which I got via NUTS, and I have these posterior samples of the hyperparameters stored in a PyTorch tensor. Also, I do not see how the posterior hyperparameters samples are included in your code. Doesn’t one need to tell Pyro were the posterior hyperparameters samples are?

“Could you let me know what form the posterior is?”
I’m not sure which posterior you are referring to and I’m not sure what you mean by “form”. Also, remember that there are the GP posterior and the posterior over the hyperparameters. What I’m trying to do is to find the GP posterior predictive distribution using the posterior hyperparameters samples (I have 1000 such samples) which I got NUTS.

fehiepsi · April 9, 2023, 2:56am

The Predictive api requires posterior samples to be a dictionary, with keys are variable names and values are corresponding tensors. There are some other notions of posteriors like custom guide, auto guides, etc.

In my code, you can replace param1, param2 by hyperparameters sigma/theta etc.

Sunny · April 9, 2023, 9:12pm

If I use pyro.sample instead of pyro.nn.PyroSample, NUTS will not run.

fehiepsi · April 10, 2023, 1:18am

As mentioned in the code, the details depend on how you define the model and perform inference. I just posted a pseudocode to use with Predictive, not NUTS.

To draw samples from p(f^* | \theta, X^*,\mathcal{D})q(\theta | \mathcal{D}), we need to construct the corresponding model model. p(f^* | \theta, X^*,\mathcal{D}) is defined by the Normal distribution with parameters obtained from the gp(X^*) call. That’s what new_model is doing. You can define sample statements for q(\theta | \mathcal{D}), then provide corresponding posterior samples for \theta through the posterior_samples argument of Predictive.

Sunny · April 11, 2023, 11:05pm

Alright, I think I have done it. But the output y_new_samples = predictive(X_new)["y_new"], where X_new = torch.tensor(-1,1,100), is not what I expected. I get a PyTorch tensor with shape (1000,100). The size of the first dimension comes from there being 1000 posterior samples of the hyperparameters and 100 comes from the length of X_new.

Is this what you will expect?

fehiepsi · April 12, 2023, 9:09pm

Yes, that is expected to me.

Sunny · April 12, 2023, 11:31pm

Alright. I thought that it will do the integration itself. But it’s not really a problem in anyway, you just have to do the integration yourself., i.e. take the mean over the hyperparameters.