The toy example for understanding how to sample from the GP posterior predictive (if it’s even possible) is as follows. I want to fit a fully Bayesian Gaussian process regression (FBGPR) model to data generated by the underlying function f: \mathbb{R} \to \mathbb{R}, f(x)=\sin (20 x)+2 \cos (14 x)-2 \sin (6 x), x \in [-1,1]. The data set \mathcal{D} = \left\{x_{i},y_{i}\right\}_{i=1}^{n} is given by x_{i} = -1,-0.5,0,0.5,1 and y_{i} = f(x_{i}).
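For concreteness, the data can be generated with a few lines of PyTorch (a sketch; the names f, X, and y are ones I am introducing here):

```python
import torch

# Underlying function from the toy example.
def f(x):
    return torch.sin(20 * x) + 2 * torch.cos(14 * x) - 2 * torch.sin(6 * x)

# The five training inputs and their (noise-free) observations.
X = torch.tensor([-1.0, -0.5, 0.0, 0.5, 1.0])
y = f(X)
```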

The fully Bayesian Gaussian process regression model is:

Let us say that I now have a new set of points X^{*} given by torch.linspace(-1,1,200). Is it possible to sample a function observed on X^{*} from the posterior predictive of the GP, given the data set, using the pyro.contrib.gp module?

I just want to sample from the GP predictive posterior.
Mathematically, I want to sample from p\left(f^* \mid X^*, \mathcal{D} \right) = \int p\left(f^*,\theta \mid X^*, \mathcal{D} \right) \; \mathrm{d} \theta, where \theta = \left(\sigma_{l}^{2},\sigma_{v}^{2}\right)^{\top}. In other words: I want to sample a function at a new set of points X^* given by torch.linspace(-1,1,200).
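Since this integral is intractable in general, what I am after is the usual Monte Carlo approximation: draw hyperparameters from their posterior, then draw a function from the conditional predictive for each of them,

```latex
p\left(f^* \mid X^*, \mathcal{D}\right)
  = \int p\left(f^* \mid \theta, X^*, \mathcal{D}\right)\, p\left(\theta \mid \mathcal{D}\right)\, \mathrm{d}\theta
  \approx \frac{1}{S} \sum_{s=1}^{S} p\left(f^* \mid \theta^{(s)}, X^*, \mathcal{D}\right),
  \qquad \theta^{(s)} \sim p\left(\theta \mid \mathcal{D}\right).
```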

The code in your reply is for Sparse GP, and it only gives the expected mean and covariance.

I could of course use torch.distributions.MultivariateNormal. But the problem is that it complains that the covariance matrix is not positive semi-definite.
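A common workaround for this numerical issue (not specific to Pyro) is to add a small "jitter" term to the diagonal before constructing the distribution. A sketch, where the mean and the deliberately singular rank-1 covariance are placeholders for whatever the GP produces:

```python
import torch
from torch.distributions import MultivariateNormal

# Placeholder predictive mean and a deliberately singular (rank-1) covariance,
# standing in for a numerically non-positive-definite GP covariance.
# Double precision keeps the Cholesky factorization well behaved.
v = torch.randn(200, dtype=torch.float64)
mean = torch.zeros(200, dtype=torch.float64)
cov = torch.outer(v, v)  # rank 1, so Cholesky would fail without jitter

# Adding jitter to the diagonal restores numerical positive-definiteness.
jitter = 1e-6 * torch.eye(cov.shape[0], dtype=torch.float64)
mvn = MultivariateNormal(mean, covariance_matrix=cov + jitter)
sample = mvn.sample()  # one function draw on the 200-point grid
```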

Also, from the documentation it’s not clear how one would actually sample from the GP posterior using Pyro’s GP module. I’m quite confused by it.

I see. So here you’re assuming you have a collection/distribution of kernel hyperparameters that you want to marginalize out. Could you let me know what form the posterior takes? Is it a collection of samples, a transformed diagonal normal distribution, etc.?

Either way, I think you can simply define a new predictive model and use Predictive, like

def new_model(X_new):
    # The following 4 lines could be optional, depending on how you define
    # the GP or perform Bayesian inference.
    param1 = pyro.sample(...)
    param2 = pyro.sample(...)
    kernel = Kernel(param1, param2)
    gp = GP(kernel, X, y)
    loc, var = gp(X_new, full_cov=False)  # set False to avoid singular issues
    return pyro.sample("y_new", dist.Normal(loc, var.sqrt()))

I have tried to work on it for a couple of days without luck.

“So here you’re assuming you have a collection/distribution of kernel hyperparameters that you want to marginalize out.”
Yes, I want to marginalize out the kernel’s hyperparameters.
I have a collection of posterior samples for the hyperparameters \theta = \left(\sigma_{l}^{2},\sigma_{v}^{2}\right)^{\top}, which I obtained via NUTS, and these posterior samples are stored in a PyTorch tensor. Also, I do not see how the posterior hyperparameter samples are included in your code. Doesn’t one need to tell Pyro where the posterior hyperparameter samples are?

“Could you let me know what form the posterior takes?”
I’m not sure which posterior you are referring to, and I’m not sure what you mean by “form”. Also, remember that there are both the GP posterior and the posterior over the hyperparameters. What I’m trying to do is find the GP posterior predictive distribution using the posterior hyperparameter samples (I have 1000 such samples), which I got via NUTS.

The Predictive API requires the posterior samples to be a dictionary whose keys are variable names and whose values are the corresponding tensors. There are also other notions of posteriors, such as custom guides, autoguides, etc.
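For example, if the 1000 NUTS draws are stored in a plain (1000, 2) tensor, they can be split into such a dictionary (a sketch; the tensor here is a random placeholder, and the site names "lengthscale" and "variance" are names I am assuming, which must match the sample sites in the model):

```python
import torch

theta = torch.rand(1000, 2) + 0.1  # placeholder for the stored NUTS samples
posterior_samples = {
    "lengthscale": theta[:, 0],  # draws of the first hyperparameter
    "variance": theta[:, 1],     # draws of the second hyperparameter
}
```

Note that if you run NUTS through pyro.infer.MCMC, then mcmc.get_samples() already returns a dictionary of exactly this form.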

In my code, you can replace param1, param2 with the hyperparameters sigma/theta, etc.

As mentioned in the code, the details depend on how you define the model and perform inference. I just posted pseudocode to use with Predictive, not NUTS.

To draw samples from p(f^* | \theta, X^*,\mathcal{D})q(\theta | \mathcal{D}), we need to construct the corresponding model. p(f^* | \theta, X^*,\mathcal{D}) is defined by the Normal distribution with parameters obtained from the gp(X^*) call. That’s what new_model is doing. You can define sample statements for q(\theta | \mathcal{D}), then provide the corresponding posterior samples for \theta through the posterior_samples argument of Predictive.

Alright, I think I have done it. But the output y_new_samples = predictive(X_new)["y_new"], where X_new = torch.linspace(-1,1,100), is not what I expected. I get a PyTorch tensor with shape (1000, 100). The size of the first dimension comes from there being 1000 posterior samples of the hyperparameters, and the 100 comes from the length of X_new.

Alright. I thought that it would do the integration itself. But it’s not really a problem anyway; you just have to do the integration yourself, i.e., take the mean over the hyperparameter samples.
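To be precise, each row of the (1000, 100) tensor is already one draw from the marginal predictive p(f^* | X^*, \mathcal{D}), since Predictive pairs each hyperparameter draw with one function draw; averaging over the first dimension then gives the Monte Carlo estimate of the predictive mean. A sketch with a random placeholder tensor:

```python
import torch

# Placeholder for predictive(X_new)["y_new"]: one row per hyperparameter draw.
y_new_samples = torch.randn(1000, 100)

# Monte Carlo estimate of the predictive mean and a 95% credible band.
pred_mean = y_new_samples.mean(dim=0)  # shape (100,)
lower, upper = y_new_samples.quantile(torch.tensor([0.025, 0.975]), dim=0)
```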