Question about SparseGPRegression Model in Pyro GP

Dear Pyro Users

I am trying to understand how Variational Sparse GP is implemented in Pyro GP. To better frame my question, I recap the main framework of VSGP here. The ELBO to be maximized in this case takes the following form

sELBO = E_{q(f)}[log p(y|f)] - KL[q(u) || p(u)]

where q(u) is the approximate posterior and q(f) = \int p(f|u) q(u) du, with p(f|u) the GP conditional. This form of ELBO is not the same as the general form

gELBO = E_{q(z)}[log p(y|z)] - KL[q(z)||p(z)]

where z is the latent variable. In the above GP case, z = {f, u}.

My understanding is that Trace_ELBO computes gELBO, with the latent z defined in both the model and the guide. Clearly the sELBO for VSGP is not of the gELBO form.

What confuses me is how the Variational Sparse GP regression model is defined; see
pyro.contrib.gp.models.sgpr — Pyro documentation:

  1. According to this definition, the only latent variable is u. Hence when this model is combined with the loss Trace_ELBO in SVI, it is doing something like gELBO with z = u. Then where is the latent variable f?

  2. As it is not clear how Pyro calculates the ELBO (it is too complicated to trace through), I am not sure whether the above VariationalSparseGP model is doing the right thing.

  3. I also really don't know what self._load_pyro_samples() is about.

Any ideas and advice appreciated.

Thanks

J

Hi @junbin.gao, there are two classes: SparseGPRegression and VariationalSparseGP. The latter class with a Gaussian likelihood is a particular case of the former with approx='VFE'. Here, I assume that you were talking about the latter class.

In Pyro, VSGP implements sELBO; in general, E_{q(f)}[log p(y|f)] is approximated using a sample f drawn from q(f). In case the likelihood is Gaussian, we can compute E_{q(f)}[log p(y|f)] exactly. If you use Trace_ELBO, the term KL[q(u) || p(u)] will be approximated using a sample u ~ q(u). If you use TraceMeanField_ELBO, that KL term will be calculated analytically. In other words, if you use a Gaussian likelihood and TraceMeanField_ELBO, the objective ELBO will be computed exactly (i.e. no MC approximation for either the KL or the likelihood term).
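
For concreteness, here is a minimal sketch (a toy example of my own, not taken from the Pyro docs or this thread) of fitting a VariationalSparseGP with a Gaussian likelihood under TraceMeanField_ELBO; the data X, y and inducing inputs Xu are placeholders:

    import torch
    import pyro.contrib.gp as gp
    from pyro.infer import SVI, TraceMeanField_ELBO
    from pyro.optim import Adam

    # toy 1D regression data and inducing inputs (placeholders)
    X = torch.linspace(0, 5, 100).unsqueeze(-1)
    y = torch.sin(X).squeeze(-1) + 0.1 * torch.randn(100)
    Xu = X[::10].clone()

    kernel = gp.kernels.RBF(input_dim=1)
    likelihood = gp.likelihoods.Gaussian()
    vsgp = gp.models.VariationalSparseGP(X, y, kernel, Xu, likelihood)

    # Per the discussion above: with a Gaussian likelihood the likelihood term
    # needs no Monte Carlo sample of f, and TraceMeanField_ELBO computes
    # KL[q(u) || p(u)] analytically, so the objective is exact.
    # Trace_ELBO would instead estimate that KL with a sample u ~ q(u).
    svi = SVI(vsgp.model, vsgp.guide, Adam({"lr": 0.01}),
              loss=TraceMeanField_ELBO())
    for step in range(1000):
        svi.step()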

When you set priors for some hyperparameters of the kernel (or any other parameters such as the noise, or the weight/bias of mean functions, …), you need to define guides for them. The call self._load_pyro_samples() simply loads the (auto)guides of those hyperparameters.
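
As a rough sketch of the pattern from the user side (the LogNormal priors and toy data here are purely illustrative; the PyroSample pattern follows the Pyro GP tutorial rather than this thread):

    import torch
    import pyro
    import pyro.contrib.gp as gp
    import pyro.distributions as dist

    X = torch.linspace(0, 5, 50).unsqueeze(-1)
    y = torch.sin(X).squeeze(-1)
    Xu = X[::5].clone()

    kernel = gp.kernels.RBF(input_dim=1)
    # place priors on the kernel hyperparameters
    kernel.lengthscale = pyro.nn.PyroSample(dist.LogNormal(0.0, 1.0))
    kernel.variance = pyro.nn.PyroSample(dist.LogNormal(0.0, 1.0))

    sgpr = gp.models.SparseGPRegression(X, y, kernel, Xu, noise=torch.tensor(0.1))
    # During training, self._load_pyro_samples() is what touches these PyroSample
    # attributes inside model()/guide() so their (auto)guides get registered;
    # by default (I believe) this gives MAP estimates of the hyperparameters.
    losses = gp.util.train(sgpr, num_steps=100)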

Dear Du

Thank you so much for your detailed answer. Fully understood now. This is what I figured out after submitting my questions: the latent variable f is sampled according to p(f|u) for a given u, so the first term E_{q(f)}[log p(y|f)] = \int p(f|u) q(u) \log p(y|f) df du is calculated as \log p(y|f) with f taken as the mean of p(f|u) for a given sampled u. That means for each u we are using a single f to approximate E_{p(f|u)}[\cdot]. Is the accuracy good enough? Is there any way to use more samples of f for a given sample of u?

Further questions regarding self._load_pyro_samples():

(1) If, inside the guide (or model), I have explicit sample statements for those hyperparameters, do I still need to call self._load_pyro_samples()?

(2) Does this mean self._load_pyro_samples() can appear in either the model or the guide, but not in both? Or should I use it in the guide, but use explicit sample statements in the model?

Another silly question on the SparseGPRegression class: why is the guide empty?

Many thanks

J

To sample more f, I think you can subclass the likelihood and draw more samples from this line. In the case of Gaussian and Poisson likelihoods, the likelihood term can be computed exactly (using, e.g., the LogNormal-Poisson distribution), so you don't need to draw Monte Carlo samples.
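
A very rough sketch of one possible reading of that suggestion, subclassing the Binary likelihood so the response probability is averaged over several f samples. The class name MultiSampleBinary and num_f_samples are my own; note this averages probabilities rather than log-likelihoods, so treat it as a starting point only:

    import pyro
    import pyro.distributions as dist
    from pyro.contrib.gp.likelihoods import Binary

    class MultiSampleBinary(Binary):
        def __init__(self, num_f_samples=10):
            super().__init__()
            self.num_f_samples = num_f_samples

        def forward(self, f_loc, f_var, y=None):
            # draw several f ~ N(f_loc, f_var) instead of a single sample
            f = dist.Normal(f_loc, f_var.sqrt()).rsample((self.num_f_samples,))
            # average the response probabilities over the f samples
            probs = self.response_function(f).mean(dim=0)
            y_dist = dist.Bernoulli(probs)
            if y is not None:
                y_dist = y_dist.to_event(y.dim())
            return pyro.sample(self._pyro_get_fullname("y"), y_dist, obs=y)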

(1) If, inside the guide (or model), I have explicit sample statements for those hyperparameters, do I still need to call self._load_pyro_samples()?

You don't need to call it if you don't set any prior using the set_prior method.

Does this mean self._load_pyro_samples() can appear in either the model or the guide, but not in both?

It can appear in both. You don't need to use any explicit sample statement in the model. All you need to do is call self.foo for a PyroSample parameter foo; self._load_pyro_samples() just calls self.foo for all PyroSample parameters foo. I guess this notebook, which implements semiparametric GP models, will be helpful for you to see the interaction between PyroSample and pyro.sample statements in Pyro.

On the SparseGPRegression class: why is the guide empty?

Because there is no latent variable in those models. As mentioned in my last comment, the ELBO can be computed exactly in those cases.

Dear Du

Thank you so much. I think I am getting there :slight_smile: I have to admit that Pyro's learning curve is quite steep. I have not touched poutine yet :frowning:

  1. Would it be possible to set only part of a posterior covariance as learnable parameters, with the other entries fixed, e.g. to 0?

  2. If my likelihood needs a mask (masking out some terms), how shall I apply this mask to a dist/likelihood/sample statement?

J.

Hi @junbin.gao, could you be more explicit about which posterior you mean, which latent variables are in your model, and which guide you are using?

how shall I apply this mask

You can use either the mask handler or the mask method.
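
A minimal sketch of both options; the boolean tensor valid is a hypothetical mask marking the observations to keep:

    import torch
    import pyro
    import pyro.distributions as dist
    import pyro.poutine as poutine

    y = torch.randn(5)
    valid = torch.tensor([True, True, False, True, False])  # hypothetical mask

    def model():
        loc = pyro.param("loc", torch.zeros(5))
        # Option 1: the .mask() method on the distribution
        with pyro.plate("data1", 5):
            pyro.sample("y1", dist.Normal(loc, 1.0).mask(valid), obs=y)
        # Option 2: the poutine.mask effect handler around the sample statement
        with pyro.plate("data2", 5), poutine.mask(mask=valid):
            pyro.sample("y2", dist.Normal(loc, 1.0), obs=y)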

Thanks. Here is my problem. I am trying a GPLVM model where X is the latent "inputs". I wish to have a self-defined posterior q(X) such that the precision matrix of X (assuming X is a single feature) is NxN with a special structure; for example, the diagonal elements and one off-diagonal element, the (3,4) entry, are variables to be learned, while all other off-diagonal elements are zero. How do I set up such a guide?

Another question is about your example here gplvm_predict.ipynb · GitHub

The three lines are:
gplvm = gp.models.SparseGPRegression(X, y, kernel, Xu, noise=torch.tensor(0.01), jitter=1e-5)
gplvm.X = pyro.nn.PyroSample(dist.Normal(X_prior_mean, 0.1).to_event())
gplvm.autoguide("X", dist.Normal)

  1. In the second line, if I use dist.MultivariateNormal(...) with a precision matrix, then gplvm.X will have shape d x N; how is this shape aligned with the model input of shape N x d?

  2. In the third line, how shall I change it to make a special guide dist.MultivariateNormal(...) with a given precision matrix in which only some elements are to be learned?

It seems to me that GPLVM is using a single kernel for all output dimensions?

J.

If you are familiar with Pyro primitives such as pyro.sample and pyro.param, then you don't need to use pyro.nn.PyroSample. Simply set a prior for X in the model and a customized guide for X in the guide, for example:

def model(...):
    X = pyro.sample(..., dist.Normal(...))
    gplvm.X = X

def guide(...):
    some_entries = pyro.param(...)
    prec = construct_precision_matrix_from(some_entries)
    X = pyro.sample(..., dist.MVN(..., precision_matrix=prec))
    gplvm.X = X
    ...

You can also use PyroParam and PyroSample if you want. Please see how it works in this tutorial.

how is this shape aligned with the model input N x d

I am not sure what you need. Is X = X.t() or something like that enough for your purpose? Please see the documentation to set the correct shapes for X and y.

GPLVM is using a single kernel for all output dimensions

Yes, a single kernel is used across all GP processes. You might want to use VSGP with latent_shape=y.shape[:-1] if you want to condition on separate sets of inducing points for each GP process.
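
A small sketch of what I understand that suggestion to mean, with toy shapes of my own choosing:

    import torch
    import pyro.contrib.gp as gp

    N, D, output_dim = 100, 2, 3
    X = torch.randn(N, D)
    y = torch.randn(output_dim, N)   # multi-output targets, shape (output_dim, N)
    Xu = X[:10].clone()              # shared inducing input locations
    kernel = gp.kernels.RBF(input_dim=D)
    likelihood = gp.likelihoods.Gaussian()

    # latent_shape=y.shape[:-1] gives one latent GP (with its own inducing
    # outputs u) per output dimension, sharing the kernel and the locations Xu
    vsgp = gp.models.VariationalSparseGP(X, y, kernel, Xu, likelihood,
                                         latent_shape=y.shape[:-1])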

I did not find MVN. For now I have other torch ways to avoid updating some parameters. However, when I use MultivariateNormal, all the scale_tril elements become zero?

J

Sorry, I was too quick. By MVN, I meant the MultivariateNormal distribution. :smiley:

all the scale_tril elements become zero?

This is strange. Probably your precision matrix became singular after your modification. Please make sure that construct_precision_matrix_from(some_entries) gives a symmetric positive definite matrix.
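
For example, here is a standalone sketch (hypothetical names, my own construction) of building the structured precision matrix described above, with learnable diagonal entries plus one learnable (3, 4) off-diagonal entry, and checking that it is positive definite:

    import torch

    n = 10
    log_diag = torch.zeros(n, requires_grad=True)     # learnable log-diagonal
    offdiag = torch.tensor(0.1, requires_grad=True)   # learnable (3, 4) entry

    def construct_precision_matrix_from(log_diag, offdiag):
        # exp keeps the diagonal positive; the off-diagonal entry must stay
        # small relative to the diagonal for the matrix to remain PD
        mask34 = torch.zeros(n, n)
        mask34[3, 4] = 1.0
        return torch.diag(log_diag.exp()) + offdiag * (mask34 + mask34.t())

    prec = construct_precision_matrix_from(log_diag, offdiag)
    # Cholesky succeeds only for symmetric positive definite matrices
    torch.linalg.cholesky(prec)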

Dear Du

Can I share my program code and data on Dropbox with you so that you can have a look at why X_loc does not change at all? What is your email address?

Hi @junbin.gao, I would prefer to have an open discussion. There are already many discussions about GPLVM in this forum. Each topic deals with it in a different way, but the core principle should be the same. It would be nice to have some reproducible code with synthetic data so that other Pyro users can play with it or join the discussion.

That is okay. I simplified my program. X_loc does not change. I am not sure where I went wrong.

J.

================ Program============================

import torch
import numpy as np 

from torch.nn import Parameter
 
import matplotlib.pyplot as plt  
import warnings
warnings.filterwarnings("ignore")
 
import pyro
import pyro.contrib.gp as gp
import pyro.ops.stats as stats
import pyro.distributions as dist

pyro.enable_validation(True)       # can help with debugging
np.random.seed(0)
torch.manual_seed(0)
pyro.set_rng_seed(0)

Y = torch.randn(2708, 1043)    
X_prior_mean =  torch.randn(2708, 32)  
 
pyro.clear_param_store()

latentDim = X_prior_mean.shape[1]
 
kernel = gp.kernels.RBF(input_dim=latentDim,
                        lengthscale=torch.ones(latentDim))
 
X_prior_mean =  X_prior_mean.to(torch.float64) 
X = Parameter(X_prior_mean.clone())

Xu = stats.resample(X_prior_mean.clone(), 32)

gplvm = gp.models.SparseGPRegression(X, Y.T, 
            kernel, Xu, noise=torch.tensor(0.0001), jitter=1e-5)

# gplvm.X  = pyro.nn.PyroSample(dist.MultivariateNormal(
#     X_prior_mean.new_zeros(X_prior_mean.shape[1], X_prior_mean.shape[0]),
#     precision_matrix = torch.from_numpy(L) ).to_event())
# gplvm.X = pyro.nn.PyroSample(dist.MultivariateNormal(
#     X_prior_mean.t(), precision_matrix = 0.1 * torch.from_numpy(L) ))  
gplvm.X = pyro.nn.PyroSample(dist.Normal(X_prior_mean, 0.1).to_event())

#gplvm.autoguide("X", dist.Normal)    # This one does not work either
gplvm.autoguide("X", dist.MultivariateNormal)

gplvm.mode = "guide"
# Print the guide (Pyro's term for the approximate posterior) parameters
# of the latent variable before training
print(gplvm.X_loc)    
print(gplvm.X_scale_tril_unconstrained)  # All zeros 

#note that training is expected to take a minute or so
gplvm.mode = "model"
gplvm.set_data(X_prior_mean, Y.T)
# Please change num_steps to 1000 for a real run
losses = gp.util.train(gplvm, num_steps=10)

gplvm.mode = "guide"
print(gplvm.X_loc)  #  This does not change
print(gplvm.X_scale_tril_unconstrained) # All zeros no change
plt.figure()
plt.plot(losses)
plt.show()

#gplvm.mode = "guide"
#X = gplvm.X  # draw a sample from the guide of the variable X

# The mean of the posterior Gaussian is X_loc, so we extract it as the
# latent variables (i.e., the data's embedding). It needs to be transposed
# if a MultivariateNormal posterior or prior is used.
X = gplvm.X_loc.detach().numpy()  #.T     
plt.figure()
plt.scatter(X[:,0],X[:,1])
# This visualization may not be good, because it shows
# only 2 dimensions of latentDim (=32)
plt.show()    

The line gplvm.set_data(X_prior_mean, Y.T) will set gplvm.X = X_prior_mean. So X is not a PyroSample anymore.

How may I modify that statement, that is, how do I initialize the latent values?

I deleted
gplvm.set_data(X, Y.T)

and then X_loc started changing. But I got another shape issue. I will send you the program.

J.

I want to use MVN (MultivariateNormal) for the prior, that is, a Gaussian across the data points for each feature of the latent X; however, this causes a shape issue. In fact, I also want the posterior to have a similar shape. Can you help sort it out? -J

import torch
import numpy as np 
from torch.nn import Parameter
import matplotlib.pyplot as plt  
import warnings
warnings.filterwarnings("ignore")
import pyro
import pyro.contrib.gp as gp
import pyro.ops.stats as stats
import pyro.distributions as dist
pyro.enable_validation(True)       # can help with debugging
np.random.seed(0)
torch.manual_seed(0)
pyro.set_rng_seed(0)

Y = torch.randn(2708, 1043)    
X_prior_mean =  torch.randn(2708, 32)  
 
pyro.clear_param_store()

latentDim = X_prior_mean.shape[1]
 
kernel = gp.kernels.RBF(input_dim=latentDim,
                        lengthscale=torch.ones(latentDim))
 
X_prior_mean =  X_prior_mean.to(torch.float64) 
X = Parameter(X_prior_mean.clone())

Xu = stats.resample(X_prior_mean.clone(), 35)

gplvm = gp.models.SparseGPRegression(X, Y.T, 
            kernel, Xu, noise=torch.tensor(0.0001), jitter=1e-5)

gplvm.X  = pyro.nn.PyroSample(dist.MultivariateNormal(
    X_prior_mean.new_zeros(X_prior_mean.shape[1], X_prior_mean.shape[0]), scale_tril = torch.eye(2708).to(torch.float64)  ).to_event())
# This makes X have shape [32, 2708], while it should be [2708, 32]; how do I reshape this?
 
#gplvm.X = pyro.nn.PyroSample(dist.Normal(X_prior_mean, 0.1).to_event())   # This line works

#gplvm.autoguide("X", dist.Normal)     
gplvm.autoguide("X", dist.MultivariateNormal)

gplvm.mode = "guide"
print(gplvm.X_loc)    
print(gplvm.X_scale_tril_unconstrained)  # All zeros 

#note that training is expected to take a minute or so
gplvm.mode = "model"
#gplvm.set_data(X, Y.T)
losses = gp.util.train(gplvm, num_steps=10)

gplvm.mode = "guide"
print(gplvm.X_loc)  #  This does not change
print(gplvm.X_scale_tril_unconstrained) # All zeros no change
plt.figure()
plt.plot(losses)
plt.show()

#gplvm.mode = "guide"
#X = gplvm.X  # draw a sample from the guide of the variable X
X = gplvm.X_loc.detach().numpy().T     
plt.figure()
plt.scatter(X[:,0],X[:,1])
# This visualization may not be good, because it shows only 2 dimensions of latentDim (=32)
plt.show()

You can change X to X.t() by creating an attribute gplvm.X_t = PyroSample(...) and then training a wrapper model/guide:

def model(...):
    gplvm.set_data(gplvm.X_t.t(), gplvm.y)
    gplvm.model(...)

def guide(...):
    gplvm.set_data(gplvm.X_t.t(), gplvm.y)
    gplvm.guide(...)

You can also just use pyro.sample, which is more flexible since you want a structured covariance matrix:

def model(...):
    X_t = pyro.sample("X_t", dist.Normal(...))
    ...

def guide(...):
    loc = pyro.param(...)
    custom_cov = ...
    X_t = pyro.sample("X_t", dist.MultivariateNormal(loc, custom_cov))
    ...

p/s: Could you reformat your post to make it easier to follow? You can mimic how I formatted the code in your previous comment.

I am not sure where to put these def model(...) and def guide(...). Can you help give an example or make small changes to my code?

The model and guide are used to do SVI inference. You can read the SVI tutorial on how to do SVI in Pyro and this forum topic on how to use it to do mini-batched GPLVM.
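
A rough sketch of my own (not from those tutorials) of where the model and guide from my earlier comment could go, assuming gplvm and X_prior_mean are the objects from your program above:

    import torch
    import pyro
    import pyro.distributions as dist
    from pyro.infer import SVI, Trace_ELBO
    from pyro.optim import Adam
    from torch.distributions import constraints

    def model():
        # prior over the latent inputs, in the transposed (features x data) layout
        X_t = pyro.sample("X_t", dist.Normal(X_prior_mean.t(), 0.1).to_event(2))
        gplvm.set_data(X_t.t(), gplvm.y)
        gplvm.model()

    def guide():
        # a simple diagonal Normal guide for brevity; swap in the structured
        # MultivariateNormal from the earlier comment if desired
        loc = pyro.param("X_t_loc", X_prior_mean.t().clone())
        scale = pyro.param("X_t_scale", 0.1 * torch.ones_like(X_prior_mean.t()),
                           constraint=constraints.positive)
        X_t = pyro.sample("X_t", dist.Normal(loc, scale).to_event(2))
        gplvm.set_data(X_t.t(), gplvm.y)
        gplvm.guide()

    svi = SVI(model, guide, Adam({"lr": 0.01}), loss=Trace_ELBO())
    for step in range(1000):
        svi.step()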

Thanks Du. No clue yet; I have to give it up. Is there any fully complete example?