GPLVM - priors and constraints for inputs, X

Hello,

First of all, I wanted to thank the people on the Pyro team for creating the GP contrib section. I find it nice and really easy to use. It sets a nice standard for good software that lets researchers explore GP algorithms - something we desperately need in the GP community.

Problem

I am trying to verify some things that have been said in this paper (Damianou 2014) for my research. They claim that we can use the BGPLVM model in the case of modeling uncertain inputs. So, instead of having deterministic inputs X, the inputs come from a distribution e.g. N(mu_x, Sigma_x).

In the paper, they claim that we can do one of two things:

• Set a prior distribution to X (the model(?) as you call it in Pyro)
• Set a prior distribution to the variational parameter q(X) (or the guide as you call it in Pyro)

I like the example you put in the tutorial (from this paper, GrandPrix 2019), as it is very similar in the sense that you are explicitly putting priors on X. The problem I'm working on is slightly different because I'm not reducing the dimensionality, but I think it's a similar situation regardless; I imagine I just need to finesse the priors and decide which parameters require gradients. However, I'm having trouble understanding how we are setting the priors and what exactly they represent within our GPLVM model.

Code - Setting Priors

Continuing from the tutorial where you set a prior to the `gplvm` class (`line [6]`),

```python
# ...
# we use `.to_event()` to tell Pyro that the prior distribution for X has no batch_shape
gplvm.set_prior("X", dist.Normal(X_prior_mean, 0.1).to_event())
gplvm.autoguide("X", dist.Normal)
```

the `autoguide` puzzles me a bit, as I cannot really work out where to access the parameters. I could be wrong, but doing a simple inspection of the model attributes

```python
gplvm.mode = 'model'
model_X_loc = gplvm.X_loc.cpu().detach().numpy()
model_X_scale_unconstrained = gplvm.X_scale_unconstrained.cpu().detach().numpy()
```

and similarly a simple inspection of the guide attributes

```python
gplvm.mode = 'guide'
guide_X_loc = gplvm.X_loc.cpu().detach().numpy()
guide_X_scale_unconstrained = gplvm.X_scale_unconstrained.cpu().detach().numpy()
```

gives the exact same output

```python
# element-wise comparison (note: `a.all() == b.all()` would only compare truthiness)
assert (model_X_loc == guide_X_loc).all()
assert (model_X_scale_unconstrained == guide_X_scale_unconstrained).all()
```

So my intuition is that I just don't understand where the parameters are stored in the model or the guide, because if I look at the min, mean, and max of the values for the model and the guide

```python
print(model_X_loc.min(),
      model_X_loc.mean(),
      model_X_loc.max())
print(model_X_scale_unconstrained.min(),
      model_X_scale_unconstrained.mean(),
      model_X_scale_unconstrained.max())
```

they return

```
0.0 0.38005337 1.0
0.0 0.0 0.0
```

which doesn't make sense to me, because we explicitly set the prior scale to 0.1.

Question

Would anyone be able to help me and give me some more intuition or perhaps point me in the direction of some tutorials about how I can do the following:

• Set a prior distribution to the parameter X
• Constrain the mean and/or variance of the distribution of the prior for X (e.g. positive, zero_grad)
• Set a prior distribution to the parameter q (the guide)
• Constrain the mean and/or variance of the distribution of the prior for the guide (e.g. positive, zero_grad)

J. Emmanuel Johnson

Hi @jejjohnson, I am really happy to see your interest in the gp module! I'll try to answer your questions, but if any point is not clear, please let me know. There might be something wrong with my understanding, so further discussion would be very helpful for me.

First of all, if you set a prior distribution on `X` with

```python
gplvm.set_prior("X", dist.Normal(X_prior_mean, 0.1).to_event())
```

then the mean of the prior is `X_prior_mean` and the `scale` is 0.1 (variance 0.01). Unless you want to learn the prior's mean/variance, these tensors will always be constant. Under the hood, `dist.Normal(X_prior_mean, 0.1).to_event()` is stored.

When you call `gplvm.autoguide("X", dist.Normal)`, the module will create the variational parameters `X_loc` and `X_scale` for you. However, we need `X_scale` to be positive, so under the hood the "root/raw" parameters are `X_loc` and `X_scale_unconstrained`. These parameters are used to generate a sample `X` from the guide distribution `dist.Normal(X_loc, X_scale)`. They play no role in the prior.

If you want to constrain `X_loc` to be positive, you can call `gplvm.set_constraint('X_loc', constraints.positive)`. Then the root/raw parameter of `X_loc` will be `X_loc_unconstrained`.
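As a hedged, torch-only illustration of how such a constraint works (a sketch of the general constrained/unconstrained pattern in the torch/Pyro ecosystem, not Pyro's exact internals): the raw unconstrained parameter is pushed through a bijective transform onto the constrained space, so the optimizer can work on an unbounded tensor while the model always sees a valid value:

```python
import torch
from torch.distributions import constraints, transform_to

# an unconstrained ("root/raw") parameter may take any real value
X_loc_unconstrained = torch.tensor([-2.0, 0.0, 3.0])

# transform_to maps it into the constrained (here: positive) space
X_loc = transform_to(constraints.positive)(X_loc_unconstrained)
assert (X_loc > 0).all()
```

Gradients then flow through the transform back to the unconstrained parameter during optimization.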


Hello,

Apologies for the late reply. I believe you answered my question. I would just like to clarify a few things for any readers (and for myself more than anyone):

• Set a prior distribution on X - this is done as in the tutorial with `.set_prior("X")`, and these parameters are fixed.
• The X prior is already constrained and fixed by the `set_prior("X")` call. Is there a way to unfix this and let these parameters be learned? (Impractical, I know, but it's nice to know for the future.)
• Define a distribution q - as in the tutorial, call `gplvm.autoguide("X")`. This will create the parameters `X_loc` and `X_scale`, which are learned.
• We constrain q automatically via the `gplvm.autoguide("X")` call. The `X_loc` and `X_scale_unconstrained` parameters are created, and we are free to modify them as we see fit. However, these parameters are learned. It is possible that we could fix them by setting, for example, `X_loc = Parameter(torch.tensor(0.1), requires_grad=False)`, correct?

If everything I said above is correct then I think I finally understand how everything works together. Thank you again.

Best,
J. Emmanuel Johnson


Hi @jejjohnson, I think you got everything right. Thanks for your detailed clarifications! For your questions,

It is possible that we could fix them by setting them with a `X_loc = Parameter(torch.tensor(0.1), requires_grad=False)`

To fix them, I use `gplvm.X_loc.requires_grad_(False)`, which is equivalent to what you did. For example, in this benchmark test, I fixed the inducing points so they are not learned.
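A minimal torch-only sketch of this pattern (using a plain `nn.Linear` as a stand-in for a Parameterized gp module; the optimizer filter is a generic PyTorch idiom, not specific to Pyro):

```python
import torch

module = torch.nn.Linear(2, 2)      # stand-in for a gp module with parameters
module.bias.requires_grad_(False)   # freeze one parameter from learning

# optimizers are typically handed only the parameters that still require grad
trainable = [p for p in module.parameters() if p.requires_grad]
assert len(trainable) == 1          # only the weight remains trainable
```

The frozen parameter keeps its current value throughout training while the rest are updated as usual.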

Is there a way to unfix this and let these parameters be learned?

Sure, you can do it, but it takes a bit more effort (not much, though):

```python
class LearnedPriorGP(gp.parameterized.Parameterized):
    def __init__(self, gplvm):
        super().__init__()
        self.gplvm = gplvm
        self.prior_loc = nn.Parameter(...)
        self.prior_scale = nn.Parameter(...)
        self.set_constraint("prior_scale", constraints.positive)

    def model(self):
        self.mode = "model"
        # sample X from the learnable prior, then run the wrapped model
        X = pyro.sample("X", dist.Normal(self.prior_loc, self.prior_scale))
        self.gplvm.set_data(X, y)  # y: observed data from the enclosing scope
        self.gplvm.model()

    def guide(self):
        self.gplvm.guide()
```

and use this class instead of `gplvm` for inference.

Hope that it helps! The design pattern I had in mind when making the gp module was to make it modular (like PyTorch's nn.Module) and flexible, so it is easy to combine parts together into a probabilistic model (rather than focusing on analytic derivations as other frameworks do). Please let me know if something does not work.


Hey @fehiepsi,

Thank you for the pseudocode and for confirming my understanding. I believe I have plenty to continue conducting my experiments regarding uncertain inputs for GPLVMs.

Once again, thank you for the replies and thank you for all your work on the contrib library. I appreciate it even more with every additional element of understanding!

Thanks!
Emmanuel


Hello @fehiepsi,

So I've been working on the uncertain-inputs problem mentioned in this thread for a while now, and I have a question about what the inference method actually does with the latent variables in the GPLVM model.

If you recall, I was using the GPLVM tutorial. In my problem I assume that I know the noise in my inputs. To encode that, I put a prior on X where I fix the `X_prior_mean` and use my known `X_prior_scale`. Then I set the guide to be a Normal distribution with an `X_mean` and a diagonal `X_scale` term. The mean is fixed because I assume my observations are true. My project has been to run experiments with different combinations of fixing or unfixing the scale term for both the prior and the guide, e.g. `X_scale` fixed, `X_scale` not fixed, etc. In all of my experiments I'm using the `TraceMeanField_ELBO` inference method.

So behind the scenes, the prior on X is fixed, and the only role it plays in the ELBO objective is in the KL divergence term between the prior `p(X)` and the variational distribution `q(X)`. The variational parameters (the guide) are what is actually being optimized. So I'd like to know whether it is simply a reparameterization where, for `q(X)`, `X = X_loc + X_scale @ Normal(0, I)`, and we then update the variational parameters `X_loc` and `X_scale` just like any other parameter in our model? I just wanted to confirm that this is the case. It's actually not very common to see this formulation for the latent-variable priors in the GP literature. It's very common for the kernel parameters, `f`, and the inducing points `Z`, but I haven't seen this formulation specifically for the latent variables `X`.

Best,
Emmanuel

P.S. If anyone is interested in seeing my initial results, they're more than welcome to look at the Google Colab notebook I created.

Hi @jejjohnson, IIUC your question is how to fix `X_mean`? The parameter names of the guide are `X_loc` and `X_scale_unconstrained`. To not optimize `X_loc`, I think you can use this method (probably using `module.named_parameters()` to filter on `name` instead of using `module.parameters()`). This also works for `X_scale_unconstrained`. Otherwise, IIRC you can also use

```python
del model.X_loc
model.X_loc = some_fixed_tensor
```

if it is simply a reparameterization where for `q(X), X = X_loc + X_scale @ Normal(0,I)`

Actually, the KL is computed directly from `p(X)` and `Normal(X_loc, X_scale)`, so the KL is a function of `X_loc` and `X_scale` (which is what we want to optimize).
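To see this concretely, here is a hedged torch-only sketch (the tensor shapes and values are made up for illustration): the KL between two Normal distributions is available in closed form, with no sampling involved, and it is differentiable with respect to the variational parameters:

```python
import torch
from torch.distributions import Normal, kl_divergence

X_prior_mean = torch.zeros(3)
prior = Normal(X_prior_mean, 0.1)             # fixed prior p(X)

X_loc = torch.zeros(3, requires_grad=True)    # variational parameters of q(X)
X_scale = torch.full((3,), 0.2, requires_grad=True)
q = Normal(X_loc, X_scale)

# closed-form KL(q || p); a deterministic function of X_loc and X_scale
kl = kl_divergence(q, prior).sum()
kl.backward()
assert X_loc.grad is not None and X_scale.grad is not None
```

Because the KL is analytic, only the likelihood term of the ELBO needs Monte Carlo samples.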


Hi,

So actually the second point you mentioned was my question: you compute the KL divergence between `q(X)` and `p(X)`, and you compute the likelihood using samples from `q(X)`, just like in the standard VI literature.
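As a torch-only sketch of the reparameterized sampling that makes the likelihood term differentiable (the names `X_loc`/`X_scale` follow this thread; this is not Pyro's internal code): `rsample()` draws `X = X_loc + X_scale * eps` with `eps ~ Normal(0, I)`, so gradients flow through the sample back to the variational parameters:

```python
import torch
from torch.distributions import Normal

# variational parameters for q(X)
X_loc = torch.zeros(5, 2, requires_grad=True)
X_scale = torch.full((5, 2), 0.1, requires_grad=True)

# reparameterized sample: gradients reach X_loc and X_scale through X
X = Normal(X_loc, X_scale).rsample()
X.sum().backward()
assert X_loc.grad is not None and X_scale.grad is not None
```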

Glad that it helps! If you want to make sure that the KL is computed analytically (instead of stochastically), you can add a `print(name)` statement at this line.