GPLVM - priors and constraints for inputs, X

Hello,

First of all, I wanted to thank the Pyro team for creating the GP contrib module. I find it nice and really easy to use. It sets a good standard for software that lets researchers explore GP algorithms - something we desperately need in the GP community.

Problem

For my research I am trying to verify some claims made in this paper (Damianou 2014). The authors claim that the BGPLVM model can be used to handle uncertain inputs: instead of deterministic inputs X, the inputs come from a distribution, e.g. N(mu_x, Sigma_x).

In the paper, they claim that we can do one of two things:

  • Set a prior distribution on X (the model, as I think you call it in Pyro)
  • Set a prior distribution on the variational distribution q(X) (or the guide, as you call it in Pyro)

I like the example in the tutorial (from the GrandPrix 2019 paper), as it is very similar in the sense that you explicitly put priors on X. The problem I'm working on is slightly different, since I'm not reducing the dimensionality, but I think it's a similar situation regardless; I imagine I just need to finesse the priors and decide which parameters require gradients. However, I'm having trouble understanding how the priors are set and what exactly they represent within the GPLVM model.

Code - Setting Priors

Continuing from the tutorial, where you set a prior on the gplvm object (cell [6]),

#...
# we use `.to_event()` to tell Pyro that the prior distribution for X has no batch_shape
gplvm.set_prior("X", dist.Normal(X_prior_mean, 0.1).to_event())
gplvm.autoguide("X", dist.Normal)

the autoguide puzzles me a bit, as I cannot really work out where to access the parameters. I could be wrong, but a simple inspection of the model attributes

gplvm.mode = 'model'
model_X_loc = gplvm.X_loc.cpu().detach().numpy()
model_X_scale_unconstrained = gplvm.X_scale_unconstrained.cpu().detach().numpy()

and similarly a simple inspection of the guide attributes

gplvm.mode = 'guide'
guide_X_loc = gplvm.X_loc.cpu().detach().numpy()
guide_X_scale_unconstrained = gplvm.X_scale_unconstrained.cpu().detach().numpy()

gives the exact same output

assert (model_X_loc == guide_X_loc).all()
assert (model_X_scale_unconstrained == guide_X_scale_unconstrained).all()

So my feeling is that I simply don't understand where the parameters are stored in the model or the guide, because if I look at the min, mean and max of the values for the model and the guide

print(model_X_loc.min(), 
      model_X_loc.mean(), 
      model_X_loc.max())
print(model_X_scale_unconstrained.min(), 
      model_X_scale_unconstrained.mean(), 
      model_X_scale_unconstrained.max())

they return

0.0 0.38005337 1.0
0.0 0.0 0.0

which doesn't make sense to me, because we explicitly set the prior to have a scale of 0.1.

Question

Would anyone be able to give me some more intuition, or point me towards some tutorials, on how I can do the following:

  • Set a prior distribution on the parameter X
  • Constrain the mean and/or variance of the prior distribution for X (e.g. positive, zero_grad)
  • Define the variational distribution q (the guide)
  • Constrain the mean and/or variance of the guide distribution (e.g. positive, zero_grad)

Thank you in advance.
J. Emmanuel Johnson

Hi @jejjohnson, I am really happy to see your interest in the gp module! I'll try to answer your questions, but if any point is not clear, please let me know. There might be something wrong in my understanding, so further discussion would be very helpful for me. :slight_smile:

First of all, if you set a prior distribution on X with

gplvm.set_prior("X", dist.Normal(X_prior_mean, 0.1).to_event())

then the mean of the prior is X_prior_mean and the scale is 0.1 (i.e. variance 0.01). Unless you want to learn the prior's mean/variance, these tensors will always stay constant. Under the hood, the distribution dist.Normal(X_prior_mean, 0.1).to_event() itself is what gets stored.
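As a small illustration (with a placeholder shape, not the tutorial's data), the tensors captured inside that prior are ordinary constants rather than registered parameters:

import torch
import pyro.distributions as dist

# Illustration: the prior's loc/scale are plain tensors held inside the
# distribution object, so the optimizer never touches them.
X_prior_mean = torch.zeros(10, 2)  # placeholder shape
prior = dist.Normal(X_prior_mean, 0.1).to_event()
print(prior.base_dist.loc.requires_grad)  # False -> not learnable
print(prior.base_dist.scale.unique())     # tensor([0.1000]) -> fixed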

When you call gplvm.autoguide("X", dist.Normal), the module will create the variational parameters X_loc and X_scale for you. However, X_scale needs to be positive, so under the hood the "root/raw" parameters are X_loc and X_scale_unconstrained. These parameters are used to draw a sample X from the guide distribution dist.Normal(X_loc, X_scale). They play no role in the prior.
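If it helps, here is a small sketch of how you could inspect those guide parameters; this assumes the positive constraint is mapped through the usual torch.distributions.transform_to registry (an exp-style bijection):

from torch.distributions import constraints, transform_to

# Sketch: map the raw parameter back to the constrained space. With an
# exp-style transform, the all-zero X_scale_unconstrained you printed
# corresponds to X_scale == 1 everywhere.
gplvm.mode = "guide"
X_loc = gplvm.X_loc.detach()
X_scale = transform_to(constraints.positive)(gplvm.X_scale_unconstrained).detach()
print(X_loc.shape, X_scale.min().item(), X_scale.max().item())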

If you want to constrain X_loc to be positive, you can call gplvm.set_constraint("X_loc", constraints.positive). Then the root/raw parameter of X_loc will be X_loc_unconstrained.


Hello,

Apologies for the late reply. I believe you answered my question. I would just like to spell out a few things for any readers (and mostly for myself, if anything):

  • Setting a prior distribution on X - this is done as in the tutorial with .set_prior("X", ...), and the prior's parameters are fixed.
  • The X prior is already fixed by the set_prior("X", ...) call. Is there a way to unfix this and let those parameters be learned? (Impractical, I know, but it's nice to know for the future.)
  • Defining the variational distribution q - as in the tutorial, call gplvm.autoguide("X", ...). This creates the parameters X_loc and X_scale, which are learned.
  • q is constrained automatically via gplvm.autoguide("X", ...). The raw parameters X_loc and X_scale_unconstrained are created and we are free to modify them as we see fit; however, they are learned. It should be possible to fix them by setting, for example, X_loc = Parameter(torch.tensor(0.1), requires_grad=False), correct?

If everything I said above is correct then I think I finally understand how everything works together. Thank you again.

Best,
J. Emmanuel Johnson


Hi @jejjohnson, I think you got everything right. Thanks for your detailed clarifications! For your questions:

It should be possible to fix them by setting, for example, X_loc = Parameter(torch.tensor(0.1), requires_grad=False)

To fix them, I use gplvm.X_loc.requires_grad_(False), but it is equivalent to what you did. :slight_smile: For example, in this benchmark test I fixed the inducing points so they are not learned.
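A minimal sketch of the same idea here, using gp.util.train as in the GPLVM tutorial (the step count is just the tutorial's number):

import pyro.contrib.gp as gp

# Sketch: freeze the variational mean so only the remaining parameters
# (X_scale_unconstrained, kernel hyperparameters, inducing points, ...)
# keep being optimized.
gplvm.X_loc.requires_grad_(False)
losses = gp.util.train(gplvm, num_steps=4000)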

Is there a way to unfix this and let these parameters be learned?

Sure, you can do it but you need a bit more effort (not much though):

import pyro
import pyro.contrib.gp as gp
import pyro.distributions as dist
import torch.nn as nn
from torch.distributions import constraints

class LearnedPriorGP(gp.parameterized.Parameterized):
    def __init__(self, gplvm):
        super().__init__()
        self.gplvm = gplvm
        # choose whatever initial values you like for the learnable prior
        self.prior_loc = nn.Parameter(...)
        self.prior_scale = nn.Parameter(...)
        self.set_constraint("prior_scale", constraints.positive)

    def model(self):
        self.mode = "model"
        # sample X from the learnable prior, then hand it to the wrapped gplvm
        X = pyro.sample("X", dist.Normal(self.prior_loc, self.prior_scale))
        self.gplvm.set_data(X, y)  # y: the observed outputs, as in the tutorial
        self.gplvm.model()

    def guide(self):
        self.gplvm.guide()

and use this class instead of gplvm for inference.
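For example, a rough usage sketch (illustrative only; the optimizer settings and step count are placeholders):

import pyro.optim
from pyro.infer import SVI, Trace_ELBO

# Rough usage sketch: run SVI on the wrapper's model/guide pair.
wrapped = LearnedPriorGP(gplvm)
optimizer = pyro.optim.Adam({"lr": 0.01})
svi = SVI(wrapped.model, wrapped.guide, optimizer, loss=Trace_ELBO())
losses = [svi.step() for _ in range(2000)]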

Hope that helps! The design pattern I had in mind when making the gp module was to make it modular (like PyTorch nn.Module) and flexible, so it is easy to combine parts into a probabilistic model (rather than focusing on analytic derivations as in other frameworks). Please let me know if something does not work. :slight_smile:


Hey @fehiepsi,

Thank you for the pseudocode and for confirming my understanding. I believe I have plenty to continue my experiments on uncertain inputs for GPLVMs.

Once again, thank you for the replies and thank you for all your work on the contrib library. I appreciate it even more with every additional element of understanding!

Thanks!
Emmanuel
