Gaussian Process models with different lengthscales

Hello,
I’m using pyro.contrib.gp to fit my data but I have encountered a weird issue.
Say we define two GPR models as follows:

kernel_init = gp.kernels.RBF(input_dim=dimension, variance=torch.tensor(1.),
                             lengthscale=length_scale_init)
gpr_init = gp.models.GPRegression(train_x, train_y, kernel_init,
                                  noise=torch.tensor(0.), jitter=jitter)
kernel = gp.kernels.RBF(input_dim=dimension, variance=torch.tensor(1.),
                        lengthscale=length_scale_init)
gpr_opt = gp.models.GPRegression(train_x, train_y, kernel,
                                 noise=torch.tensor(0.), jitter=jitter)

Then one optimizes the lengthscale parameter of gpr_opt's kernel:

losses = []
optimizer = torch.optim.Adam([{'params': gpr_opt.kernel.lengthscale_unconstrained}], lr=5e-4)
for i in range(num_steps):
    # Zero gradients from previous iteration
    optimizer.zero_grad()
    # Calc loss and backprop gradients
    loss = loss_function(params)
    loss.backward()
    # Update step
    optimizer.step()
    losses.append(loss.item())

Finally, you want to compute the values of the two models at some test_x points:

with torch.no_grad():
    init_value = gpr_init(test_x, full_cov=False, noiseless=True)
    opt_value = gpr_opt(test_x, full_cov=False, noiseless=True)

And weirdly, one gets the same value, opt_value == init_value, even though the two GPR models have kernels with different lengthscales. How come? It works as expected if I remove the init_value = ... line inside the with torch.no_grad(): block.

I guess the reason is that Pyro modules use the same global param store: the two lengthscale parameters have the same name inside the param store, so they end up sharing the same value.
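To illustrate the aliasing, here is a plain-Python sketch of the mechanism (this mimics the behavior of a global name-keyed parameter store; it is not Pyro's actual implementation):

```python
# A global parameter store keyed by name, as a plain dict.
_param_store = {}

def get_param(name, init_value):
    """Return the stored parameter, creating it from init_value only on first use."""
    if name not in _param_store:
        _param_store[name] = init_value
    return _param_store[name]

# Two "kernels" register a lengthscale under the same name;
# the second init value is silently ignored because the name already exists.
ls_init = get_param("RBF.lengthscale", [2.0])
ls_opt = get_param("RBF.lengthscale", [2.0])

# "Optimizing" one model mutates the shared object, so both see the new value:
ls_opt[0] = 0.5
print(ls_init[0])  # 0.5 -- the "initial" model was silently updated too
```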

Thank you for your answer. What would be the solution then?

I think you can use pyro.get_param_store() to save/load parameters to/from disk. After training the first GP, save its parameters, call pyro.clear_param_store(), and then work with the second GP.

Thanks. I will have a look at it.