Hello,

I’m using pyro.contrib.gp to fit a GP regression model to my data, but I have run into a weird issue.

Say we define 2 GPR models as follows:

```
import torch
import pyro
import pyro.contrib.gp as gp

# Two identical models: one kept at its initial lengthscale, one to be optimized
kernel_init = gp.kernels.RBF(input_dim=dimension, variance=torch.tensor(1.), lengthscale=length_scale_init)
gpr_init = gp.models.GPRegression(train_x, train_y, kernel_init, noise=torch.tensor(0.), jitter=jitter)
kernel = gp.kernels.RBF(input_dim=dimension, variance=torch.tensor(1.), lengthscale=length_scale_init)
gpr_opt = gp.models.GPRegression(train_x, train_y, kernel, noise=torch.tensor(0.), jitter=jitter)
```
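
For context, `dimension`, `train_x`, `train_y`, `length_scale_init`, `jitter`, and `num_steps` come from my setup; a placeholder version would be something like (my real data differs):

```
# Placeholder setup -- stands in for my real data
dimension = 2
train_x = torch.rand(50, dimension)
train_y = torch.sin(train_x.sum(dim=1))
length_scale_init = torch.ones(dimension)  # one lengthscale per input dimension
jitter = 1e-6
num_steps = 2000
```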

Then one optimizes the lengthscale parameters of the second model’s kernel (I use the standard `Trace_ELBO` loss from the Pyro GP tutorial, i.e. the negative marginal log likelihood):

```
optimizer = torch.optim.Adam([{'params': gpr_opt.kernel.lengthscale_unconstrained}], lr=5e-4)
loss_fn = pyro.infer.Trace_ELBO().differentiable_loss
losses = []
for i in range(num_steps):
    # Zero gradients from previous iteration
    optimizer.zero_grad()
    # Calc loss and backprop gradients
    loss = loss_fn(gpr_opt.model, gpr_opt.guide)
    loss.backward()
    # Update step
    optimizer.step()
    losses.append(loss.item())
```
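
After training, the two kernels do report different lengthscales:

```
print(gpr_init.kernel.lengthscale)  # unchanged initial value
print(gpr_opt.kernel.lengthscale)   # moved by the optimizer
```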

Finally, you compute the predictions of the two models at some `test_x` points:

```
with torch.no_grad():
    init_value = gpr_init(test_x, full_cov=False, noiseless=True)
    opt_value = gpr_opt(test_x, full_cov=False, noiseless=True)
```

And weirdly, one gets the same values, `opt_value == init_value`, even though each GPR model has a kernel with a different lengthscale. How come? It seems to work correctly if I remove the `init_value = ...` line under `with torch.no_grad():`.
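
Concretely, the model call returns a `(mean, variance)` pair, and the check that surprises me is:

```
mean_init, var_init = init_value
mean_opt, var_opt = opt_value
print(torch.allclose(mean_init, mean_opt))  # True, unexpectedly
print(torch.allclose(var_init, var_opt))    # also True
```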