Hi All,

First of all, thank you for the amazing probabilistic programming language; it is a great help in my research! I have a question about how variational inference is carried out in `VariationalGP` models. In particular, I am not sure I completely understand the role of the sampling statements in the `model` definition below:

```
if self.whiten:
    identity = eye_like(self.X, N)
    pyro.sample("f",
                dist.MultivariateNormal(zero_loc, scale_tril=identity)
                    .to_event(zero_loc.dim() - 1))
    f_scale_tril = Lff.matmul(self.f_scale_tril)
    f_loc = Lff.matmul(self.f_loc.unsqueeze(-1)).squeeze(-1)
else:
    pyro.sample("f",
                dist.MultivariateNormal(zero_loc, scale_tril=Lff)
                    .to_event(zero_loc.dim() - 1))
    f_scale_tril = self.f_scale_tril
    f_loc = self.f_loc

f_loc = f_loc + self.mean_function(self.X)
f_var = f_scale_tril.pow(2).sum(dim=-1)
if self.y is None:
    return f_loc, f_var
else:
    return self.likelihood(f_loc, f_var, self.y)
```

Why do we not use the sample of the latent variable **f** in the likelihood (in the later stages of `model`), but instead leave the task of defining the observed variable **y** entirely to the trainable parameters `f_loc` and `f_scale_tril`? Specifically, following a *generative* view of Gaussian processes, it would seem natural to use a sampled latent variable, e.g. `fs = pyro.sample("f", ...)`, in the likelihood of the model, *N(y |* `fs`*, sigma)*.

Also, I was wondering whether this could be related to the use of the following training scheme (as in the Gaussian Processes introduction in the documentation):

```
optimizer = torch.optim.Adam(gp.parameters(), lr=0.005)
loss_fn = pyro.infer.Trace_ELBO().differentiable_loss
losses = []
num_steps = 2500 if not smoke_test else 2
for i in range(num_steps):
    optimizer.zero_grad()
    loss = loss_fn(gp.model, gp.guide)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
```

as opposed to the more “Pyro-classical” SVI training procedure:

```
optimizer = pyro.optim.Adam({"lr": 0.01})
svi = SVI(gp.model, gp.guide, optimizer, loss=Trace_ELBO())
for i in range(1000):
    svi.step()
```

In the first scheme, if I understand correctly, we are optimizing the module's parameters `gp.parameters()`, which also contain the guide parameters `f_loc` and `f_scale_tril` (not explicit `pyro.param`s in the `guide`, but rather trainable torch `Parameter`s used in both `model` and `guide`). Does this mean we are not explicitly using a `guide` from which we can sample via, for example, `svi.run()` + `EmpiricalMarginal` (which we could do with the second training scheme)?

My idea is that the second training scheme would somehow require the use of latent samples (i.e. `fs = pyro.sample("f", ...)`) in defining the observations **y**, to allow a feasible posterior approximation through the `guide`.

I hope I managed to be sufficiently clear, and thank you very much in advance for your help!