Hi,

I am currently working on a system that uses deep kernel learning (DKL) in an active learning setup (variance for active learning). The core of the system is based on the DKL example from the examples section (https://pyro.ai/examples/dkl.html). So far everything seems to work, except that sometimes during model training I get this exception:

`RuntimeError: cholesky_cpu: U(65,65) is zero, singular U.`

I have looked around a bit to see how to deal with this problem and have come across the following solutions:

- Increase the `noise`/`jitter`, or use `torch.float64` instead of `torch.float32` tensors (Cholesky decomposition during GPRegression model optimization)
- Set the `lengthscale` prior to be strictly positive (https://github.com/pyro-ppl/pyro/issues/1863#issuecomment-491438521)

So far I have tried increasing the `jitter`, which in some cases allows the model to complete training; in other cases I get the same error message, just later. Next, I tried using `torch.float64` tensors instead of `torch.float32`, which ultimately failed because I was unable to get the `gpmodule` (see below) to work with `torch.float64` tensors.
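
For reference, this is roughly the conversion I attempted. I am assuming that switching the default dtype before building the modules and casting the batches is the intended way, but maybe something inside the GP module stays in `float32`?

```
# Attempted float64 conversion (did not work for me so far).
torch.set_default_dtype(torch.float64)  # before constructing anything

warp_core = WarpCore(100).double()
kernel_fn = gp.kernels.RBF(input_dim=2, lengthscale=torch.ones(2, dtype=torch.float64))
deep_kernel = gp.kernels.Warping(kernel_fn, iwarping_fn=warp_core)
gpmodule = gp.models.VariationalSparseGP(X=inducing_points.double(), y=None,
                                         kernel=deep_kernel,
                                         Xu=inducing_points.double(),
                                         likelihood=gp.likelihoods.Binary(),
                                         latent_shape=torch.Size([]),
                                         num_data=len(train_dataset),
                                         whiten=True, jitter=1e-2)

# ... and inside the training loop:
# data, target = data.double(), target.double()
```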

Maybe someone from the Pyro team could help out here?

After that I looked at the solution where the prior for the `lengthscale` is restricted to strictly positive values. This approach seems the most promising to me, since it would presumably guarantee that the error above cannot occur again, but I lack the knowledge to implement it. It may also be that this solution does not apply to my case at all.
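
From reading the Pyro GP docs, my (untested) guess is that in recent Pyro versions, where the GP kernels are PyroModules, one would replace the point-estimated lengthscale by a latent variable with a strictly positive prior such as a LogNormal, roughly like this. Please correct me if this is not the right API, or not what the linked comment means:

```
import pyro.distributions as dist
from pyro.nn import PyroSample

# Untested: give the 2-dimensional lengthscale a LogNormal prior, which has
# strictly positive support, instead of leaving it as a free point estimate.
kernel_fn.lengthscale = PyroSample(
    dist.LogNormal(0.0, 1.0).expand([2]).to_event(1)
)
```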

Therefore I have posted below the code I am currently using in my project. I can imagine that solving this problem is not trivial, but maybe there is a good way to deal with it. Perhaps the `warp_core` can be adjusted so that it provides “better” values to the Gaussian process (see the sketch after the code below)? I can imagine that other users run into similar problems, so maybe we can develop some kind of best-practice approach.

```
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

import pyro.contrib.gp as gp
import pyro.infer as infer

# Neural net used as the warping function ("warp core") of the deep kernel.
class WarpCore(nn.Module):
    def __init__(self, dims):
        super(WarpCore, self).__init__()
        self.fc1 = nn.Linear(dims, 100)
        self.fc2 = nn.Linear(100, 50)
        self.fc3 = nn.Linear(50, 50)
        self.fc4 = nn.Linear(50, 2)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = torch.relu(self.fc3(x))
        x = self.fc4(x)
        return x

# Data loaders (train_dataset, test_dataset, number_inducing, learning_rate
# and cuda are defined elsewhere in my project).
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=len(test_dataset), shuffle=False)

# Use the first `number_inducing` training points as inducing points.
batches = []
for i, (data, _) in enumerate(train_loader):
    batches.append(data)
    if i >= ((number_inducing - 1) // 64):
        break
inducing_points = torch.cat(batches)[:number_inducing].clone()

# Loss function and likelihood.
elbo = infer.TraceMeanField_ELBO()
loss_fn = elbo.differentiable_loss
likelihood = gp.likelihoods.Binary()

# Deep kernel: 100-d input -> warp core -> 2-d features -> RBF kernel.
warp_core = WarpCore(100)
kernel_fn = gp.kernels.RBF(input_dim=2, lengthscale=torch.ones(2))
deep_kernel = gp.kernels.Warping(kernel_fn, iwarping_fn=warp_core)

# Sparse variational GP; X/y are replaced per batch via set_data().
gpmodule = gp.models.VariationalSparseGP(X=inducing_points, y=None, kernel=deep_kernel,
                                         Xu=inducing_points, likelihood=likelihood,
                                         latent_shape=torch.Size([]),
                                         num_data=len(train_dataset),
                                         whiten=True, jitter=1e-2)

# Optimizer.
optimizer_params = {"lr": learning_rate}
optimizer = torch.optim.Adam(gpmodule.parameters(), **optimizer_params)

# Training loop.
epochs = 800
for epoch in range(1, epochs + 1):
    epoch_loss = torch.Tensor()
    for batch_idx, (data, target) in enumerate(train_loader):
        if cuda:
            data, target = data.cuda(), target.cuda()
        target = target.float()
        gpmodule.set_data(data, target)
        optimizer.zero_grad()
        loss = loss_fn(gpmodule.model, gpmodule.guide)
        loss.backward()
        optimizer.step()
        epoch_loss = torch.cat([epoch_loss, torch.Tensor([loss.item()])])
    print(epoch_loss.mean())
```
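
Regarding the “better values from the `warp_core`” idea: one thing I have been considering (just an idea, not tested) is to bound the network output, e.g. with a `tanh` on the last layer, so the RBF kernel never sees extreme feature values that could make the covariance matrix close to singular:

```
class BoundedWarpCore(nn.Module):
    """Same architecture as WarpCore, but squashes the 2-d output into (-1, 1)
    so the downstream RBF kernel only sees bounded feature values."""
    def __init__(self, dims):
        super().__init__()
        self.fc1 = nn.Linear(dims, 100)
        self.fc2 = nn.Linear(100, 50)
        self.fc3 = nn.Linear(50, 50)
        self.fc4 = nn.Linear(50, 2)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = torch.relu(self.fc3(x))
        return torch.tanh(self.fc4(x))
```

Would something like this make sense, or would it just hide the underlying numerical problem?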

Hope someone can help out with this …