RuntimeError: cholesky_cpu: U(1,1) is zero, singular U - AutoLaplaceApproximation

This is related to my earlier issue, which I was able to resolve thanks to @martinjankowiak. However, I am now facing the same error in a different problem setting, again from Statistical Rethinking, Chapter 5:

import matplotlib.pyplot as plt
import pandas as pd
import torch
from torch import tensor

import pyro
import pyro.distributions as dist
import pyro.infer.autoguide
import pyro.optim

milk_df = pd.read_csv("https://raw.githubusercontent.com/rmcelreath/rethinking/master/data/milk.csv", sep=";")
dcc = milk_df.dropna()  # keep only complete cases


def model55(neop):
    # priors for intercept, slope, and noise scale
    a = pyro.sample("a", dist.Normal(tensor(0.), tensor(100.)))
    bn = pyro.sample("bn", dist.Normal(tensor(0.), tensor(1.)))
    sigma = pyro.sample("sigma", dist.Uniform(tensor(0.), tensor(1.)))
    # linear model: kcal.per.g as a function of neocortex percentage
    mu = a + bn * neop
    kcal_per_g = pyro.sample("kcal_per_g", dist.Normal(mu, sigma))
    return kcal_per_g


# condition the likelihood on the observed outcomes
conditioned55 = pyro.condition(model55, data={"kcal_per_g": tensor(dcc['kcal.per.g'], dtype=torch.float)})
guide55 = pyro.infer.autoguide.AutoLaplaceApproximation(conditioned55)
pyro.clear_param_store()

svi = pyro.infer.SVI(
    model=conditioned55,
    guide=guide55,
    optim=pyro.optim.Adam({"lr": 0.005}),
    loss=pyro.infer.Trace_ELBO(),
)
num_steps = 5000
losses = [svi.step(tensor(dcc['neocortex.perc'], dtype=torch.float)) for t in range(num_steps)]
plt.plot(losses)

# fit a multivariate normal centered at the MAP point found by SVI
laplace_guide55 = guide55.laplace_approximation(tensor(dcc['neocortex.perc'], dtype=torch.float))
pred55 = pyro.infer.Predictive(laplace_guide55, num_samples=1000)
precis55 = pred55.get_samples()

resulting in

---> 27 laplace_guide55 = guide55.laplace_approximation(tensor(dcc['neocortex.perc'], dtype=torch.float))
     28 pred55 = pyro.infer.Predictive(laplace_guide55, num_samples=1000)
     29 precis55 = pred55.get_samples()

/usr/local/lib/python3.6/dist-packages/pyro/infer/autoguide/guides.py in laplace_approximation(self, *args, **kwargs)
   1033         cov = H.inverse()
   1034         loc = self.loc
-> 1035         scale_tril = cov.cholesky()
   1036 
   1037         gaussian_guide = AutoMultivariateNormal(self.model)

RuntimeError: cholesky_cpu: U(1,1) is zero, singular U.
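
For context, torch raises this same error for any matrix that is not positive definite, so the covariance cov = H.inverse() at the point SVI converged to must be singular (or numerically indistinguishable from singular). A minimal standalone sketch, unrelated to the model above:

import torch

# a rank-deficient 2x2 matrix: eigenvalues are 2 and 0, so it is
# positive semi-definite but not positive definite
cov = torch.tensor([[1., 1.],
                    [1., 1.]])
cov.cholesky()  # raises RuntimeError: cholesky_cpu: ... singular U

# a small diagonal jitter makes it factorizable again
(cov + 1e-4 * torch.eye(2)).cholesky()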

A Google Colab version of the above code is here. Any pointers on what I am missing this time?

have you tried lowering the learning rate and doing more gradient steps? there's no guarantee in general that you're going to converge to a point on the energy surface where the laplace approximation can be computed (e.g. you might hit a saddle point, where the hessian isn't positive definite). unless you're particularly wedded to the laplace approximation, i'd recommend using vanilla SVI ELBO training with e.g. an AutoDiagonalNormal or a custom guide
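
A minimal sketch of that suggestion, reusing conditioned55 and dcc from the question above; the learning rate and step count here are illustrative guesses, not tuned values:

import torch
from torch import tensor

import pyro
import pyro.infer.autoguide
import pyro.optim

pyro.clear_param_store()
guide_diag = pyro.infer.autoguide.AutoDiagonalNormal(conditioned55)
svi = pyro.infer.SVI(
    model=conditioned55,
    guide=guide_diag,
    optim=pyro.optim.Adam({"lr": 0.001}),  # lower learning rate
    loss=pyro.infer.Trace_ELBO(),
)
neop = tensor(dcc['neocortex.perc'], dtype=torch.float)
losses = [svi.step(neop) for _ in range(20000)]  # more gradient steps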

Thanks for the suggestion @martinjankowiak! Part of the reason to stick with the Laplace approximation is that the book uses it implicitly (through its original Stan-based code). Reducing the learning rate seems to help. See the first example here.
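
For anyone following along, the fix amounts to re-running the SVI loop from the question with a smaller step size; the exact values below are guesses, not necessarily what the Colab notebook uses:

pyro.clear_param_store()
guide55 = pyro.infer.autoguide.AutoLaplaceApproximation(conditioned55)
svi = pyro.infer.SVI(
    model=conditioned55,
    guide=guide55,
    optim=pyro.optim.Adam({"lr": 0.001}),  # was 0.005
    loss=pyro.infer.Trace_ELBO(),
)
neop = tensor(dcc['neocortex.perc'], dtype=torch.float)
losses = [svi.step(neop) for _ in range(10000)]
# the Hessian at the new optimum should now be positive definite
laplace_guide55 = guide55.laplace_approximation(neop)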

I also tried moving to AutoDiagonalNormal and no longer get a decomposition error; however, my results are far from optimal.
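
One way to quantify how far off the fit is would be to summarize samples drawn from the fitted guide and compare them with the book's estimates (a sketch; guide_diag and neop are the names assumed in the snippet above):

pred = pyro.infer.Predictive(guide_diag, num_samples=1000)
samples = pred.get_samples(neop)  # draws from the variational posterior
for name, s in samples.items():
    print(name, s.mean().item(), s.std().item())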

I will continue experimenting on my end, but if you have more insights, I would be grateful. Thanks!