We say this is a “diagonal normal” guide because the learned posterior is independent across the different variables a, bA, bR, bAR, sigma; that is, their joint posterior covariance is a diagonal matrix. Alternatively, we could have encoded a multivariate normal guide, either automatically using AutoMultivariateNormal or by manually encoding a covariance structure. For example, here’s a simple non-diagonal guide over two variables a and b:
import torch
import pyro
import pyro.distributions as dist
from torch.distributions import constraints

def guide():
    # learnable mean and lower-triangular scale for the joint posterior over (a, b)
    loc = pyro.param("loc", torch.zeros(2))
    scale_tril = pyro.param(
        "scale_tril", torch.eye(2), constraint=constraints.lower_cholesky
    )
    a = pyro.sample("a", dist.Normal(loc[0], scale_tril[0, 0]))
    # b's location depends on a, inducing posterior correlation between a and b
    b = pyro.sample(
        "b", dist.Normal(loc[1] + a * scale_tril[1, 0], scale_tril[1, 1])
    )
Whereas the diagonal guide had no posterior dependencies between random variables, this guide makes b depend on a via the term a * scale_tril[1,0].
More generally, guides can have non-Gaussian posteriors with arbitrary dependency structure.
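Here is a minimal sketch of the automatic alternative mentioned above; the toy model with latents a and b and a single observation is my own illustration, not one from this thread:

import torch
import pyro
import pyro.distributions as dist
from pyro.infer import SVI, Trace_ELBO
from pyro.infer.autoguide import AutoMultivariateNormal
from pyro.optim import Adam

def model():
    # two latents that end up correlated in the posterior
    a = pyro.sample("a", dist.Normal(0.0, 1.0))
    b = pyro.sample("b", dist.Normal(a, 1.0))
    pyro.sample("obs", dist.Normal(b, 0.5), obs=torch.tensor(1.0))

# learns a joint loc and full scale_tril over all latent sites (a, b)
guide = AutoMultivariateNormal(model)
svi = SVI(model, guide, Adam({"lr": 0.01}), loss=Trace_ELBO())
for _ in range(1000):
    svi.step()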
This is a great explanation, and thanks for sharing the non-diagonal code as well. Just to clarify something: it would still have been non-diagonal even if the term for b had been the following:
b = pyro.sample(
    "b", dist.Normal(loc[1] + a, scale_tril[1, 1])
)
That’s correct: that would also have been non-diagonal, just not learnably multivariate. Actually, it’s pretty common to use such tricks in guides, or even when reparametrizing models, e.g. in non-centering transforms where a local variable is known to have prior mean equal to some global variable.
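To illustrate the non-centering trick, here is a hedged sketch with a toy model of my own (not from this thread), where a local variable x has prior mean equal to a global variable mu:

import torch
import pyro
import pyro.distributions as dist

def centered_model():
    mu = pyro.sample("mu", dist.Normal(0.0, 1.0))        # global variable
    x = pyro.sample("x", dist.Normal(mu, 1.0))           # local, prior mean = mu
    pyro.sample("obs", dist.Normal(x, 0.5), obs=torch.tensor(1.0))

def non_centered_model():
    mu = pyro.sample("mu", dist.Normal(0.0, 1.0))        # global variable
    x_raw = pyro.sample("x_raw", dist.Normal(0.0, 1.0))  # standardized local
    x = pyro.deterministic("x", mu + x_raw)               # shift back by the global mean
    pyro.sample("obs", dist.Normal(x, 0.5), obs=torch.tensor(1.0))

The non-centered version lets even a diagonal guide over mu and x_raw capture the prior-mean dependence that the centered parametrization would otherwise push into the posterior correlation between mu and x.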