I’m having trouble figuring out either how to set appropriate constraints or how to set proper guides for constrained parameters.
Consider a simple regression where the location and scale parameters have Normal and HalfNormal (or HalfCauchy) priors. I expect the posterior of the scale parameter to follow a normal distribution truncated at 0. What I find, however, is that the guide nevertheless produces negative draws.
Here is a minimal reproducible example:
```python
import torch
import pyro
import pyro.distributions as dist
from pyro.infer import SVI, Trace_ELBO
from pyro.optim import Adam
from torch.distributions import constraints


def model(y):
    mu = pyro.sample("mu", dist.Normal(torch.zeros(1), torch.ones(1)))
    sigma = pyro.sample("sigma", dist.HalfNormal(torch.ones(1)))
    with pyro.plate("obs", y.shape[0]):
        pyro.sample("y", dist.Normal(mu, sigma), obs=y)


def guide(y):
    mu_loc = pyro.param("mu_loc", torch.ones(1))
    mu_scale = pyro.param("mu_scale", torch.ones(1), constraint=constraints.positive)
    assert all(mu_scale > 0)
    pyro.sample("mu", dist.Normal(mu_loc, mu_scale))
    sigma_loc = pyro.param("sigma_loc", torch.ones(1), constraint=constraints.positive)
    sigma_scale = pyro.param("sigma_scale", torch.ones(1), constraint=constraints.positive)
    assert all(sigma_loc > 0)
    assert all(sigma_scale > 0)
    sigma = pyro.sample("sigma", dist.Normal(sigma_loc, sigma_scale))
    assert all(sigma > 0)  # eventually fails: the parameters are positive, but the draw need not be


def main(args):
    data = torch.randn(100)
    optim = Adam({"lr": 0.03})
    svi = SVI(model, guide, optim, loss=Trace_ELBO(max_plate_nesting=1), num_samples=1000)
    for j in range(500):
        loss = svi.step(data)
        if j % 10 == 0:
            print("[iteration %04d] loss: %.4f" % (j + 1, loss / len(data)))
    posterior = svi.run(data)
```
You may have to run it a couple of times, but it will eventually fail on the `sigma > 0` assertion.
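To see why the failure is inevitable: the guide draws `sigma` from a `Normal`, whose support is all of ℝ, so negative draws have positive probability no matter how tightly the parameters themselves are constrained. A quick check with plain `torch.distributions`:

```python
import torch
from torch.distributions import Normal

# The guide draws sigma from Normal(sigma_loc, sigma_scale). Even with
# sigma_loc = sigma_scale = 1 (the initial values above), the probability
# mass below zero is far from negligible.
p_negative = Normal(torch.tensor(1.0), torch.tensor(1.0)).cdf(torch.tensor(0.0))
print(float(p_negative))  # about 0.159, so roughly 16% of draws are negative
```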
What I expected to happen is that the sample statement in the guide would confine itself to the support of the matching sample statement in the model.
For a simple model like this, the only effect is an incorrect posterior.
The problem becomes significant, however, in multi-level hierarchical models where some other latent is drawn from a distribution whose scale parameter is `sigma`.
This seems like an issue that should come up pretty often in hierarchical models. What’s the correct approach? Giving `sigma` a `HalfCauchy` or `HalfNormal` guide won’t match the expected distribution of `sigma`. Calling `abs()` or something like that would, I expect, distort the posterior geometry and mess up the gradients.