Setting constraints and guides on constrained distributions


I’m having trouble figuring out either how to set appropriate constraints, or how to set proper guides for constrained parameters.

Consider a simple regression where the location and scale parameters are have Normal and HalfNormal (or HalfCauchy) priors. I expect that the posterior of the scale parameter will follow a (truncated at 0) normal distribution. What I find is that the guide nevertheless finds negative draws.

Here is a minimally reproducible example:

def model(y):
    mu = pyro.sample("mu", dist.Normal(torch.zeros(1), torch.ones(1)))
    sigma = pyro.sample("sigma", dist.HalfNormal(torch.ones(1)))
    with pyro.plate("obs", y.shape[0]):
        obs = pyro.sample("y", dist.Normal(mu, sigma), obs=y)

def guide(y):
    mu_loc = pyro.param("mu_loc", torch.ones(1))
    mu_scale = pyro.param("mu_scale", torch.ones(1), constraint=constraints.positive)
    assert all(mu_scale > 0)
    mu = pyro.sample("mu", dist.Normal(mu_loc, mu_scale))

    sigma_loc = pyro.param("sigma_loc", torch.ones(1), constraint=constraints.positive)
    sigma_scale = pyro.param("sigma_scale", torch.ones(1), constraint=constraints.positive)
    assert all(sigma_loc > 0)
    assert all(sigma_scale > 0)
    sigma = pyro.sample("sigma", dist.Normal(sigma_loc, sigma_scale))
    assert all(sigma > 0)

def main(args):
    data = torch.randn(100)

    optim = Adam({"lr":0.03})
    svi = SVI(model, guide, optim, loss=Trace_ELBO(max_plate_nesting=1), num_samples=1000)
    for j in range(500):
        loss = svi.step(data)
        if j % 10 == 0:
            print("[iteration %04d] loss: %.4f" % (j + 1, loss / len(data)))
    posterior =

You may have to run it a couple of times, but it will eventually fail on the sigma > 0 assertion.

What I expected to happen, is that the sample statement in the guide would confine itself to the support on he matching sample statement in the model.

For a simple model like this, the only effect is an incorrect posterior.

The problem becomes significant, however, in multi-level, hierarchical models where some other latent is drawn from a distribution whose scale parameter in sigma.

This seems like an issue that should come up pretty often in hierarchical models. What’s the correct approach? Giving sigma a HalfCauchy or HalfNormal guide, won’t match the expected distribution of sigma. Calling abs() or something like that would, I expect, distort the posterior geometry and mess-up the gradients.


it is failing because the sigma sample in the guide is outside of the support of the sigma distribution in the model. depending on your model, as you conjectured, you need to either use a distribution with a positive support eg HalfCauchy or transform it via eg softplus().


Then how does one go about specifying the expectation of a (truncated) normally distributed posterior over a constrained value?

What am I missing that this limitation isn’t a pretty big restriction on the ability to use Pyro’s SVI to fit a hierarchical model?

If ones cares about inferring a proper (truncated) normal posterior over sigma for some reason (like, say, if sigma was actually a vector of volatilities), how would one go about doing that?