I’m having trouble figuring out either how to set appropriate constraints or how to set proper guides for constrained parameters.
Consider a simple regression where the location and scale parameters have Normal and HalfNormal (or HalfCauchy) priors. I expect the posterior of the scale parameter to follow a normal distribution truncated at 0. What I find, however, is that the guide nevertheless produces negative draws.
Here is a minimal reproducible example:
```python
import torch
import pyro
import pyro.distributions as dist
from pyro.infer import SVI, Trace_ELBO
from pyro.optim import Adam
from torch.distributions import constraints


def model(y):
    mu = pyro.sample("mu", dist.Normal(torch.zeros(1), torch.ones(1)))
    sigma = pyro.sample("sigma", dist.HalfNormal(torch.ones(1)))
    with pyro.plate("obs", y.shape[0]):
        pyro.sample("y", dist.Normal(mu, sigma), obs=y)


def guide(y):
    mu_loc = pyro.param("mu_loc", torch.ones(1))
    mu_scale = pyro.param("mu_scale", torch.ones(1), constraint=constraints.positive)
    assert all(mu_scale > 0)
    pyro.sample("mu", dist.Normal(mu_loc, mu_scale))
    sigma_loc = pyro.param("sigma_loc", torch.ones(1), constraint=constraints.positive)
    sigma_scale = pyro.param("sigma_scale", torch.ones(1), constraint=constraints.positive)
    assert all(sigma_loc > 0)
    assert all(sigma_scale > 0)
    sigma = pyro.sample("sigma", dist.Normal(sigma_loc, sigma_scale))
    assert all(sigma > 0)  # eventually fails: the parameters are positive, but the draw need not be


def main(args):
    data = torch.randn(100)
    optim = Adam({"lr": 0.03})
    svi = SVI(model, guide, optim, loss=Trace_ELBO(max_plate_nesting=1), num_samples=1000)
    for j in range(500):
        loss = svi.step(data)
        if j % 10 == 0:
            print("[iteration %04d] loss: %.4f" % (j + 1, loss / len(data)))
    posterior = svi.run(data)
```
You may have to run it a couple of times, but it will eventually fail on the `sigma > 0` assertion.
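To see why the failure is inevitable: the guide draws `sigma` from a `Normal`, whose support is all of ℝ, so negative draws have positive probability no matter how tightly the parameters themselves are constrained. A quick check with plain `torch.distributions`:

```python
import torch
from torch.distributions import Normal

# The guide draws sigma from Normal(sigma_loc, sigma_scale). Even with
# sigma_loc = sigma_scale = 1 (the initial values above), the probability
# mass below zero is far from negligible.
p_negative = Normal(torch.tensor(1.0), torch.tensor(1.0)).cdf(torch.tensor(0.0))
print(float(p_negative))  # about 0.159, so roughly 16% of draws are negative
```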
What I expected to happen is that the sample statement in the guide would confine itself to the support of the matching sample statement in the model.
For a simple model like this, the only effect is an incorrect posterior.
The problem becomes significant, however, in multi-level hierarchical models where some other latent is drawn from a distribution whose scale parameter is `sigma`.
This seems like an issue that should come up pretty often in hierarchical models. What’s the correct approach? Giving `sigma` a `HalfCauchy` or `HalfNormal` guide won’t match the expected distribution of `sigma`. Calling `abs()` or something like that would, I expect, distort the posterior geometry and mess up the gradients.