Why do pyro.sample statements in guide but not in model break inference?

Going through the SVI tutorial, I wanted to turn the guide into a posterior sampler, so I added sampling statements there. I realized that adding a single extra pyro.sample statement to the guide can completely break inference (it no longer converges).

def guide(data):
    alpha_q = pyro.param("alpha_q", torch.tensor(15.0),
                         constraint=constraints.positive)
    beta_q = pyro.param("beta_q", torch.tensor(15.0),
                        constraint=constraints.positive)
    pyro.sample("latent_fairness", dist.Beta(alpha_q, beta_q))
    # Adding this statement breaks inference
    pyro.sample('foo', dist.Bernoulli(0.5))

Why is this?

You can think of nuisance variables like foo as equivalent to adding independent noise to your ELBO:

loss = compute_elbo(model, guide, data) + torch.rand().log()

Depending on the type of random variable and the gradient estimator used by the ELBO implementation, this may or may not affect gradients of the loss. At any rate, as you can imagine, there’s no good reason to include nuisance variables like this.
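
For concreteness, here's a rough sketch of the training loop this corresponds to. It is essentially the one from the SVI tutorial, with a made-up data tensor, and it assumes the tutorial's model alongside the guide above:

import torch
import pyro
from pyro.infer import SVI, Trace_ELBO
from pyro.optim import Adam

data = torch.cat([torch.ones(6), torch.zeros(4)])  # made-up coin flips: 6 heads, 4 tails

pyro.clear_param_store()
svi = SVI(model, guide, Adam({"lr": 0.0005}), loss=Trace_ELBO())
for step in range(2000):
    svi.step(data)  # one stochastic gradient step on the negative ELBO estimate

# with the extra 'foo' site in the guide, alpha_q / beta_q no longer converge
print(pyro.param("alpha_q").item(), pyro.param("beta_q").item())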

What were you trying to accomplish? If you can provide additional context about your goal we can probably point you in the right direction.

Thank you!!
That still doesn’t make a whole lot of sense to me, but I’m sure as I continue reading through the tutorials I will understand better what you mean.

I was just trying to have my guide also return samples (the same way the model does), so that I could use it for posterior sampling. I’ve found 3 “good” ways to do this so far:

  1. call .sample() on the distribution directly (plain torch sampling) instead of pyro.sample
  2. make a separate posterior_sampling function which uses the learned pyro.param parameters (as opposed to the prior parameters used in the model)
  3. Apparently pyro.infer.Predictive is made exactly for this, but I still haven’t figured out how it works (there’s a rough sketch of it after this list).
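
For what it's worth, here is a rough sketch of how Predictive can be used for option 3. It assumes the model and guide from the SVI tutorial (the same ones posted further down) and that SVI has already been run; the variable names are just illustrative:

import pyro.distributions as dist
from pyro.infer import Predictive

# Predictive draws num_samples traces from the guide and replays them
# through the model; return_sites picks out which sites to collect.
predictive = Predictive(model, guide=guide, num_samples=1000,
                        return_sites=["latent_fairness"])
f_samples = predictive(data)["latent_fairness"]  # approximate posterior samples of f
new_flips = dist.Bernoulli(f_samples).sample()   # posterior predictive coin flips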

Can you clarify what you mean by posterior sampling, and perhaps provide some code for your model and guide? Do you mean sampling from the posterior distribution over latent variables z ~ p(z | x) or the posterior predictive distribution over observed variables x’ ~ p(x’ | x) = ∫ p(x’ | z) p(z | x) dz?

Sorry for not being clear… I’m getting used to this language.
I meant posterior predictive distribution samples. Here’s the code I was trying to run at first.

import torch
import pyro
import pyro.distributions as dist
from torch.distributions import constraints

def model(data):
    alpha0 = torch.tensor(10.0)
    beta0 = torch.tensor(10.0)
    f = pyro.sample("latent_fairness", dist.Beta(alpha0, beta0))
    samples = torch.zeros(len(data))
    for i in range(len(data)):
        samples[i] = pyro.sample("obs_{}".format(i), dist.Bernoulli(f), obs=data[i])
    return samples

def guide(data):
    alpha_q = pyro.param("alpha_q", torch.tensor(15.0),
                         constraint=constraints.positive)
    beta_q = pyro.param("beta_q", torch.tensor(15.0),
                        constraint=constraints.positive)
    f = pyro.sample("latent_fairness", dist.Beta(alpha_q, beta_q))
    samples = torch.zeros(len(data))
    for i in range(len(data)):
        samples[i] = pyro.sample("obs_{}".format(i), dist.Bernoulli(f))
    return samples

My original idea was: “if the model returns samples when I call it, then the guide should return posterior samples too”. But now I understand that the guide is really only meant to perform inference over the hidden variables! So the “correct” way would be to return f and then use it elsewhere to generate posterior predictive distribution samples.
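
A minimal sketch of that idea, assuming SVI has already been run so the param store holds the learned alpha_q and beta_q (the number of simulated flips is arbitrary):

import torch
import pyro
import pyro.distributions as dist
from torch.distributions import constraints

def guide(data):
    alpha_q = pyro.param("alpha_q", torch.tensor(15.0),
                         constraint=constraints.positive)
    beta_q = pyro.param("beta_q", torch.tensor(15.0),
                        constraint=constraints.positive)
    return pyro.sample("latent_fairness", dist.Beta(alpha_q, beta_q))

# each call to the guide now returns one posterior sample of the fairness f;
# simulating new flips from that f gives posterior predictive samples
f = guide(data)
new_flips = dist.Bernoulli(f).sample((10,))  # ten posterior predictive coin flips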