Why do pyro.sample statements in guide but not in model break inference?

Going through the SVI tutorial, I wanted to turn the guide into a posterior sampler, so I added sampling statements there. I realized that adding a single extra pyro.sample statement to the guide can completely break inference (it no longer converges).

def guide(data):
    alpha_q = pyro.param("alpha_q", torch.tensor(15.0),
                         constraint=constraints.positive)
    beta_q = pyro.param("beta_q", torch.tensor(15.0),
                        constraint=constraints.positive)
    pyro.sample("latent_fairness", dist.Beta(alpha_q, beta_q))
    # Adding this statement breaks inference
    pyro.sample('foo', dist.Bernoulli(0.5))

Why is this?

You can think of nuisance variables like foo as equivalent to adding independent noise to your ELBO:

loss = compute_elbo(model, guide, data) + torch.rand().log()

Depending on the type of random variable and the gradient estimator used by the ELBO implementation, this may or may not affect gradients of the loss. At any rate, as you can imagine, there’s no good reason to include nuisance variables like this.
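
For concreteness, here's a rough sketch of the training loop this corresponds to. It is essentially the one from the SVI tutorial, with a made-up data tensor, and it assumes the tutorial's model alongside the guide above:

import torch
import pyro
from pyro.infer import SVI, Trace_ELBO
from pyro.optim import Adam

data = torch.cat([torch.ones(6), torch.zeros(4)])  # made-up coin flips: 6 heads, 4 tails

pyro.clear_param_store()
svi = SVI(model, guide, Adam({"lr": 0.0005}), loss=Trace_ELBO())
for step in range(2000):
    svi.step(data)  # one stochastic gradient step on the negative ELBO estimate

# with the extra 'foo' site in the guide, alpha_q / beta_q no longer converge
print(pyro.param("alpha_q").item(), pyro.param("beta_q").item())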

What were you trying to accomplish? If you can provide additional context about your goal we can probably point you in the right direction.

Thank you!!
That still doesn’t make a whole lot of sense to me, but I’m sure as I continue reading through the tutorials I will understand better what you mean.

I was just trying to have my guide also return samples (the same way the model does), so that I could use it for posterior sampling. I’ve found 3 “good” ways to do this so far:

  1. call .sample() on the distribution directly (plain torch sampling) instead of pyro.sample
  2. make a separate posterior_sampling function which uses the learned pyro.param parameters (as opposed to the prior parameters used in the model)
  3. Apparently pyro.infer.Predictive is made exactly for this, but I still haven’t figured out how it works (there’s a rough sketch of it after this list).
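
For what it's worth, here is a rough sketch of how Predictive can be used for option 3. It assumes the model and guide from the SVI tutorial (the same ones posted further down) and that SVI has already been run; the variable names are just illustrative:

import pyro.distributions as dist
from pyro.infer import Predictive

# Predictive draws num_samples traces from the guide and replays them
# through the model; return_sites picks out which sites to collect.
predictive = Predictive(model, guide=guide, num_samples=1000,
                        return_sites=["latent_fairness"])
f_samples = predictive(data)["latent_fairness"]  # approximate posterior samples of f
new_flips = dist.Bernoulli(f_samples).sample()   # posterior predictive coin flips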

Can you clarify what you mean by posterior sampling, and perhaps provide some code for your model and guide? Do you mean sampling from the posterior distribution over latent variables z ~ p(z | x) or the posterior predictive distribution over observed variables x’ ~ p(x’ | x) = ∫ p(x’ | z) p(z | x) dz?

Sorry for not being clear… I’m getting used to this language.
I meant posterior predictive distribution samples. Here’s the code I was trying to run at first.

import torch
import pyro
import pyro.distributions as dist
from torch.distributions import constraints

def model(data):
    alpha0 = torch.tensor(10.0)
    beta0 = torch.tensor(10.0)
    f = pyro.sample("latent_fairness", dist.Beta(alpha0, beta0))
    samples = torch.zeros(len(data))
    for i in range(len(data)):
        samples[i] = pyro.sample("obs_{}".format(i), dist.Bernoulli(f), obs=data[i])
    return samples

def guide(data):
    alpha_q = pyro.param("alpha_q", torch.tensor(15.0),
                         constraint=constraints.positive)
    beta_q = pyro.param("beta_q", torch.tensor(15.0),
                        constraint=constraints.positive)
    f = pyro.sample("latent_fairness", dist.Beta(alpha_q, beta_q))
    samples = torch.zeros(len(data))
    for i in range(len(data)):
        samples[i] = pyro.sample("obs_{}".format(i), dist.Bernoulli(f))
    return samples

My original idea was: “if the model returns samples when I call it, then the guide should return posterior samples too”. But now I understand that the guide is really only meant to perform inference over the hidden variables! So the “correct” way would be to return f and then use it elsewhere to generate posterior predictive distribution samples.
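
A minimal sketch of that idea, assuming SVI has already been run so the param store holds the learned alpha_q and beta_q (the number of simulated flips is arbitrary):

import torch
import pyro
import pyro.distributions as dist
from torch.distributions import constraints

def guide(data):
    alpha_q = pyro.param("alpha_q", torch.tensor(15.0),
                         constraint=constraints.positive)
    beta_q = pyro.param("beta_q", torch.tensor(15.0),
                        constraint=constraints.positive)
    return pyro.sample("latent_fairness", dist.Beta(alpha_q, beta_q))

# each call to the guide now returns one posterior sample of the fairness f;
# simulating new flips from that f gives posterior predictive samples
f = guide(data)
new_flips = dist.Bernoulli(f).sample((10,))  # ten posterior predictive coin flips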