Why do we sample a latent variable in the guide and the model separately?

This might be a trivial question, but could you please explain why a latent variable should be sampled in the guide and the model separately? In other words, can't we just reuse a latent variable sampled in the guide later in the model?

Here is an example to make my question more concrete. The following code is adapted from SVI Part I: An Introduction to Stochastic Variational Inference in Pyro — Pyro Tutorials 1.8.4 documentation.

import torch
from torch.distributions import constraints

import pyro
import pyro.distributions as dist

def model(data):
    # Prior over the coin's fairness.
    alpha0 = torch.tensor(15.0)
    beta0 = torch.tensor(15.0)
    f = pyro.sample("latent_fairness", dist.Beta(alpha0, beta0))
    for i in range(len(data)):
        pyro.sample("obs_{}".format(i), dist.Bernoulli(f), obs=data[i])

def guide(data):
    # Learnable parameters of the variational posterior.
    alpha_q = pyro.param("alpha_q", torch.tensor(15.0),
                         constraint=constraints.positive)
    beta_q = pyro.param("beta_q", torch.tensor(15.0),
                        constraint=constraints.positive)
    pyro.sample("latent_fairness", dist.Beta(alpha_q, beta_q))

What would be the difference if this model/guide pair is implemented the following way instead?

def model(data):
    for i in range(len(data)):
        pyro.sample("obs_{}".format(i), dist.Bernoulli(latent_fairness), obs=data[i])

def guide(data):
    alpha_q = pyro.param("alpha_q", torch.tensor(15.0),
                         constraint=constraints.positive)
    beta_q = pyro.param("beta_q", torch.tensor(15.0),
                        constraint=constraints.positive)
    latent_fairness = pyro.sample("latent_fairness", dist.Beta(alpha_q, beta_q))

pyro models/guides are python functions. i don’t really understand your question but

def model(data):
    for i in range(len(data)):
        pyro.sample("obs_{}".format(i), dist.Bernoulli(latent_fairness), obs=data[i])

is not valid python, since latent_fairness is not defined anywhere (e.g. via a sample statement); calling it would raise a NameError

Thank you for your reply and sorry for the confusing code snippet. Let me rephrase my question and then clarify that part.

In the Pyro tutorials, we see that there are two pyro.sample statements for each latent variable, one in the model and one in the guide. The latent variable in this example is latent_fairness. I was wondering what would change if we sampled a latent variable only once, e.g. in the guide, and used that sample in the model instead of sampling it again in the model.

latent_fairness (the variable) is declared in function scope in my second example snippet. My intention was to declare it in global scope, assign a value to it in the guide, and use that assigned value in the model. So the second snippet should have looked like this (I added a line at the beginning, plus a global statement in the guide so the assignment actually updates the module-level variable).

latent_fairness = None

def model(data):
    for i in range(len(data)):
        pyro.sample("obs_{}".format(i), dist.Bernoulli(latent_fairness), obs=data[i])

def guide(data):
    global latent_fairness  # write to the module-level variable above
    alpha_q = pyro.param("alpha_q", torch.tensor(15.0),
                         constraint=constraints.positive)
    beta_q = pyro.param("beta_q", torch.tensor(15.0),
                        constraint=constraints.positive)
    latent_fairness = pyro.sample("latent_fairness", dist.Beta(alpha_q, beta_q))

that wouldn’t work either. pyro “handles” the scope for you by running the guide and then using the sampled values to “replay” the model. both the model and guide must have matching sample statements. the model statements define the model of interest. the guide sample statements define the parametric posterior approximation that is being fit to the posterior defined by the model. this fitting is done using the ELBO which depends on both the model and guide. please refer to the intro and the various svi tutorials for more details
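for concreteness, here is roughly what that replay mechanism looks like, using pyro.poutine directly (a minimal sketch, assuming the model, guide, and imports from the first snippet; the toy data is made up):

import torch
import pyro.poutine as poutine

data = [torch.tensor(1.0), torch.tensor(0.0)]  # toy coin flips

# record the guide's execution, including the value drawn at "latent_fairness"
guide_trace = poutine.trace(guide).get_trace(data)

# re-run the model, forcing its "latent_fairness" sample statement to take
# the value recorded in the guide trace (sites are matched by name)
model_trace = poutine.trace(
    poutine.replay(model, trace=guide_trace)).get_trace(data)

# both traces now hold the same value at the shared sample site, which is
# what lets the elbo be computed from the two log-densities
assert torch.equal(model_trace.nodes["latent_fairness"]["value"],
                   guide_trace.nodes["latent_fairness"]["value"])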

Does that mean that when we define the model function, we should already know something about the posterior distribution and set the parameters appropriately? If the model defined in the model function deviates a lot from the true posterior distribution, can we not recover the proper distribution with the guide function either? @martinjankowiak Thanks!

vanilla variational inference relies on choosing a parametric family of variational distributions, a.k.a. guides. it also involves a potentially hard optimization problem. so if your guide family does not come anywhere close to the true posterior, or if it is poorly initialized, you will likely get bad results.
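for concreteness, fitting the guide above looks like this (a minimal sketch; the learning rate and step count are arbitrary choices). note that the initial values passed to pyro.param are exactly where the optimization starts:

import pyro
import pyro.optim
from pyro.infer import SVI, Trace_ELBO

pyro.clear_param_store()  # forget any previously registered parameters

# the initial values of alpha_q and beta_q (15.0 in the guide above) are the
# starting point of the optimization; a poor starting point can stall it
svi = SVI(model, guide, pyro.optim.Adam({"lr": 0.005}), loss=Trace_ELBO())
for step in range(2000):
    loss = svi.step(data)  # one stochastic gradient step on the negative elbo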

The parameters defined in the guide function change during optimization; however, the parameters defined in the model function seem to be fixed. According to my understanding, they never change during optimization, so setting different initial values for the parameters in the model function will influence the results a lot. Is that correct? Thanks! ^^

parameters in the model (i.e. those registered with pyro.param) are learned by maximizing the elbo, which is similar in spirit to type ii maximum likelihood. plain tensors like alpha0 and beta0 in the snippet above do stay fixed. for further discussion see the svi tutorials
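for example, if alpha0 and beta0 in the model were registered with pyro.param instead of being plain tensors, svi would update them together with alpha_q and beta_q (a sketch reusing the names from the first snippet):

def model(data):
    # registered with pyro.param, these are now learned by maximizing the
    # elbo; as plain tensors (first snippet) they would stay fixed
    alpha0 = pyro.param("alpha0", torch.tensor(15.0),
                        constraint=constraints.positive)
    beta0 = pyro.param("beta0", torch.tensor(15.0),
                       constraint=constraints.positive)
    f = pyro.sample("latent_fairness", dist.Beta(alpha0, beta0))
    for i in range(len(data)):
        pyro.sample("obs_{}".format(i), dist.Bernoulli(f), obs=data[i])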