Bayesian regression predictions

  • What tutorial are you running? Bayesian regression (Part I)
  • What version of Pyro are you using? 0.3.3

Hi, I’ve been following this tutorial to implement a Bayesian neural net in Pyro. I can follow it up to the prediction step, where I get a bit confused about the sampling pipeline; in particular, my questions are:

Considering the model and the evaluation code below:

def model(x_data, y_data):
    # ... prior definitions omitted here
    lifted_module = pyro.random_module("module", regression_model, priors)
    lifted_reg_model = lifted_module()
    with pyro.plate("map", len(x_data)):
        prediction_mean = lifted_reg_model(x_data).squeeze(-1)
        pyro.sample("obs", Normal(prediction_mean, scale), obs=y_data)
        return prediction_mean

def evaluate_model(svi, x_data, y_data):
    posterior = svi.run(x_data, y_data)
    post_pred = TracePredictive(wrapped_model, posterior, num_samples=1000).run(x_data, None)
    marginal = EmpiricalMarginal(post_pred, ['obs'])._get_samples_and_weights()[0].detach().cpu().numpy()

  1. Inside svi.run(x_data, y_data), Pyro is internally executing:
for i in range(self.num_samples):
    guide_trace = poutine.trace(self.guide).get_trace(*args, **kwargs)
    model_trace = poutine.trace(poutine.replay(self.model, trace=guide_trace)).get_trace(*args, **kwargs)
    yield model_trace, 1.0

I understand that poutine.trace(self.guide).get_trace(*args, **kwargs) is how you sample from the trained guide (i.e., using the guide’s learned params), and that those samples are then used in the joint distribution (i.e. the model) via poutine.trace(poutine.replay(self.model, trace=guide_trace)).get_trace(*args, **kwargs).
a) Is what I’m saying correct?
b) In this last call, because I’m passing the x, y test data to model(x_data, y_data), isn’t that going to condition the likelihood on the y test data (due to pyro.sample("obs", Normal(prediction_mean, scale), obs=y_data))?

  2. After having called svi.run(...) I should already have my posterior traces. Why do I need to call TracePredictive(...).run(x_data, None), which internally calls resampled_trace = poutine.trace(poutine.replay(self.model, model_trace)).get_trace(*args, **kwargs) again?

  3. Lastly, I see that at several points in the code a Delta distribution is instantiated with some values. Is this a trick so that you can recover these original values from a distribution object? That is, defining a distribution with only one value, which can later be sampled from?

Thanks!


a) Is what I’m saying correct?

That’s right.

b) In this last call, because I’m passing the x, y test data to model(x_data, y_data), isn’t that going to condition the likelihood on the y test data (due to pyro.sample("obs", Normal(prediction_mean, scale), obs=y_data))?

Yes, and this is also the answer to your second question. svi.run will give you traces from the posterior, but it conditions the observed sites on the initial training data, so you need to resample the latent sites from these traces to generate predictions for new data.

  2. After having called svi.run(...) I should already have my posterior traces. Why do I need to call TracePredictive(...).run(x_data, None), which internally calls resampled_trace = poutine.trace(poutine.replay(self.model, model_trace)).get_trace(*args, **kwargs) again?

To get predictions, you still need to run the model forward (possibly on new data), resampling only the latent sites from the posterior distribution. TracePredictive is much more general (e.g. it handles data subsampling), but you can always just use poutine.replay to generate predictions if you don’t need that general machinery, as in the sketch below.
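For example, a minimal sketch of that approach, assuming the model above and a trained guide with the same (x_data, y_data) signature:

from pyro import poutine

def predict(x_new, num_samples=100):
    preds = []
    for _ in range(num_samples):
        # draw the latent sites (the lifted module's weights) from the trained guide
        guide_trace = poutine.trace(guide).get_trace(x_new, None)
        # rerun the model against those latents; y_data=None leaves "obs" unobserved
        preds.append(poutine.replay(model, trace=guide_trace)(x_new, None))
    return preds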

  3. Lastly, I see that at several points in the code a Delta distribution is instantiated with some values. Is this a trick so that you can recover these original values from a distribution object?

I’m not sure which part of the code you are referring to, but we commonly use pyro.sample("x", dist.Delta(x, log_density=some_custom_term)) to inject a custom log density term into the trace, e.g. a Jacobian adjustment term.
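For illustration, a hypothetical sketch of that pattern (the transform and names here are made up):

import torch
import pyro
import pyro.distributions as dist

def exp_transform_site(z):
    # deterministic transform of some latent z
    x = torch.exp(z)
    # record x in the trace; log_density injects the change-of-variables
    # (Jacobian) adjustment log|dz/dx| = -z into the log joint
    return pyro.sample("x", dist.Delta(x, log_density=-z))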


Thanks!!!

@neerajprad
In my opinion, Pyro currently puts a lot of effort into how to train a probabilistic model, but leaves model evaluation quite complex. By complex, I mean we have to read how svi.run works, how traces work, what TracePosterior is, what TracePredictive is, and so on, just to understand how Pyro does prediction.

The subsampling behaviour of TracePredictive still confuses me.

Even the function name _get_samples_and_weights() suggests, by design, that it should not be used publicly, since it starts with an underscore.

Most of the documentation is not about how to predict in Pyro. I wonder if we could write a single doc on prediction that takes a more general approach, instead of covering it only in the regression tutorial?

I think the current approach exposes too many internals to users. If I have missed something, please help me out. Thanks a lot!

In general, we want to train a model, watch the ELBO loss, predict values given new inputs, inspect the learned values of the params, and inspect the approximate posterior distribution over the latent variables. But it seems these tasks are scattered across the sea of Pyro…


In my opinion, Pyro currently puts a lot of effort into how to train a probabilistic model, but leaves model evaluation quite complex. By complex, I mean we have to read how svi.run works, how traces work, what TracePosterior is, what TracePredictive is, and so on, just to understand how Pyro does prediction.

I am in complete agreement with you. This is something that we are refactoring right now. For the next release, the interface for doing predictions with MCMC will be simplified (and hopefully we can do the same for SVI shortly thereafter). You can follow this issue - Remove dependency on TracePosterior and deprecate utilities in AbstractInfer · Issue #1930 · pyro-ppl/pyro · GitHub.


Thanks a lot!

@neerajprad

I want to share some thoughts on prediction for Bayesian regression. It seems to me that after we have the posterior distribution over all the parameters, we draw, say, 800 samples of the parameters from it. Then, for each parameter sample, we input x_new and predict a y. Finally we use the mean of all the y values as our prediction. My question is: can we do Bayesian prediction in Pyro? For example, I need to compute p(y|x) by marginalizing over the posterior distribution of the parameters, which are the ‘linear.weights’ here.

p(y|x) = ∫ p(y|x, θ) p(θ|D) dθ

Maybe we can approximate it by

p(y|x) ≈ (1/S) Σₛ p(y|x, θₛ), where θₛ ~ p(θ|D)

How does Pyro maximize p(y|x)? I am afraid the predictions we get in Pyro do not maximize p(y|x) in the Bayesian-prediction sense.

Maybe we can approximate it by…

As you point out, it will be easiest to just reuse your posterior samples and do a Monte Carlo estimate. You don’t need Predictive for this, but you can use the Predictive.get_vectorized_trace method for convenience. Note that this will only work if all your batch dims are correctly annotated with pyro.plate. If that’s not the case, you can also iterate through the samples sequentially and use poutine.condition to condition your model on each posterior sample, getting one model trace per sample instead of a single vectorized trace, as in the snippets below.

import math
from pyro.distributions.util import sum_rightmost
from pyro.infer import Predictive

samples = mcmc.get_samples()
num_samples = len(next(iter(samples.values())))  # number of posterior draws
pred = Predictive(model, samples)
tr = pred.get_vectorized_trace(data)
tr.compute_log_prob()
# sum out log probs over the batch (data) dims, keeping the sample dim
logy = sum_rightmost(tr.nodes['y']['log_prob'], -1)
# Monte Carlo estimate of log p(y|x)
print(logy.logsumexp(0) - math.log(num_samples))
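And a minimal sketch of the sequential alternative, assuming the same model, samples, and data as above:

import math
import torch
from pyro import poutine

log_probs = []
num_samples = len(next(iter(samples.values())))
for i in range(num_samples):
    # condition the model on the i-th posterior draw of every latent site
    draw = {name: values[i] for name, values in samples.items()}
    tr = poutine.trace(poutine.condition(model, data=draw)).get_trace(data)
    tr.compute_log_prob()
    log_probs.append(tr.nodes['y']['log_prob'].sum())
# Monte Carlo estimate of log p(y|x)
logy = torch.stack(log_probs)
print(logy.logsumexp(0) - math.log(num_samples))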

Then, for each parameter, we input x_new and predict a y. Finally we use the mean of all y as our prediction.

I don’t fully follow this, but you can use the predictions however you want. Maybe the snippet above clarifies that.

Hi, I have a question related to this. I have a simple model:

def model(developer, time_since_joined):
    b_sigma = abs(pyro.sample('b_sigma', dist.Normal(0, 300)))
    c_sigma = abs(pyro.sample('c_sigma', dist.Normal(0, 6)))
    b0 = pyro.sample("b0", dist.Normal(0, 200))
    b1 = pyro.sample("b1", dist.Normal(0, 200))
    c0 = pyro.sample("c0", dist.Normal(0, 10))
    c1 = pyro.sample("c1", dist.Normal(0, 10))

    with pyro.plate('developer', developer):
        slack = pyro.sample("slack_comments", dist.Normal(b0 + b1 * time_since_joined, b_sigma), obs=slack_comments)
        github = pyro.sample("github_commits", dist.Normal(c0 + c1 * time_since_joined, c_sigma), obs=github_commits)
        return slack, github

And the following data:

          comments  commits  time
Alice         7500       25   4.5
Bob          10100       32   6.0
Cole         18600       49   7.0
Danielle     25200       66  12.0
Erika        27500       96  18.0
slack_comments = torch.tensor(data.comments.values)
github_commits = torch.tensor(data.commits.values)
time = torch.tensor(data.time.values)

dims={
    "slack_comments": ["developer"],
    "github_commits": ["developer"],
    "time": ["developer"],
}

data_dict = {
    "developer": N,
    "time_since_joined": time
}

The developer variable is used to set the dimensions of the data; it matches the length of time_since_joined. I have the following questions:

  1. When I compute posterior_predictive, I just get the observed data repeated for each sample. Why is this happening?
nuts_kernel = NUTS(model, jit_compile=True, ignore_jit_warnings=True)
mcmc = MCMC(nuts_kernel, num_samples=400, warmup_steps=400,
            num_chains=4, disable_progbar=True)
mcmc.run(**data_dict)
posterior_samples = mcmc.get_samples()
posterior_predictive = Predictive(model, posterior_samples).forward(**data_dict)
prior = Predictive(model, num_samples=150).forward(**data_dict)
posterior_predictive
{'slack_comments': tensor([[ 7500, 10100, 18600, 25200, 27500],
         [ 7500, 10100, 18600, 25200, 27500],
         [ 7500, 10100, 18600, 25200, 27500],
         ...,
         [ 7500, 10100, 18600, 25200, 27500],
         [ 7500, 10100, 18600, 25200, 27500],
         [ 7500, 10100, 18600, 25200, 27500]]),
 'github_commits': tensor([[25, 32, 49, 66, 96],
         [25, 32, 49, 66, 96],
         [25, 32, 49, 66, 96],
         ...,
         [25, 32, 49, 66, 96],
         [25, 32, 49, 66, 96],
         [25, 32, 49, 66, 96]])}
  2. When I ask for predictions on new data (with a different length than the data passed initially), I still get predictions of the same length as the initial data, and again they are just the repeated observed values.
predictions_dict = {
    "developer": 2,
    "time_since_joined": candidate_devs_time
}
predictions = Predictive(model, posterior_samples).forward(**predictions_dict)
predictions
{'slack_comments': tensor([[ 7500, 10100, 18600, 25200, 27500],
         [ 7500, 10100, 18600, 25200, 27500],
         [ 7500, 10100, 18600, 25200, 27500],
         ...,
         [ 7500, 10100, 18600, 25200, 27500],
         [ 7500, 10100, 18600, 25200, 27500],
         [ 7500, 10100, 18600, 25200, 27500]]),
 'github_commits': tensor([[25, 32, 49, 66, 96],
         [25, 32, 49, 66, 96],
         [25, 32, 49, 66, 96],
         ...,
         [25, 32, 49, 66, 96],
         [25, 32, 49, 66, 96],
         [25, 32, 49, 66, 96]])}

Kindly suggest steps to solve this.

It’s still not clear to me how we make predictions on new (or test) data using the results from an MCMC run.

I figured it out! Something like:

from pyro.infer import NUTS, MCMC, Predictive

nuts_kernel = NUTS(model)
mcmc = MCMC(nuts_kernel, num_samples=1000)
mcmc.run(x_train, y_train)
samples_mcmc = mcmc.get_samples(1000)
# pass None for y so the "obs" site is sampled instead of conditioned
test_mcmc = Predictive(model, samples_mcmc).forward(x_test, None)
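Note that for this to work the observed values must be exposed as a model argument (e.g. def model(x, y=None): ... obs=y), so that passing None at prediction time leaves the observed sites unobserved; otherwise Predictive will just return the conditioned training values, as in the post above.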