Bayesian Regression Tutorial: Error setting parallel=True in Predictive class

  • What tutorial are you running?
    Bayesian Regression - Introduction (Part 1)
  • What version of Pyro are you using?
    Pyro Version 1.7.0
  • Please link or paste relevant code, and steps to reproduce.

Hello! First and foremost, I just wanted to thank you for creating such an amazing library. I really love working with Pyro.

In the Bayesian Linear Regression tutorial, I’m trying to speed up the final step where “We generate 800 samples from our trained model.” In particular, I’m trying to set the parameter parallel=True in the Predictive class:
predictive = Predictive(model, guide=guide, num_samples=800,
                        return_sites=("linear.weight", "obs", "_RETURN"),
                        parallel=True)

However, when making this change I get the following error:
RuntimeError: t() expects a tensor with <= 2 dimensions, but self is 3D
         Trace Shapes:
          Param Sites:
         Sample Sites:
        sigma dist 800 1 |
              value 800 1 |
linear.weight dist 800 1 | 1 3
              value 800   | 1 3
  linear.bias dist 800 1 | 1
              value 800   | 1

Looking through this forum and the Pyro docs, I’ve tried “wrapping the existing model in an outermost plate messenger” but I keep getting this error. My main question is how to modify the model used in the tutorial to support generating samples in parallel. From the tutorial, this is the original model definition that I’ve been using:
class BayesianRegression(PyroModule):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.linear = PyroModule[nn.Linear](in_features, out_features)
        self.linear.weight = PyroSample(dist.Normal(0., 1.).expand([out_features, in_features]).to_event(2))
        self.linear.bias = PyroSample(dist.Normal(0., 10.).expand([out_features]).to_event(1))

    def forward(self, x, y=None):
        sigma = pyro.sample("sigma", dist.Uniform(0., 10.))
        mean = self.linear(x).squeeze(-1)
        with pyro.plate("data", x.shape[0]):
            obs = pyro.sample("obs", dist.Normal(mean, sigma), obs=y)
        return mean

Thanks for your help!

This is because PyTorch's nn.Linear assumes the input is 2D. You might use a “modified” version of nn.Linear that works for multi-dimensional inputs.
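
Here is a minimal sketch of what goes wrong (the shapes are made up for illustration, this is not code from the tutorial): under parallel=True the posterior draw of linear.weight picks up a leading sample dimension, and the functional call that nn.Linear makes cannot handle a 3D weight.

import torch
import torch.nn.functional as F

x = torch.randn(170, 3)      # data: 170 rows, 3 features (illustrative shapes)
w = torch.randn(800, 1, 3)   # 800 posterior draws of a (1, 3) weight matrix

try:
    F.linear(x, w)           # this is essentially what nn.Linear does with its weight
except RuntimeError as e:
    print(e)                 # t() expects a tensor with <= 2 dimensions, but self is 3D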

Hi @fehiepsi

Thank you, I really appreciate your response! I’m new to PyTorch, but it looks like newer versions of nn.Linear do support multi-dimensional inputs. Sorry, I’m just trying to wrap my head around the docstring for the Pyro Predictive class, which states (for the parallel param): “predict in parallel by wrapping the existing model in an outermost plate messenger. Note that this requires that the model has all batch dims correctly annotated via :class:~pyro.plate. Default is False.” Does that mean I need to wrap the call to the linear model inside a ‘batches’ plate, since each linear model (each drawn from an independent sample from the posterior distribution) can be run independently to generate the samples in parallel? Something like this?

def forward(self, x, y=None):
    with pyro.plate("batches", batch_size):
        sigma = pyro.sample("sigma", dist.Uniform(0., 10.))
        mean = self.linear(x).squeeze(-1)
        with pyro.plate("data", x.shape[0]):
            obs = pyro.sample("obs", dist.Normal(mean, sigma), obs=y)
        return mean
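
For reference, my rough mental model of what parallel=True might be doing (purely illustrative, not the actual Pyro internals) is that Predictive itself runs the model under an outermost plate, something like:

def vectorized_model(x, y=None):
    # Illustrative sketch only: every sample site then carries a leading
    # batch dimension of size 800 (the number of predictive samples).
    with pyro.plate("samples", 800, dim=-2):   # one plate dim outside the "data" plate
        return model(x, y)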

Sorry, I should have said: nn.Linear does not work for a batch of weights (not a batch of inputs).

Thanks @fehiepsi , that makes sense!

Do you have a recommendation for how to wrap a linear regression model in a plate messenger so that predictive samples can be generated in parallel? In the Bayesian Regression tutorial, it says that

We generate 800 samples from our trained model. Internally, this is done by first generating samples for the unobserved sites in the guide, and then running the model forward by conditioning the sites to values sampled from the guide. Refer to the Model Serving section for insight on how the Predictive class works.

For my Bayesian linear regression model, I’d like to generate 1000 predictive samples from the trained model for each new input example. I’m not sure whether serving the model via TorchScript would improve the time it takes to generate these samples. I’ve been trying to follow the recommendation in this forum post: “When parallel=False, Predictive has to run your model once per sample, which as you are seeing will be very slow for large numbers of samples.” But I can’t seem to get this working with a simple linear model.

I’m just trying to find an example of how to wrap a linear regression model in an outermost plate messenger to take advantage of the ‘parallel=True’ functionality in the ‘Predictive’ class. If you have any other suggestions or examples I’d greatly appreciate it. For now, I might just try writing my own linear model (as in the second part of the tutorial) to see if that works with the ‘parallel=True’ functionality.

I would follow the same approach. nn.Linear does not support a batch of weight parameters, so it is better to write the linear computation manually. Alternatively, you can write your own Linear module. I would expect that, given input x, weight w, and bias b, the forward method would be:

def forward(self, x):
    return (w @ x.unsqueeze(-1)).squeeze(-1) + b

That might not work yet but you can reshape w and x properly to make it work.
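
For example, one version that should broadcast (just a sketch, assuming weight has event shape (out_features, in_features) and bias has event shape (out_features,), with any extra leading sample dims added by the outermost plate):

def forward(self, x):
    # x: (n, in_features); self.weight: (..., out_features, in_features);
    # self.bias: (..., out_features). The leading "..." sample dims broadcast
    # through matmul, so the same code works with and without parallel sampling.
    return x @ self.weight.transpose(-1, -2) + self.bias.unsqueeze(-2)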

Hi @fehiepsi , thank you so much for all your help, I greatly appreciate it! I managed to get the Predictive(parallel=True) functionality working using my own linear model (with one response variable and num_features = x.shape[1]):

def model(x, y=None):
    weight = pyro.sample("weight", dist.Normal(0., 1.).expand([x.shape[1], 1]).to_event(2))
    bias = pyro.sample("bias", dist.Normal(0., 10.).expand([1]).to_event(1))
    sigma = pyro.sample("sigma", dist.Uniform(0., 10.))
    mu = torch.matmul(x, weight).squeeze(-1) + bias
    with pyro.plate("data", x.shape[0]):
        obs = pyro.sample("obs", dist.Normal(mu, sigma), obs=y)
    return mu

With the above linear model, I get similar results to using nn.Linear but now with a 10X speed improvement when using:

predictive = Predictive(model, guide, num_samples=1000, return_sites=("obs", "_RETURN"), parallel=True)
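
In case it’s useful, this is roughly how I’m consuming the samples (x_new is just a placeholder name for a tensor of new inputs with the same 3 features; the exact output shapes may carry an extra singleton plate dim):

samples = predictive(x_new)          # draw 1000 predictive samples per new input
obs = samples["obs"].squeeze()       # roughly (1000, len(x_new)) after squeezing
pred_mean = obs.mean(0)              # posterior predictive mean per new input
pred_std = obs.std(0)                # posterior predictive spread per new input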

One thing I did notice is that while the predictions for new observations are very similar, they DO differ slightly from what I obtained using nn.Linear. At first, I thought that batching the weights may have affected the way the MultivariateNormal guide I’m using, AutoMultivariateNormal(model, init_loc_fn=init_to_mean), “generates samples from a Cholesky factorization of a multivariate normal distribution”. But after controlling for the random number generation process, I noticed the following behavior:

  1. Using PyroSample from the nn.Linear example in the tutorial:
    pyro.set_rng_seed(42)
    linear.bias = PyroSample(dist.Normal(0., 10.).expand([1]).to_event(1))
    linear.bias: tensor([1.2881])
    Resetting the seed produces the same result.
    pyro.set_rng_seed(42)
    linear.bias = PyroSample(dist.Normal(0., 10.).expand([1]).to_event(1))
    linear.bias: tensor([1.2881])

  2. Using pyro.sample from my own linear regression model:
    pyro.set_rng_seed(42)
    bias = pyro.sample("bias", dist.Normal(0., 10.).expand([1]).to_event(1))
    bias: tensor([3.3669])
    Resetting the seed produces the same result.
    pyro.set_rng_seed(42)
    bias = pyro.sample("bias", dist.Normal(0., 10.).expand([1]).to_event(1))
    bias: tensor([3.3669])

  3. Using the same seed for PyroSample and pyro.sample:
    pyro.set_rng_seed(42)
    linear.bias = PyroSample(dist.Normal(0., 10.).expand([1]).to_event(1))
    linear.bias: tensor([1.2881])
    Resetting the seed produces DIFFERENT results.
    pyro.set_rng_seed(42)
    bias = pyro.sample("bias", dist.Normal(0., 10.).expand([1]).to_event(1))
    bias: tensor([3.3669])

So it seems the difference comes down to the nn.Linear (non-parallel) model using PyroSample while the new parallel model uses pyro.sample: the two methods generate different random numbers given the same seed (which I didn’t expect), but each method reproduces its own numbers when the seed is reset (as expected).

Thanks again!