 # Bayesian Regression Tutorial: Error setting parallel=True in Predictive class

• What tutorial are you running?
Bayesian Regression - Introduction (Part 1)
• What version of Pyro are you using?
Pyro Version 1.7.0

Hello! First and foremost, I just wanted to thank you for creating such an amazing library. I really love working with Pyro.

In the Bayesian Linear Regression tutorial, I’m trying to speed up the final step where “We generate 800 samples from our trained model.” In particular, I’m trying to set the parameter parallel=True in the Predictive class:
predictive = Predictive(model, guide=guide, num_samples=800,
                        return_sites=("linear.weight", "obs", "_RETURN"), parallel=True)

However, when making this change I get the following error:
RuntimeError: t() expects a tensor with <= 2 dimensions, but self is 3D
Trace Shapes:
Param Sites:
Sample Sites:
sigma dist 800 1 |
value 800 1 |
linear.weight dist 800 1 | 1 3
value 800 | 1 3
linear.bias dist 800 1 | 1
value 800 | 1

Looking through this forum and the Pyro docs, I’ve tried “wrapping the existing model in an outermost `plate` messenger” but I keep getting this error. My main question is how to modify the model used in the tutorial to support generating samples in parallel. From the tutorial, this is the original model definition that I’ve been using:
```python
import torch.nn as nn
import pyro
import pyro.distributions as dist
from pyro.nn import PyroModule, PyroSample


class BayesianRegression(PyroModule):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.linear = PyroModule[nn.Linear](in_features, out_features)
        # Priors over the weight matrix and bias vector of the linear layer
        self.linear.weight = PyroSample(dist.Normal(0., 1.).expand([out_features, in_features]).to_event(2))
        self.linear.bias = PyroSample(dist.Normal(0., 10.).expand([out_features]).to_event(1))

    def forward(self, x, y=None):
        sigma = pyro.sample("sigma", dist.Uniform(0., 10.))
        mean = self.linear(x).squeeze(-1)
        # One conditionally independent observation per row of x
        with pyro.plate("data", x.shape[0]):
            obs = pyro.sample("obs", dist.Normal(mean, sigma), obs=y)
        return mean
```

This is because PyTorch's `nn.Linear` assumes the input is 2D. You could use a “modified” version of `nn.Linear` that works for multi-dimensional inputs.

Thank you, I really appreciate your response! I’m new to PyTorch, but it looks like newer versions of nn.Linear do support multi-dimensional inputs. Sorry, I’m just trying to wrap my head around the docstring for the Pyro Predictive class which states (for the parallel param): “predict in parallel by wrapping the existing model in an outermost `plate` messenger. Note that this requires that the model has all batch dims correctly annotated via :class:`~pyro.plate`. Default is `False`.” Does that mean that I need to wrap the call to the linear model inside of a ‘batches’ plate? Because each of the linear models (each from an independent sample from the posterior distribution) can be run independently to generate the samples in parallel?

```python
def forward(self, x, y=None):
    # batch_size is a placeholder here (it is not defined anywhere in this post)
    with pyro.plate("batches", batch_size):
        sigma = pyro.sample("sigma", dist.Uniform(0., 10.))
        mean = self.linear(x).squeeze(-1)
        with pyro.plate("data", x.shape[0]):
            obs = pyro.sample("obs", dist.Normal(mean, sigma), obs=y)
    return mean
```

Sorry, I should have said: `nn.Linear` does not work with a batch of weights (the inputs are not the problem).
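To make that concrete, here is a small PyTorch-only sketch. The sizes and tensor names are made up for illustration; the weight shape `(num_samples, 1, 3)` is chosen to mirror the `800 | 1 3` value shape in the trace from the first post.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
num_samples, num_data, in_features, out_features = 4, 6, 3, 1
x = torch.randn(num_data, in_features)

# One weight matrix and one bias vector per posterior draw
weights = torch.randn(num_samples, out_features, in_features)
biases = torch.randn(num_samples, out_features)

# F.linear (used inside nn.Linear) expects a single 2-D weight matrix
try:
    F.linear(x, weights)
    batched_call_failed = False
except RuntimeError as err:
    batched_call_failed = True
    print("F.linear with a batch of weights failed:", err)

# Calling it once per draw works; this is the slow path that
# parallel=True is meant to avoid
per_draw = torch.stack([F.linear(x, w, b) for w, b in zip(weights, biases)])
print(per_draw.shape)  # torch.Size([4, 6, 1])
```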

Thanks @fehiepsi , that makes sense!

Do you have a recommendation for how to wrap a linear regression model in a plate messenger so that predictive samples can be generated in parallel? In the Bayesian Regression tutorial, it says that

We generate 800 samples from our trained model. Internally, this is done by first generating samples for the unobserved sites in the `guide` , and then running the model forward by conditioning the sites to values sampled from the `guide` . Refer to the Model Serving section for insight on how the `Predictive` class works.

For my Bayesian linear regression model, I’d like to generate 1000 predictive samples from the trained model for each new input example. I’m not sure if serving the model via TorchScript would improve the time it takes to generate these samples. I’ve been trying to follow the recommendations in this forum post: “When `parallel=False` , `Predictive` has to run your model once per sample, which as you are seeing will be very slow for large numbers of samples.” But I can’t seem to get this working with a simple linear model.

I’m just trying to find an example of how to wrap a linear regression model in an outermost `plate` messenger to take advantage of the ‘parallel=True’ functionality in the ‘Predictive’ class. If you have any other suggestions or examples I’d greatly appreciate it. For now, I might just try writing my own linear model (as in the second part of the tutorial) to see if that works with the ‘parallel=True’ functionality.

I would follow the same approach. `nn.Linear` does not support a batch of weight parameters so it is better to write manual code for it. Alternatively, you can write your own `Linear` module. I would expect that given input x, weight w, bias b, the forward method would be:

```python
def forward(self, x):
    # w (weight) and b (bias) are assumed to be available in scope
    return (w @ x.unsqueeze(-1)).squeeze(-1) + b
```

That might not work as written, but you can reshape `w` and `x` so that the shapes line up.
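One way the reshaping could look, as a sketch rather than a definitive implementation: the helper name `batched_linear` and the shape conventions below are assumptions, and it relies only on the broadcasting behavior of `torch.matmul`.

```python
import torch
import torch.nn.functional as F


def batched_linear(x, weight, bias):
    """Linear map that tolerates leading batch dims on weight and bias.

    x:      (num_data, in_features)
    weight: (..., out_features, in_features)
    bias:   (..., out_features)
    returns (..., num_data, out_features)
    """
    # matmul broadcasts x against the leading batch dims of weight
    return x @ weight.transpose(-1, -2) + bias.unsqueeze(-2)


torch.manual_seed(0)
x = torch.randn(6, 3)
weights = torch.randn(4, 1, 3)  # 4 draws of a (1, 3) weight matrix
biases = torch.randn(4, 1)

batched = batched_linear(x, weights, biases)
looped = torch.stack([F.linear(x, w, b) for w, b in zip(weights, biases)])
print(batched.shape)  # torch.Size([4, 6, 1])
```

With unbatched `weight` and `bias` the helper reduces to the usual `F.linear(x, weight, bias)`.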

Hi @fehiepsi , thank you so much for all your help, I greatly appreciate it! I managed to get the `Predictive(parallel=True)` functionality working using my own linear model (with one response variable and `num_features = x.shape[1]`):

```python
def model(x, y=None):
    # weight has shape (num_features, 1): one response variable
    weight = pyro.sample("weight", dist.Normal(0., 1.).expand([x.shape[1], 1]).to_event(2))
    bias = pyro.sample("bias", dist.Normal(0., 10.).expand([1]).to_event(1))
    sigma = pyro.sample("sigma", dist.Uniform(0., 10.))
    # torch.matmul broadcasts over any leading batch dims of weight
    mu = torch.matmul(x, weight).squeeze(-1) + bias
    with pyro.plate("data", x.shape[0]):
        obs = pyro.sample("obs", dist.Normal(mu, sigma), obs=y)
    return mu
```

With the above linear model, I get similar results to using `nn.Linear` but now with a 10X speed improvement when using:

predictive = Predictive(model, guide, num_samples=1000, return_sites=("obs", "_RETURN"), parallel=True)

One thing I did notice is that while the predictions for new observations are very similar, they do differ slightly from what I obtained using `nn.Linear`. At first, I thought that batching the weight updates might have affected the way the guide I’m using, `AutoMultivariateNormal(model, init_loc_fn=init_to_mean)`, “generates samples from a Cholesky factorization of a multivariate normal distribution”. But after controlling for the random number generation process, I noticed the following behavior:

1. Using `PyroSample` from the `nn.Linear` example in the tutorial:
   `pyro.set_rng_seed(42)`
   `linear.bias = PyroSample(dist.Normal(0., 10.).expand([1]).to_event(1))`
   `linear.bias: tensor([1.2881])`
   Resetting the seed produces the same result.
   `pyro.set_rng_seed(42)`
   `linear.bias = PyroSample(dist.Normal(0., 10.).expand([1]).to_event(1))`
   `linear.bias: tensor([1.2881])`

2. Using `pyro.sample` from my own linear regression model:
   `pyro.set_rng_seed(42)`
   `bias = pyro.sample("bias", dist.Normal(0., 10.).expand([1]).to_event(1))`
   `bias: tensor([3.3669])`
   Resetting the seed produces the same result.
   `pyro.set_rng_seed(42)`
   `bias = pyro.sample("bias", dist.Normal(0., 10.).expand([1]).to_event(1))`
   `bias: tensor([3.3669])`

3. Using the same seed for `PyroSample` and `pyro.sample`:
   `pyro.set_rng_seed(42)`
   `linear.bias = PyroSample(dist.Normal(0., 10.).expand([1]).to_event(1))`
   `linear.bias: tensor([1.2881])`
   Resetting the seed produces DIFFERENT results.
   `pyro.set_rng_seed(42)`
   `bias = pyro.sample("bias", dist.Normal(0., 10.).expand([1]).to_event(1))`
   `bias: tensor([3.3669])`

So the difference seems to come from the random draws rather than from the models themselves: the `nn.Linear` (non-parallel) model using `PyroSample` and the new parallel model using `pyro.sample` generate different random numbers given the same seed (which I didn’t expect), while each method on its own reproduces the same numbers when the seed is reset (as expected).
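This thread does not establish why the two methods disagree. One possibility worth checking (an assumption, not something confirmed here) is that the two code paths simply draw from the random stream at different points, for example because another site is sampled first. A PyTorch-only sketch of that effect:

```python
import torch

torch.manual_seed(42)
first = torch.randn(1)

torch.manual_seed(42)
repeat = torch.randn(1)   # same seed, same position in the stream

torch.manual_seed(42)
_ = torch.randn(1)        # an earlier draw consumes part of the stream
second = torch.randn(1)   # same seed, but a later position in the stream

print(torch.equal(first, repeat))  # True
print(torch.equal(first, second))  # False
```

If this is what is happening, the slightly different predictions would reflect ordinary Monte Carlo variation between two sets of draws rather than a difference between the models.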

Thanks again!