Uncertainty in Bayesian Regression example

I am using Pyro version 0.1.2 and ran the Bayesian regression example from the tutorial section. Either there is something odd, or I don’t understand what is supposed to happen:

Looking at the uncertainty of the estimated parameters: if I run the code as provided on the website, I get results similar to those shown in the tutorial.

Now I wanted to see if I could influence the estimated uncertainty by changing the number of data points, changing the noise in the data, or changing the uncertainty in the prior. It seems that only changing the uncertainty in the prior has an effect; neither changing the noise in the data nor using fewer data points (e.g. 2 or 3 as compared to some 10k) makes a difference.

Is this the expected behavior? I thought having fewer data points, or data with higher noise, should also increase the uncertainty over the estimated parameters!
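For reference, these are the knobs I was turning (a sketch rather than the exact tutorial code; the names follow the tutorial, the concrete values are just examples):

import torch
from torch.autograd import Variable
from pyro.distributions import Normal

# number of data points: tried 2-3 vs. some 10k
N = 3
# noise in the data: build_linear_dataset as defined in the tutorial (see below)
data = build_linear_dataset(N, noise_std=0.5)

# uncertainty in the prior: the scale of the Normal prior placed on the
# regression weight inside model()
mu = Variable(torch.zeros(1, 1))
sigma = Variable(torch.ones(1, 1))  # widen or narrow the prior here
w_prior = Normal(mu, sigma)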

Additionally, there seems to be a small error in the example code:

The function

def build_linear_dataset(N, noise_std=0.1):

takes noise_std as its second parameter, but further down, where the function is called, the second argument is assumed to be num_features.
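For illustration, the mismatch looks roughly like this (the names and values here are hypothetical; the exact call site in the tutorial may differ):

def build_linear_dataset(N, noise_std=0.1):
    ...

N, p = 100, 1                      # hypothetical: p is meant to be num_features
data = build_linear_dataset(N, p)  # but p is silently bound to noise_std

So the dataset gets built with noise_std=p instead of with p features.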

using fewer data points…

That certainly does affect the uncertainty, which is also subject to how your hyperparameters are tuned. What are you running where you do not see a change in uncertainty even after changing the number of data points? You can try running for more iterations and see if you observe a difference.

takes noise_std as its second parameter, but further down…

Good catch. It’s correct in the examples, but hasn’t been updated on the website. I made a PR to fix that.

Hello jpchen,

thanks a lot for your answer! Your reference to the examples on GitHub got me on the right track, and I think I found the problem.

When I execute the example code directly from GitHub, it works as expected (fewer data points → higher uncertainty). I narrowed it down to this difference between the code on GitHub and the code in the example documents:

On Bayesian Regression - Introduction (Part 1) — Pyro Tutorials 1.8.4 documentation:

def model(data):
    […]
    # run the regressor forward conditioned on data
    prediction_mean = lifted_reg_model(x_data).squeeze()
    # condition on the observed data
    pyro.sample("obs",
                Normal(prediction_mean, Variable(0.1 * torch.ones(data.size(0)))),
                obs=y_data.squeeze())

vs. on GitHub:

def model(data):
    […]
    with pyro.iarange("map", N, subsample=data):
        x_data = data[:, :-1]
        y_data = data[:, -1]
        # run the regressor forward conditioned on inputs
        prediction_mean = lifted_reg_model(x_data).squeeze()
        pyro.observe("obs",
                     Normal(prediction_mean,
                            Variable(torch.ones(data.size(0))).type_as(data)),
                     y_data.squeeze())

The significant difference seems to be that the model from the pyro.ai website assumes sigma=0.1, vs. sigma=1.0 on GitHub. If I set sigma=1.0 in the code from the docs, then the uncertainty in the parameters also increases with a lower number of observations.
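Concretely, the docs model behaves as expected after this one change (0.1 → 1.0 in the observation noise):

pyro.sample("obs",
            Normal(prediction_mean, Variable(1.0 * torch.ones(data.size(0)))),
            obs=y_data.squeeze())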

What’s a little unclear to me is why that’s the case. Shouldn’t it be the other way around: shouldn’t decreasing the assumed noise in the data, when observing noisy data, increase the uncertainty in the parameters?

[To answer your question about what I was running: I directly copied the code from the docs on pyro.ai and just changed the way the parameters are passed to the function, as discussed above. Increasing the number of iterations didn’t change anything.]

Again, thanks a lot for your answer!

Shouldn’t it be the other way around: shouldn’t decreasing the assumed noise in the data, when observing noisy data, increase the uncertainty in the parameters?

Not exactly. A hand-wavy explanation that might help impart some intuition: a larger variance is more forgiving of values that deviate from the mean, so the model will learn more lines that fit the data. A smaller variance implies we are more certain about our prediction, so the loss incurred by the likelihood will move the parameter estimate, but with higher certainty. You should play with the data and observation noises and look at the resulting posteriors they induce.
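To make that concrete with an exactly solvable case, here is a minimal sketch (a conjugate Gaussian mean model, not the tutorial’s regression) of how the assumed observation noise sigma and the number of data points N jointly set the exact posterior spread:

import math

def posterior_std(n, sigma, tau=1.0):
    # posterior std of the mean mu under prior N(0, tau^2) with n
    # observations modeled as N(mu, sigma^2); standard conjugate result
    return math.sqrt(1.0 / (1.0 / tau ** 2 + n / sigma ** 2))

for sigma in (0.1, 1.0):
    for n in (2, 10000):
        print("sigma=%.1f  N=%5d  posterior std=%.4f" % (sigma, n, posterior_std(n, sigma)))

# sigma=0.1  N=    2  posterior std=0.0705
# sigma=0.1  N=10000  posterior std=0.0010
# sigma=1.0  N=    2  posterior std=0.5774
# sigma=1.0  N=10000  posterior std=0.0100

With sigma=0.1 the posterior is already very tight at N=2, so the effect of N is easy to miss; with sigma=1.0 the N=2 posterior is visibly wide and shrinks dramatically as N grows.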