Turning bayesian regression example into bayesian feed-forward network

pyrolearner · May 8, 2018, 1:31pm

Hi. I tried to train a little more complex network than one presented in Bayesian Regression example. It is supposed to learn simple parabolic dependency. The determenistic version of this model successfully learns it. Whereas Bayesian one is able to learn only something close to straight line.

Here is the full code:

import torch
from torch.utils.data import Dataset, DataLoader
import numpy as np
import matplotlib.pyplot as plt
import pyro
import pyro.distributions as dist
from pyro.infer import SVI, Trace_ELBO
from pyro.optim import Adam, SGD

N = 500
x = (np.random.normal(size=N)).astype('float32')
y = (x ** 2 + np.random.normal(scale=0.05, size=N)).astype('float32')

hidden_size = 8
model = torch.nn.Sequential(
    torch.nn.Linear(1, hidden_size),
    torch.nn.Sigmoid(),
    torch.nn.Linear(hidden_size, 1),
)

def pyromodel(x, y):
    priors = {}
    for name, par in model.named_parameters():
        priors[name] = dist.Normal(torch.zeros(*par.shape), 10 * torch.ones(*par.shape)).independent(1)
    bayesian_model = pyro.random_module('bayesian_model', model, priors)
    sampled_model = bayesian_model()
    with pyro.iarange("map", len(x)):
        prediction_mean = sampled_model(x).squeeze(-1)
        pyro.sample("obs",
                    dist.Normal(prediction_mean,
                                0.05 * torch.ones(x.size(0))),
                    obs=y)
    
softplus = torch.nn.Softplus()

def guide(x, y):
    dists = {}
    for name, par in model.named_parameters():
        loc = pyro.param(name + '.loc', torch.randn(*par.shape))
        scale = softplus(pyro.param(name + '.scale',
                                    -3.0 * torch.ones(*par.shape) + 0.05 * torch.randn(*par.shape)))
        dists[name] = dist.Normal(loc, scale).independent(1)
    bayesian_model = pyro.random_module('bayesian_model', model, dists)
    return bayesian_model()

optim = Adam({"lr": 0.05})
svi = SVI(pyromodel, guide, optim, loss=Trace_ELBO())

pyro.clear_param_store()
for j in range(1000):
    loss = svi.step(torch.tensor(x[:, np.newaxis]), torch.tensor(y[:, np.newaxis]))
    if j % 100 == 0:
        print("[iteration %04d] loss: %.4f" % (j + 1, loss / float(N)))
        
        
ys = []
xs = np.linspace(-5, 5, 500, dtype='float32')
for i in range(50):
    sampled_model = guide(None, None)
    ys += [sampled_model(torch.tensor(xs[:, np.newaxis])).detach().numpy().flatten()]
ys = np.stack(ys).T

plt.scatter(x, y, s=2, c='r')
mean = ys.mean(1)
std = ys.std(1)
plt.fill_between(xs, mean - std, mean + std, facecolor='gray', alpha=0.2, interpolate=True)
plt.fill_between(xs, mean - std * 2, mean + std * 2, facecolor='gray', alpha=0.2, interpolate=True)
plt.fill_between(xs, mean - std * 3, mean + std * 3, facecolor='gray', alpha=0.2, interpolate=True)
plt.plot(xs, mean)

I’m trying to understand what is the problem. Am I using Pyro the right way?

pyrolearner · May 8, 2018, 4:33pm

Even when setting inital values of location parameters of weights to the values of trained determenistic model, the bayesian one degenerates to straight line.

fehiepsi · May 9, 2018, 2:03am

It seems that the bug lies at loss = svi.step(torch.tensor(x[:, np.newaxis]), torch.tensor(y[:, np.newaxis])), hence you are doing inference using only 1 (not 500) sample y.
You should use loss = svi.step(torch.tensor(x[:, np.newaxis]), torch.tensor(y)) to correct your
pyro.sample("obs", ...). A tip to detect bugs like this is: use pyro.enable_validation().

pyrolearner · May 9, 2018, 6:28am

torch.tensor(y) fix worked. But when I enable validation, training crashes with the following exception:

ValueError: at site "bayesian_model$$$0.weight", invalid log_prob shape
  Expected [], actual [8]
  Try one of the following fixes:
  - enclose the batched tensor in a with iarange(...): context
  - .independent(...) the distribution being sampled
  - .permute() data dimension

I tried to .independent(1) the normal distribution being sampled in pyromodel, but it didn’t work. Could you explain what’s going on?

fehiepsi · May 9, 2018, 7:47am

It will conflict to pyro.iarange().

Instead, you should change all .independent(1) to .independent(par.dim()) because weight and bias have different dim.

pyrolearner · May 10, 2018, 4:50am

Thanks! Does .independent(...) just mean that the elements of the tensor over given dimentions are considered as independent rvs and joint distribution can be computed as product?

neerajprad · May 10, 2018, 2:53pm

This is useful for example when converting a univariate normal to a multivariate normal with a diagonal covariance (the difference is just in the shape of the corresponding log_prob).

d1 = dist.Normal(torch.zeros(2), 1.)
assert d1.log_prob(torch.zeros(2)).shape == torch.Size((2,))

d2 = dist.Normal(torch.zeros(2), 1.).independent(1)
assert d2.log_prob(torch.zeros(2)).shape == torch.Size(())

The reason why we need this is because Pyro requires us to account for the independent batch dimensions (i.e. the dimensions of log_prob) at any sample site by either designating them as independent via pyro.iarange or moving them into event_shape to be summed out via .independent. This independence information is exploited by SVI. For more details, I would highly recommend going through the shapes tutorial (you can also run the corresponding notebook in tutorials to play around with it).