Bayesian CNN

Hi,

I am new to Pyro. To try it out, I want to train a CNN using Bayesian inference in Pyro.

Can anyone help me transform it to use pyro.random_module?
The network is the following:

import torch.nn as nn

class NetC(nn.Module):
    def __init__(self, nc, nclass):
        super(NetC, self).__init__()
        self.nclass = nclass
        self.convlayers = nn.Sequential(
            nn.Conv2d(nc, 32, kernel_size=2, stride=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2),
            nn.Conv2d(32, 64, kernel_size=2, stride=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2),
            nn.Conv2d(64, 64, kernel_size=2, stride=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2),
            nn.Conv2d(64, 128, kernel_size=2, stride=1),
            nn.BatchNorm2d(128),
            nn.ReLU(),
        )
        self.fc = nn.Sequential(
            nn.Linear(128 * 6 * 6, 2048),  # FC
            nn.Dropout(),
            nn.ReLU(),
            nn.Linear(2048, 2048),
            nn.ReLU(),
            nn.Dropout(),
            nn.Linear(2048, self.nclass),  # classifier
        )

    def forward(self, x):
        x0 = self.convlayers(x)
        x0 = x0.view(x0.size(0), -1)  # flatten conv features
        return self.fc(x0)
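
What I have so far is just a guess at the model (the unit-normal priors, the site name "net", the input/class sizes, and the Categorical likelihood are placeholders I made up, not something I know to be right):

import torch
import pyro
import pyro.distributions as dist

net = NetC(nc=3, nclass=10)  # e.g. RGB inputs, 10 classes

def model(x, y):
    # unit-normal prior over every weight and bias in the network
    priors = {
        name: dist.Normal(torch.zeros_like(p), torch.ones_like(p)).to_event(p.dim())
        for name, p in net.named_parameters()
    }
    # lift the nn.Module into a distribution over modules
    lifted_module = pyro.random_module("net", net, priors)
    # sample a concrete network from the prior and score the observations
    sampled_net = lifted_module()
    logits = sampled_net(x)
    with pyro.plate("data", x.size(0)):
        pyro.sample("obs", dist.Categorical(logits=logits), obs=y)

For SVI this would be paired with a guide over the same sample sites, e.g. an autoguide such as pyro.infer.autoguide.AutoDiagonalNormal(model).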



Please be advised that using vanilla stochastic variational inference to learn a Bayesian neural network with this many parameters is exceedingly unlikely to work. For one thing, the variance of the gradients will be extremely large.

If you work a bit harder, you can do things like what’s done in, for example, this paper.

But even there, learning the model is going to be quite difficult.

What about this approach to deal with a complex model structure (like a CNN):

  • Start with standard NN optimization, minimizing the MSE with a standard PyTorch optimizer, to reach a reasonable local minimum.
  • Transfer the weights to a Bayesian network and finish learning with Pyro SVI steps (see the sketch below).

Has someone tried to implement this kind of process?
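
Concretely, the second step could look something like this rough sketch. The diagonal-normal variational family, the warm-started means, and the initial log-scale of -4 are my own choices, and it assumes a `model` that lifts the same `net` with pyro.random_module under the same name "net" so the sample sites match:

import torch
import pyro
import pyro.distributions as dist

# step 1 (not shown): train `net` with a standard PyTorch optimizer
# to a reasonable local minimum

def guide(x, y):
    # diagonal-normal variational posterior whose means are warm-started
    # at the pretrained weights, with small initial scales
    dists = {}
    for name, p in net.named_parameters():
        loc = pyro.param("guide_{}_loc".format(name), p.detach().clone())
        log_scale = pyro.param("guide_{}_log_scale".format(name),
                               torch.full_like(p, -4.0))
        dists[name] = dist.Normal(loc, log_scale.exp()).to_event(p.dim())
    # lift the same module under the same name so the sample sites
    # match those in the model
    lifted_module = pyro.random_module("net", net, dists)
    return lifted_module()

# step 2: fine-tune with SVI
svi = pyro.infer.SVI(model, guide,
                     pyro.optim.Adam({"lr": 1e-4}),
                     loss=pyro.infer.Trace_ELBO())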

That is unlikely to work for a number of reasons. One is that the KL divergence term will tend to over-regularize the weights, so once you switch to SVI you’ll quickly ‘destroy’ the MSE solution you found.
