Multi-Class Regression on MNIST

ThinkPad · February 19, 2020, 10:48am

What tutorial are you running?
Bayesian Regression
What version of Pyro are you using?
1.2.1
Please link or paste relevant code, and steps to reproduce.

Dear Pyro-Team,
I am trying to perform softmax regression on MNIST using mini-batch SGD with a mean-field variational density and multivariate priors with diagonal covariance over the weight matrix and bias vector. The loss decreases; however, the accuracy over a training batch is always zero.

The respective code:

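import numpy as np
import torch
import torch.nn as nn
import pyro
import pyro.distributions as dist
from pyro import optim
from pyro.infer import SVI, Predictive, Trace_ELBO
from pyro.infer.autoguide import AutoDiagonalNormal
from pyro.nn import PyroModule, PyroSample

# get_data_loaders, inf_generator and one_hot are the poster's own helpers (not shown).


class SoftmaxRegression(PyroModule):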
    def __init__(self, in_features, out_features):
        super().__init__()
        self.linear = PyroModule[nn.Linear](in_features, out_features)

        # Multi-variate Normal priors for weight matrix and bias vector
        self.linear.weight = PyroSample(
            prior=dist.Normal(0., 1.).expand([out_features, in_features]).to_event(self.linear.weight.dim())
        )
        self.linear.bias = PyroSample(
            prior=dist.Normal(0., 10.).expand([out_features]).to_event(self.linear.bias.dim())
        )

    def forward(self, x, y=None):
        # Forward method defines the likelihood function of the statistical model with mean f(x)
        mean = self.linear(x)
        # Define Categorical likelihood over i.i.d. data set, i.e. for each data point a separate likelihood
        with pyro.plate('data', size=x.shape[0]):
            likelihood = pyro.sample('likelihood', dist.Categorical(logits=mean), obs=y)
        return mean


def train():
    pyro.clear_param_store()

    num_epochs = 10

    train_loader, test_loader, val_loader = get_data_loaders()
    data_generator = inf_generator(train_loader)
    batches_per_epoch = len(train_loader)

    model = SoftmaxRegression(28*28, 10)
    variational_density = AutoDiagonalNormal(model=model)

    optimizer = optim.Adam({'lr': 1e-2})
    svi = SVI(model=model, guide=variational_density, optim=optimizer, loss=Trace_ELBO())

    for itr in range(batches_per_epoch * num_epochs):
        x, y = next(data_generator)
        x = x.view(-1, 28*28)
        loss = svi.step(x, y)

        if itr % batches_per_epoch == 0:
            with torch.no_grad():
                posterior_predictive = Predictive(model=model, guide=variational_density, num_samples=50,
                                                  return_sites=('likelihood', '_RETURN'))
                predictive_samples = posterior_predictive(x)
                predictive_mean = torch.mean(predictive_samples['_RETURN'], dim=0)
                y = one_hot(np.array(y.numpy()), 10)
                target = np.argmax(y, axis=1)
                pred = np.argmax(predictive_mean, axis=1)
                acc = np.sum(pred == target) / 64.
                print('Training Batch Accuracy: {} | Loss: {}'.format(acc, loss / len(train_loader)))

Thank you for your help.

fehiepsi · February 20, 2020, 5:08pm

This indicates that there is a bug in your evaluation code. I would suggest printing the intermediate results line by line to find the step that goes wrong.

ThinkPad · February 21, 2020, 7:57am

Dear fehiepsi,

Thank you for your reply!
I suspected I had made a mistake in the model, but since you point to the evaluation code, I will gladly take a closer look.

Regards!

ThinkPad · February 21, 2020, 1:14pm

Thank you for your help!

I was so focused on Pyro that I didn’t realize I had forgotten to convert my predictions to NumPy arrays. Naturally, the line

acc = np.sum(pred_train == target_train) / 64.

didn’t work.

Regards!
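For readers hitting the same symptom, here is a minimal sketch of the evaluation step with everything converted to NumPy before the comparison. It assumes the `_RETURN` samples have shape (num_samples, batch_size, num_classes) and that a torch tensor has already been converted with `.detach().cpu().numpy()`; the helper name `batch_accuracy` and the toy data are hypothetical and not taken from the thread.

```python
import numpy as np


def batch_accuracy(sampled_logits, labels):
    """Accuracy of the sample-averaged prediction on one batch.

    sampled_logits: array-like, shape (num_samples, batch_size, num_classes)
    labels:         integer array-like, shape (batch_size,)
    """
    # Make sure both operands are NumPy arrays before comparing them.
    sampled_logits = np.asarray(sampled_logits)
    labels = np.asarray(labels)
    mean_logits = sampled_logits.mean(axis=0)   # average over posterior samples
    pred = mean_logits.argmax(axis=1)           # predicted class per data point
    # Use the actual batch size instead of a hard-coded 64.
    return float((pred == labels).mean())


# Toy stand-in for the Predictive output (assumption: 50 samples, batch of 4).
rng = np.random.default_rng(0)
labels = np.array([3, 1, 4, 1])
sampled_logits = rng.normal(scale=0.1, size=(50, 4, 10))
sampled_logits[:, np.arange(4), labels] += 5.0  # make the true class dominant
acc = batch_accuracy(sampled_logits, labels)
print('Training Batch Accuracy: {}'.format(acc))
```

Dividing by the actual batch length also keeps the accuracy correct for a final batch smaller than 64.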

ThinkPad · March 24, 2020, 11:10am

Dear Pyro-Forum,

I have a follow-up question regarding the same model. I would like to compute the negative log-likelihood of my posterior predictive, i.e. the negative log-probability of the observed data under the posterior predictive density.

I understand this can be done similarly to this post or this example. However, this involves annotating the prior definitions in the model using plate statements. Unfortunately, I haven’t quite understood how this would work, especially in conjunction with PyroSample.

Currently, when executing:

pred = Predictive(model=model, guide=variational_density, num_samples=10)
pred.get_vectorized_trace(data)

I am getting the following error:

.local/lib/python3.6/site-packages/torch/nn/functional.py", line 1370, in linear
    ret = torch.addmm(bias, input, weight.t())
RuntimeError: t() expects a tensor with <= 2 dimensions, but self is 3D
     Trace Shapes:            
      Param Sites:            
     Sample Sites:            
linear.weight dist 10  1 | 1 2
             value    10 | 1 2
  linear.bias dist 10  1 | 1  
             value    10 | 1

I have tried to change the prior definition to:

with pyro.plate('x_axis', size=in_features):
    with pyro.plate('y_axis', size=out_features):
        self.linear.weight = PyroSample(
            prior=dist.Normal(weight_loc, weight_scale)
        )

with pyro.plate('bias', size=out_features):
    self.linear.bias = PyroSample(
        prior=dist.Normal(bias_loc, bias_scale)
    )

in order to get independent batch dimensions, i.e. assume all weights to be i.i.d. However, this leads to:

.local/lib/python3.6/site-packages/pyro/util.py", line 288, in check_site_shape
    '- .permute() data dimensions']))
ValueError: at site "linear.weight", invalid log_prob shape
  Expected [], actual [1, 2]
  Try one of the following fixes:
  - enclose the batched tensor in a with plate(...): context
  - .to_event(...) the distribution being sampled
  - .permute() data dimensions

Any help is much appreciated.

Regards!

fehiepsi · March 26, 2020, 3:29am

@ThinkPad I think the issue is that PyTorch's nn.Linear does not work with a batch of weights, so pred.get_vectorized_trace won’t work. A workaround is to run pred(data) to get loc and scale samples (you might use pyro.deterministic to declare that you want to record those values) and compute the log-likelihood manually: loglik = dist.Normal(loc, scale).log_prob(obs)
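To illustrate the shape problem described above, the following NumPy sketch applies a linear layer with one weight matrix per posterior sample. The shapes mirror the trace in the error message (10 samples, a 1 x 2 weight); the function name `batched_linear` is hypothetical, and this only illustrates the batching, it is not a patch for nn.Linear or for Predictive.

```python
import numpy as np


def batched_linear(x, weight, bias):
    """Apply one linear map per posterior sample.

    x:      (batch_size, in_features)
    weight: (num_samples, out_features, in_features)
    bias:   (num_samples, out_features)
    returns (num_samples, batch_size, out_features)
    """
    # Contract over in_features while keeping the sample dimension.
    out = np.einsum('bi,soi->sbo', x, weight)
    return out + bias[:, None, :]


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 2))            # 4 data points, 2 input features
weight = rng.normal(size=(10, 1, 2))   # 10 samples of a 1 x 2 weight matrix
bias = rng.normal(size=(10, 1))        # 10 samples of a length-1 bias

logits = batched_linear(x, weight, bias)

# Reference: loop over samples, which is what a non-vectorized run does.
looped = np.stack([x @ w.T + b for w, b in zip(weight, bias)])
print(logits.shape)
```

A 2-D transpose, as used inside the linear call in the traceback, has no slot for the extra sample dimension, which is why the vectorized trace fails while the per-sample loop works.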

ThinkPad · March 26, 2020, 11:00am

Dear @fehiepsi,

Thank you again for your reply!
As you suggested, I proceeded as follows:

# Get model output using draws from the variational density
posterior_predictive = Predictive(model=model, guide=variational_density, num_samples=num_mc_samples,
                                  return_sites=['_RETURN'])
predictive_logits = torch.mean(posterior_predictive(data)['_RETURN'], dim=0)
log_likelihood = dist.Categorical(logits=predictive_logits).log_prob(target).sum() / len(target)

For the softmax regression model on MNIST, I get a negative log-likelihood of about 1.3, and for the same model on CIFAR-10, about 29.4. Thus, the negative log-likelihood of the data under this simple model increases drastically for more complicated data, which is what one would expect.

Regards and thank you again for your help!
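The snippet in the last post averages the logits over posterior samples before evaluating the Categorical log-probability. A commonly used alternative estimate of the posterior predictive log-likelihood averages the per-sample probabilities instead, i.e. it takes a log-mean-exp over the per-sample log-probabilities. The NumPy sketch below shows both side by side. It is not the method used in the thread; the function names and toy data are hypothetical, and the sampled logits are assumed to have shape (num_samples, batch_size, num_classes).

```python
import numpy as np


def log_softmax(logits):
    # Numerically stable log-softmax over the last (class) axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def logmeanexp(a, axis=0):
    # Numerically stable log of the mean of exp(a) along `axis`.
    m = a.max(axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.exp(a - m).mean(axis=axis))


def predictive_log_likelihood(sampled_logits, target):
    """Mean over data points of log( mean over samples of p(y | x, theta_s) )."""
    idx = np.arange(target.shape[0])
    per_sample = log_softmax(sampled_logits)[:, idx, target]  # (S, B)
    return logmeanexp(per_sample, axis=0).mean()


def averaged_logits_log_likelihood(sampled_logits, target):
    """Variant from the thread: average the logits first, then take log p(y | x)."""
    idx = np.arange(target.shape[0])
    mean_logits = sampled_logits.mean(axis=0)                 # (B, C)
    return log_softmax(mean_logits)[idx, target].mean()


# Toy stand-in data (assumption: 20 samples, batch of 5, 10 classes).
rng = np.random.default_rng(0)
sampled_logits = rng.normal(size=(20, 5, 10))
target = rng.integers(0, 10, size=5)

pll = predictive_log_likelihood(sampled_logits, target)
all_ll = averaged_logits_log_likelihood(sampled_logits, target)
print(-pll, -all_ll)
```

The two estimates coincide for a single posterior sample and generally differ otherwise, so the values reported above may change under the log-mean-exp variant.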