Getting svi loss of 1.40 but the accuracy of my pyro model is only ~26%

h56cho · November 1, 2020, 8:50pm

Hello,
I have been trying to use my Pyro neural network model to make predictions.
But I keep scratching my head, because I am getting ~1.40 for the training SVI loss of my Pyro model while its accuracy is only ~26%. I think this might have to do with the way I specified my likelihood function for y.

My Pyro neural network model performs a 4-class classification task; that is, for a given input, the model predicts which of the 4 classes (class 0, class 1, class 2, class 3) the input is most likely to belong to. To be more specific, my Pyro model predicts the classification probabilities for each of the 4 classes (computed from the vector prediction_scores), and once this is done, a user can select the class with the highest predicted probability to determine the actual predicted class. So, to make my original frequentist neural network Bayesian, I used the Multinomial distribution as my likelihood function for y. My actual code for this model class is shown below:

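import torch
import torch.nn as nn
import pyro
import pyro.distributions as dist
from pyro.nn import PyroModule

class MyModel(PyroModule):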
    
    def __init__(self,  model):
        super().__init__()
        self.model = model

    def forward(self, my_input, mc_labels = None, y = None):
    
        # compute the vector `prediction_scores`
        # (unnormalized class scores, i.e. the values before softmax)
        prediction_scores = self.model(my_input)
        
        # `softmax_tensor` is a tensor of size 4,
        # and the tensor stores the predicted probability that an observation will
        # belong to the class 0, 1, 2, and 3.
        # for example, if softmax_tensor = torch.tensor([0.1, 0.4, 0.3, 0.2]), then
        # the model predicts that there is 0.1 chance for the observation to be
        # classified under the class 0,
        # and the model predicts that there is 0.4 chance for the observation to be
        # classified under the class 1, etc.
        softmax_tensor = nn.Softmax(dim=-1)(prediction_scores)
        
        # `mc_labels` is the actual correct class
        # (i.e. the "right" answer)
        #
        # case 1: if `mc_labels` is given
        if mc_labels is not None:
            
            # encode `mc_labels` in a form that is adequate to
            # use with the Multinomial distribution.
            if mc_labels == torch.tensor([0]):
                 mc_label_tensor = torch.tensor([[1.,0.,0.,0.]])

            elif mc_labels == torch.tensor([1]):
                 mc_label_tensor = torch.tensor([[0.,1.,0.,0.]])

            elif mc_labels == torch.tensor([2]):
                 mc_label_tensor = torch.tensor([[0.,0.,1.,0.]])

            elif mc_labels == torch.tensor([3]):
                 mc_label_tensor = torch.tensor([[0.,0.,0.,1.]])
        
        # case 2: if `mc_labels` is not given
        else:
            mc_label_tensor = None
  
        # `y` here stands for the predicted class type for the observation.
        return pyro.sample("y",
                    dist.Multinomial(1, probs = softmax_tensor),
                    obs = mc_label_tensor)

I know that this is rather cumbersome, but could you tell me whether the way I assigned the likelihood function for y (or the way I specified the model class in general) is incorrect?

Thank you,

PS: I am thinking, maybe instead of doing return pyro.sample("y", dist.Multinomial(1, probs = softmax_tensor), obs = mc_label_tensor), I should do something like return pyro.sample("y", dist.Multinomial(100, probs = softmax_tensor), obs = mc_label_tensor). Would this improve my accuracy rate greatly? :S Thank you again,

eb8680_2 · November 3, 2020, 1:20am

Hi @h56cho, I’m sorry you’re still having trouble. I don’t see anything obviously wrong with your snippet, although I’m not sure I understand how mc_label_tensor is supposed to work for batches of data. Why not just use dist.Categorical on mc_labels?

return pyro.sample("y", dist.Categorical(probs=softmax_tensor), obs=mc_labels)

I also doubt that this is your problem, or that there’s some subtle Pyro bug somewhere else in your code - a more likely explanation for why your model is failing to learn anything at all (since 26% is chance performance here) is that you’re working on a difficult problem and have not yet found hyperparameter settings that work. The ELBO improvement you are seeing may simply reflect the guide learning to match the priors, a common pathology in SVI in high dimensions as discussed in this section of the Deep Markov model tutorial.
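A quick sanity check on the numbers, as a small sketch: for 4 classes, a model that always predicts a uniform distribution has 25% expected accuracy and a negative log-likelihood of ln 4 per observation. Whether the reported 1.40 is a per-observation average dominated by the likelihood term is an assumption here; the thread does not say how the loss was normalized.

```python
import math

num_classes = 4

# Expected accuracy when guessing uniformly at random.
chance_accuracy = 1 / num_classes

# Negative log-likelihood per observation of a uniform prediction.
chance_nll = -math.log(1 / num_classes)

print(f"chance accuracy: {chance_accuracy:.2f}")
print(f"chance NLL per observation: {chance_nll:.4f}")
```

Under that assumption, a loss near 1.40 is about what a model that has learned nothing would produce, which is consistent with the ~26% accuracy.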

Training Bayesian neural networks is subject to the same failure modes as training regular neural networks, e.g. strong dependence on parameter initialization, so you might try reading through collections of practical knowledge, like this textbook chapter or the experimental details and appendices of research papers on similar problems. For Bayesian neural networks in particular, you might additionally try using HMC instead of SVI if your neural network and dataset are small enough, since it’s much more likely to work “out of the box.”

More generally, the more atomic you can make your questions, the more helpful we can be. It’s much easier for us to provide high-quality answers to many Pyro-focused, relatively context-independent questions like “how do I change the initial value of autoguide parameters in this runnable code snippet” than fewer open-ended questions that depend on the details of your research problem, especially when that problem is far from our areas of expertise. You might consider asking a colleague in your company or university department who has worked with Bayesian neural networks before to help you walk through your problem and come up with a list of actionable hypotheses.

h56cho · November 4, 2020, 5:01am

Hello,

Thank you very much for your reply.
I have a Pyro question about the implementation of svi.step(). Is annealing_factor one of the parameters that can be passed as an argument to svi.step()? (I.e., can I simply do svi.step(my_input, annealing_factor=....) once I define what my annealing_factor should be?) Or do I need to define annealing_factor as one of the inputs of my neural network model before I try to execute svi.step(my_input, annealing_factor=....)?

Thank you,

eb8680_2 · November 5, 2020, 5:34pm

is annealing_factor one of the parameters that can be passed in as an argument to svi.step() ?

No, there’s no built-in annealing functionality in pyro.infer.SVI - I guess that DMM tutorial section is a bit confusing. If you want to use annealing as described there, you’ll need to add annealing_factor as an argument to your model and guide and put a pyro.poutine.scale(scale=annealing_factor) context manager around the relevant sample sites in your model and guide as in the full DMM code.

h56cho · November 5, 2020, 5:58pm

Hello,

Thank you for your reply.
If I am using AutoDiagNormal guide instead of a customized guide function, how should I accommodate annealing_factor into the AutoDiagNormal guide? (if possible)?

Thank you,

eb8680_2 · November 9, 2020, 6:17pm

If I am using AutoDiagNormal guide instead of a customized guide function, how should I accommodate annealing_factor into the AutoDiagNormal guide? (if possible)?

You can just create the guide as usual and wrap it with a scale handler:

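# the class is named AutoDiagonalNormal in pyro.infer.autoguide
from pyro.infer.autoguide import AutoDiagonalNormal

base_guide = AutoDiagonalNormal(model)

# with a fixed annealing factor
guide = pyro.poutine.scale(base_guide, scale=annealing_factor)
# or, taking the annealing factor as the first argument on each call
# (the wrapper refers to base_guide, not to itself, to avoid infinite recursion)
guide = lambda annealing_factor, *args: pyro.poutine.scale(base_guide, scale=annealing_factor)(*args)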