Large negative training loss in VAE

Hi,
My VAE is working, in the sense that the generated data matches the original very well, but I'm getting a negative ELBO. I mostly follow the VAE tutorial; however, in my loss function I'm using a Normal distribution, and I played around with the standard deviation until 0.01 seemed to work, although I'm not quite sure why that is.
The VAE is here:

```
import torch
import torch.nn as nn

import pyro
import pyro.distributions as dist


class VAE(nn.Module):
    # z_dim is the dimension of the latent space
    # and we use 400 hidden units
    def __init__(self, z_dim=4, hidden_dim=400, use_cuda=False):
        super(VAE, self).__init__()
        # create the encoder and decoder networks
        self.encoder = Encoder(z_dim, hidden_dim)
        self.decoder = Decoder(z_dim, hidden_dim)
        if use_cuda:
            # calling cuda() here will put all the parameters of
            # the encoder and decoder networks into gpu memory
            self.cuda()
        self.use_cuda = use_cuda
        self.z_dim = z_dim

    # define the model p(x|z)p(z)
    def model(self, x):
        # register PyTorch module `decoder` with Pyro
        pyro.module("decoder", self.decoder)
        with pyro.iarange("data", x.size(0)):
            # setup hyperparameters for prior p(z)
            z_loc = x.new_zeros(torch.Size((x.size(0), self.z_dim)))
            z_scale = x.new_ones(torch.Size((x.size(0), self.z_dim)))
            # sample from prior (value will be sampled by guide when computing the ELBO)
            z = pyro.sample("latent", dist.Normal(z_loc, z_scale).independent(1))
            # decode the latent code z
            loc_img = self.decoder.forward(z)
            # score against the actual data (n_points is the number of
            # features per example, defined elsewhere)
            pyro.sample("obs", dist.Normal(loc_img, 0.02).independent(1),
                        obs=x.reshape(-1, n_points))

    # define the guide (i.e. variational distribution) q(z|x)
    def guide(self, x):
        # register PyTorch module `encoder` with Pyro
        pyro.module("encoder", self.encoder)
        with pyro.iarange("data", x.size(0)):
            # use the encoder to get the parameters used to define q(z|x)
            z_loc, z_scale = self.encoder.forward(x)
            # sample the latent code z
            pyro.sample("latent", dist.Normal(z_loc, z_scale).independent(1))
```
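
The Encoder and Decoder aren't shown above; they're essentially the tutorial's MLPs adapted to my data. Roughly something like this (a sketch, exact layer details may differ; it uses the torch/nn imports and the global n_points from above):

```
class Encoder(nn.Module):
    def __init__(self, z_dim, hidden_dim):
        super(Encoder, self).__init__()
        self.fc1 = nn.Linear(n_points, hidden_dim)
        self.fc21 = nn.Linear(hidden_dim, z_dim)
        self.fc22 = nn.Linear(hidden_dim, z_dim)
        self.softplus = nn.Softplus()

    def forward(self, x):
        x = x.reshape(-1, n_points)
        hidden = self.softplus(self.fc1(x))
        z_loc = self.fc21(hidden)
        # exponentiate so the scale of q(z|x) is positive
        z_scale = torch.exp(self.fc22(hidden))
        return z_loc, z_scale


class Decoder(nn.Module):
    def __init__(self, z_dim, hidden_dim):
        super(Decoder, self).__init__()
        self.fc1 = nn.Linear(z_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, n_points)
        self.softplus = nn.Softplus()

    def forward(self, z):
        hidden = self.softplus(self.fc1(z))
        # no output activation: this is the loc of the Normal likelihood
        loc_img = self.fc2(hidden)
        return loc_img
```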


The output is then:

```
  2%|▏         | 1/50 [00:39<32:28, 39.77s/it]
[epoch 000] average training loss: 1249.2473
[epoch 000] average test loss: -30897.1551

 62%|██████▏   | 31/50 [22:26<13:45, 43.43s/it]
[epoch 030] average training loss: -2696.5424
[epoch 030] average test loss: -171400.0223
```

Is this a problem if it's generating good reconstructions, and if so, any idea how to fix it?

Cheers 
Rhys

I think what we should expect is for the (evidence) lower bound to increase with subsequent iterations, which is what you are observing. A positive value for the ELBO (i.e. the negative of the reported loss) should not be an issue, as it just indicates that log p(x) is positive, which can be the case for continuous distributions: a probability density can exceed 1, so its log can be positive.
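
To make this concrete with the 0.02 scale from your model (a quick check using torch directly): the peak density of a Normal with a small standard deviation is well above 1, so each well-reconstructed dimension contributes a positive log-density, and these add up across the n_points dimensions.

```
import math
import torch
from torch.distributions import Normal

# the peak density of a Normal is 1 / (scale * sqrt(2 * pi));
# with scale = 0.02 that is ~19.95, so the log-density at the
# mean is positive for every well-reconstructed dimension
print(Normal(0.0, 0.02).log_prob(torch.tensor(0.0)).item())  # ~2.993
print(-math.log(0.02 * math.sqrt(2 * math.pi)))              # same value
```

Summed over hundreds of dimensions, this easily produces ELBO values in the thousands, which is why the reported loss goes far below zero.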

Hi Neeraj
Thanks for your reply!
I got confused by mixing up the ELBO and the loss (forgetting the negative). If I'm correct in saying that a large ELBO is desirable, is there a reason why the ELBO for the unseen test data would be so much larger than for the training data?

> is there a reason why the ELBO for the unseen test data would be so much larger than for the training data?

My guess for the difference is that the train and test batch sizes are different and the reported loss isn't normalized by the number of examples. Could you check whether you observe a similar discrepancy even after adjusting for the batch size?
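
For reference, the tutorial divides the accumulated loss by the size of the dataset, not by the number of batches, so that train and test numbers are comparable. A minimal sketch of that pattern (the function name and loader details are assumptions about your setup):

```
def evaluate(svi, test_loader, use_cuda=False):
    test_loss = 0.0
    # assuming the loader yields plain tensors; unpack labels if present
    for x in test_loader:
        if use_cuda:
            x = x.cuda()
        # svi.evaluate_loss returns the summed loss for this minibatch
        test_loss += svi.evaluate_loss(x)
    # normalize by the number of examples, not the number of batches
    return test_loss / len(test_loader.dataset)
```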

Yep, just checked, and that was the problem: there was a typo in my evaluate function. Thanks for the help!