Large negative training loss in VAE

Hi,
My VAE is working, in the sense that the generated data matches the original very well, but I'm getting a negative ELBO. I mostly follow the VAE tutorial; however, in my loss function I'm using a Normal distribution, and I played around with the standard deviation until 0.01 seemed to work, although I'm not quite sure why that is.
The VAE is here:

```
import torch
import torch.nn as nn

import pyro
import pyro.distributions as dist


class VAE(nn.Module):
    # z_dim is the dimension of the latent space
    # and we use 400 hidden units
    def __init__(self, z_dim=4, hidden_dim=400, use_cuda=False):
        super(VAE, self).__init__()
        # create the encoder and decoder networks
        self.encoder = Encoder(z_dim, hidden_dim)
        self.decoder = Decoder(z_dim, hidden_dim)
        if use_cuda:
            # calling cuda() here will put all the parameters of
            # the encoder and decoder networks into gpu memory
            self.cuda()
        self.use_cuda = use_cuda
        self.z_dim = z_dim

    # define the model p(x|z)p(z)
    def model(self, x):
        # register PyTorch module `decoder` with Pyro
        pyro.module("decoder", self.decoder)
        with pyro.iarange("data", x.size(0)):
            # setup hyperparameters for prior p(z)
            z_loc = x.new_zeros(torch.Size((x.size(0), self.z_dim)))
            z_scale = x.new_ones(torch.Size((x.size(0), self.z_dim)))
            # sample from prior (value will be sampled by guide when computing the ELBO)
            z = pyro.sample("latent", dist.Normal(z_loc, z_scale).independent(1))
            # decode the latent code z
            loc_img = self.decoder.forward(z)
            # score against the actual data (n_points is the number of
            # features per example, defined elsewhere)
            pyro.sample("obs", dist.Normal(loc_img, 0.02).independent(1),
                        obs=x.reshape(-1, n_points))

    # define the guide (i.e. variational distribution) q(z|x)
    def guide(self, x):
        # register PyTorch module `encoder` with Pyro
        pyro.module("encoder", self.encoder)
        with pyro.iarange("data", x.size(0)):
            # use the encoder to get the parameters used to define q(z|x)
            z_loc, z_scale = self.encoder.forward(x)
            # sample the latent code z
            pyro.sample("latent", dist.Normal(z_loc, z_scale).independent(1))
```
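
The Encoder and Decoder aren't shown above; they're essentially the tutorial's MLPs adapted to my data. Roughly something like this (a sketch, exact layer details may differ; it uses the torch/nn imports and the global n_points from above):

```
class Encoder(nn.Module):
    def __init__(self, z_dim, hidden_dim):
        super(Encoder, self).__init__()
        self.fc1 = nn.Linear(n_points, hidden_dim)
        self.fc21 = nn.Linear(hidden_dim, z_dim)
        self.fc22 = nn.Linear(hidden_dim, z_dim)
        self.softplus = nn.Softplus()

    def forward(self, x):
        x = x.reshape(-1, n_points)
        hidden = self.softplus(self.fc1(x))
        z_loc = self.fc21(hidden)
        # exponentiate so the scale of q(z|x) is positive
        z_scale = torch.exp(self.fc22(hidden))
        return z_loc, z_scale


class Decoder(nn.Module):
    def __init__(self, z_dim, hidden_dim):
        super(Decoder, self).__init__()
        self.fc1 = nn.Linear(z_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, n_points)
        self.softplus = nn.Softplus()

    def forward(self, z):
        hidden = self.softplus(self.fc1(z))
        # no output activation: this is the loc of the Normal likelihood
        loc_img = self.fc2(hidden)
        return loc_img
```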


The output is then:

```
  2%|▏         | 1/50 [00:39<32:28, 39.77s/it]
[epoch 000] average training loss: 1249.2473
[epoch 000] average test loss: -30897.1551

 62%|██████▏   | 31/50 [22:26<13:45, 43.43s/it]
[epoch 030] average training loss: -2696.5424
[epoch 030] average test loss: -171400.0223
```

Is this a problem if it's generating good reconstructions, and if so, any idea how to fix it?

Cheers 
Rhys

I think what we should expect is for the (evidence) lower bound to increase with subsequent iterations, which is what you are observing. A positive value for the ELBO (i.e. the negative of the reported loss) should not be an issue, as it just indicates that log p(x) is positive, which can be the case for continuous distributions: a probability density can exceed 1, so its log can be positive.
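
To make this concrete with the 0.02 scale from your model (a quick check using torch directly): the peak density of a Normal with a small standard deviation is well above 1, so each well-reconstructed dimension contributes a positive log-density, and these add up across the n_points dimensions.

```
import math
import torch
from torch.distributions import Normal

# the peak density of a Normal is 1 / (scale * sqrt(2 * pi));
# with scale = 0.02 that is ~19.95, so the log-density at the
# mean is positive for every well-reconstructed dimension
print(Normal(0.0, 0.02).log_prob(torch.tensor(0.0)).item())  # ~2.993
print(-math.log(0.02 * math.sqrt(2 * math.pi)))              # same value
```

Summed over hundreds of dimensions, this easily produces ELBO values in the thousands, which is why the reported loss goes far below zero.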

Hi Neeraj
Thanks for your reply!
I got confused by mixing up the ELBO and the loss (forgetting the negative). If I'm correct in saying that a large ELBO is desirable, is there a reason why the ELBO for the unseen test data would be so much larger than for the training data?

> is there a reason why the ELBO for the unseen test data would be so much larger than for the training data?

My guess for the difference is that the train and test batch sizes are different and the reported loss isn't normalized by the number of examples. Could you check whether you observe a similar discrepancy even after adjusting for the batch size?
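
For reference, the tutorial divides the accumulated loss by the size of the dataset, not by the number of batches, so that train and test numbers are comparable. A minimal sketch of that pattern (the function name and loader details are assumptions about your setup):

```
def evaluate(svi, test_loader, use_cuda=False):
    test_loss = 0.0
    # assuming the loader yields plain tensors; unpack labels if present
    for x in test_loader:
        if use_cuda:
            x = x.cuda()
        # svi.evaluate_loss returns the summed loss for this minibatch
        test_loss += svi.evaluate_loss(x)
    # normalize by the number of examples, not the number of batches
    return test_loss / len(test_loader.dataset)
```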

Yep, just checked, and that was the problem: there was a typo in my evaluate function. Thanks for the help!