Hi everybody,
I’m experimenting with Pyro’s VAE example. Everything is copied and pasted into my notebook unchanged. The main training loop is as follows:
# setup the VAE
vae = VAE(use_cuda=use_cuda)
# setup the optimizer
adam_args = {"lr": learning_rate}
optimizer = Adam(adam_args)
# setup the inference algorithm
svi = SVI(vae.model, vae.guide, optimizer, loss=Trace_ELBO())
train_elbo = []
# training loop
for epoch in range(num_epochs):
    # initialize loss accumulator
    epoch_loss = 0.
    # do a training epoch over each mini-batch x returned
    # by the data loader
    for x, _ in train_loader:
        # if on GPU put mini-batch into CUDA memory
        if use_cuda:
            x = x.cuda()
        # do ELBO gradient and accumulate loss
        batch_loss = svi.step(x)
        epoch_loss += batch_loss
        # print('batch loss', batch_loss)
    # report training diagnostics
    normalizer_train = len(train_loader.dataset)
    total_epoch_loss_train = epoch_loss / normalizer_train
    train_elbo.append(total_epoch_loss_train)
    print("[epoch %03d] average training loss: %.4f" % (epoch, total_epoch_loss_train))
Everything works fine the first time I run this part of the code. The loss log looks like this:
[epoch 000] average training loss: 190.9459
[epoch 001] average training loss: 146.3057
[epoch 002] average training loss: 132.5690
[epoch 003] average training loss: 124.1392
[epoch 004] average training loss: 119.2743
[epoch 005] average training loss: 116.0978
[epoch 006] average training loss: 113.8199
[epoch 007] average training loss: 112.2251
[epoch 008] average training loss: 110.9388
[epoch 009] average training loss: 109.9328
[epoch 010] average training loss: 109.1249
But if I run the same code a second time, the loss explodes: it starts at about 585 and never decreases!
[epoch 000] average training loss: 585.9169
[epoch 001] average training loss: 585.8450
[epoch 002] average training loss: 585.9478
[epoch 003] average training loss: 585.8123
[epoch 004] average training loss: 585.9065
[epoch 005] average training loss: 585.8792
[epoch 006] average training loss: 585.8365
[epoch 007] average training loss: 585.9029
[epoch 008] average training loss: 585.8711
[epoch 009] average training loss: 585.7444
[epoch 010] average training loss: 585.9552
Something must have been saved internally in Pyro that changes the result on the second run. Is anybody else having the same problem?
Thank you in advance!