Hi, I have a model that is an expanded version of the SS-VAE example. The main change I’ve made is to replace the categorical latent “y” with a grid of Bernoulli latents.
Now that I’ve increased the size of the input images, I’ve noticed that GPU memory usage grows from batch to batch. It starts small and increases steadily until I run out of GPU memory and the run crashes.
If I make all batches “supervised”, where the latent y is always observed, the problem doesn’t occur. It only occurs when executing the “unsupervised” loss function.
Does anyone have any idea what is happening?
Reading up on similar PyTorch problems, it sounds like the cause could be a reference to the loss being held for too long, so that memory can’t be released at the end of a batch? (https://discuss.pytorch.org/t/cuda-memory-continuously-increases-when-net-images-called-in-every-iteration/501/5)
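
To make clear what I mean, here is a minimal sketch of the pattern I suspect. It is plain PyTorch on a toy model, not my actual Pyro code; `run_epoch`, the `keep_graph` flag and the linear model are hypothetical names made up for illustration. My understanding is that accumulating the loss tensor itself keeps autograd history alive across batches, while accumulating a plain Python float does not.

```python
import torch

def run_epoch(model, batches, optimizer, keep_graph):
    """Toy training loop; `keep_graph` switches between the two accumulation styles."""
    running = 0.0
    for x in batches:
        optimizer.zero_grad()
        loss = model(x).pow(2).mean()
        loss.backward()
        optimizer.step()
        if keep_graph:
            # Suspected problem: `running` becomes a tensor with a grad_fn that
            # chains back through every batch's loss, so history keeps growing.
            running = running + loss
        else:
            # .item() copies the value into a Python float and drops the reference.
            running = running + loss.item()
    return running

torch.manual_seed(0)
model = torch.nn.Linear(4, 1)              # stand-in for the real network
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
batches = [torch.randn(8, 4) for _ in range(3)]

leaky = run_epoch(model, batches, optimizer, keep_graph=True)
safe = run_epoch(model, batches, optimizer, keep_graph=False)
print(type(leaky).__name__, type(safe).__name__)
```

As far as I can tell I am not doing the `keep_graph=True` style explicitly in my own loop, so if this is the cause, the reference would have to be held somewhere inside the unsupervised loss path.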