Separating guide and model parameters in svi.step() to train the inference network more aggressively



To avoid latent variable collapse when using auto-regressive decoders, I found an easy-to-implement idea in the paper. The idea (see Algorithm 1) is to optimize the inference network more aggressively during the first few epochs and then let normal VAE training take over.
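To make the schedule concrete, here is a rough sketch in plain PyTorch (not Pyro's SVI API). Everything in it is an assumption for illustration: the toy `Encoder`/`Decoder`, the hypothetical `train` helper, the fixed `inner_steps` cap on the encoder-only loop, and switching off the aggressive phase after a fixed number of epochs. The paper's actual stopping criteria may differ.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X_DIM, Z_DIM = 8, 2  # toy sizes, chosen only for illustration


class Encoder(nn.Module):
    """Inference network: maps x to the mean and log-variance of q(z|x)."""

    def __init__(self):
        super().__init__()
        self.net = nn.Linear(X_DIM, 2 * Z_DIM)

    def forward(self, x):
        loc, log_var = self.net(x).chunk(2, dim=-1)
        return loc, log_var


class Decoder(nn.Module):
    """Generative network: maps z to the mean of a unit-variance Gaussian p(x|z)."""

    def __init__(self):
        super().__init__()
        self.net = nn.Linear(Z_DIM, X_DIM)

    def forward(self, z):
        return self.net(z)


def negative_elbo(encoder, decoder, x):
    """Single-sample negative ELBO averaged over the batch (up to a constant)."""
    loc, log_var = encoder(x)
    z = loc + torch.randn_like(loc) * torch.exp(0.5 * log_var)  # reparameterization
    recon = 0.5 * ((x - decoder(z)) ** 2).sum(-1)
    kl = 0.5 * (loc ** 2 + log_var.exp() - 1.0 - log_var).sum(-1)
    return (recon + kl).mean()


def train(data, epochs=4, aggressive_epochs=2, inner_steps=5, batch_size=16):
    encoder, decoder = Encoder(), Decoder()
    # Two optimizers, so each network can be stepped independently.
    enc_opt = torch.optim.Adam(encoder.parameters(), lr=1e-2)
    dec_opt = torch.optim.Adam(decoder.parameters(), lr=1e-2)
    for epoch in range(epochs):
        aggressive = epoch < aggressive_epochs  # assumed fixed-length schedule
        for start in range(0, len(data), batch_size):
            x = data[start:start + batch_size]
            if aggressive:
                # Encoder-only updates on fresh random minibatches.
                for _ in range(inner_steps):
                    idx = torch.randint(len(data), (batch_size,))
                    enc_opt.zero_grad()
                    negative_elbo(encoder, decoder, data[idx]).backward()
                    enc_opt.step()
            # Clear all gradients, including those the inner loop left on the decoder.
            enc_opt.zero_grad()
            dec_opt.zero_grad()
            negative_elbo(encoder, decoder, x).backward()
            dec_opt.step()
            if not aggressive:
                enc_opt.step()  # normal VAE training: update both networks
    return encoder, decoder


data = torch.randn(64, X_DIM)  # random stand-in data
encoder, decoder = train(data)
```

In Pyro, `Trace_ELBO().differentiable_loss(model, guide, *args)` returns a loss tensor that can be backpropagated, so a similar loop with two `torch.optim` optimizers should be possible outside of `svi.step()`, though I have not checked how well that fits this use case.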

I believe we can implement this by separating the guide and model parameters in the step function in class at this line.

Any pointers on how I can separate the guide and model parameters in the step function?



I think I can separate the parameters here:

Do you think this is the correct approach?


Sorry, my bad.

Looking at their code:

They are using two different optimizers for encoder and decoder parameters. That would be easy to do in Pyro (