I’m doing some experiments with different step sizes for different parameters in SVI, à la Example: Mixing Optimizers.
I’m characterizing three latents (three pyro.sample latents in my guide): two have Normal distributions in both the model and guide, and the third has a ProjectedNormal in the model and a mixture of ProjectedNormals in the guide.
I have an intuition that the log_probs from the ProjectedNormal-sampled variable are on a different order of magnitude, or have higher variance, than those from the Normals, and that adjusting the step sizes could somehow compensate for that.
Has anything been written about this in general (blog posts, textbook chapters)? I’m not sure exactly how to think about the issue, even what key phrases to google, or what to check in my code and results to confirm healthy training.
I do have some ideas:
- check that the ELBO drops by roughly the same relative amount after taking a step in each parameter
- check whether the gradients for each parameter are on the same order of magnitude
I think what I want is for no latent to change much faster than the others, get locked into a local minimum, and be unable to jump out. I sort of need each latent to gently shift toward the right region, without forcing the others to compensate for its inaccuracies.