This problem applies to both the VAE tutorial and to my own model, so I'm filing it under Tutorials.
The tutorial and my model both work okay with the Adam optimizer, but when I switch to SGD they quickly run into NaNs in the log_prob calculation.
With SGD this seems to happen essentially at initialization, which makes me think that something about how it gets started is incompatible with the structure of the VAEs.
I don't understand why SGD would be incompatible with these models, but I haven't dug too deeply into potential differences in how the two optimizers run. When I set
validate_args=True I immediately run into some errors, but that happens with Adam as well, so it doesn't seem like the culprit.
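
For reference, here is a minimal sketch of one difference between the two optimizers that might be relevant. It is not a confirmed diagnosis. It uses plain Python rather than the tutorial's code, and the function names `sgd_step` and `adam_first_step`, the learning rate, and the gradient values are all made up for illustration. The point is that a plain SGD update scales with the gradient, while the first Adam update (with the usual bias correction) has a magnitude of roughly the learning rate no matter how large the gradient is. If the initial gradients are very large, that alone could push the parameters somewhere that makes log_prob overflow.

```python
import math

def sgd_step(grad, lr):
    # Plain SGD (no momentum): the update is proportional to the gradient.
    return -lr * grad

def adam_first_step(grad, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    # First Adam update starting from zero moment estimates,
    # using the standard bias-corrected form.
    m = (1 - beta1) * grad            # first-moment estimate
    v = (1 - beta2) * grad * grad     # second-moment estimate
    m_hat = m / (1 - beta1)           # bias correction at step 1
    v_hat = v / (1 - beta2)
    return -lr * m_hat / (math.sqrt(v_hat) + eps)

lr = 1e-3  # hypothetical learning rate, the same for both optimizers
for grad in (1.0, 1e3, 1e6):  # hypothetical gradient magnitudes
    print(f"grad={grad:g}  sgd={sgd_step(grad, lr):g}  adam={adam_first_step(grad, lr):g}")
```

If this is what is going on, a much smaller SGD learning rate or gradient clipping would be the obvious things to try, but I haven't verified that either helps here.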