Edit: A positive log density should not be a problem per se, since you have switched to a continuous distribution. Also see: Large negative training loss in VAE - #5 by rgreen1995

I see the issue - the value reported by svi.step() is the negative of the ELBO and should be positive (the plotted graph in the tutorial is the actual ELBO, and hence negative). You are right that in general the ELBO should be taking on less negative values. In this case, while the ELBO is increasing, it is becoming more and more positive. That is problematic because KL(q(z|x) || p(z|x)) is non-negative and the ELBO needs to be negative so that the sum ELBO + KL(q(z|x) || p(z|x)) equals the model log evidence log(p(x)), which cannot be positive when p(x) is a probability rather than a density. I also noticed that using a higher value, say 0.5, for the observation's scale makes the model well-behaved.
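For concreteness, here is a minimal sketch of the sign convention and the observation scale. The Gaussian model, guide, and toy data below are made-up placeholders, not your actual setup:

```python
import torch
from torch.distributions import constraints

import pyro
import pyro.distributions as dist
from pyro.infer import SVI, Trace_ELBO
from pyro.optim import Adam

def model(data):
    loc = pyro.sample("loc", dist.Normal(0.0, 1.0))
    with pyro.plate("data", len(data)):
        # A larger fixed observation scale (here 0.5) caps the density
        # p(x|z) and keeps the ELBO from growing ever more positive.
        pyro.sample("obs", dist.Normal(loc, 0.5), obs=data)

def guide(data):
    loc_q = pyro.param("loc_q", torch.tensor(0.0))
    scale_q = pyro.param("scale_q", torch.tensor(1.0),
                         constraint=constraints.positive)
    pyro.sample("loc", dist.Normal(loc_q, scale_q))

pyro.clear_param_store()
data = torch.randn(100) * 0.5 + 2.0  # toy data
svi = SVI(model, guide, Adam({"lr": 0.01}), loss=Trace_ELBO())

for step in range(1000):
    loss = svi.step(data)  # svi.step() returns -ELBO, so this should be positive
    if step % 100 == 0:
        print(f"step {step}: -ELBO = {loss:.2f}, ELBO = {-loss:.2f}")
```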
> My intention was to compare which model was better using the loss. How can this be done?
While the ELBO has been used for model selection in practice (see Beal, 2002), I'm not sure it is theoretically very well grounded.
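That said, if you do want to compare models this way, the usual heuristic is to train each candidate to convergence and compare the (averaged) final ELBO estimates. A sketch under that assumption, reusing the toy guide, data, and imports from above, with two hypothetical observation scales as the candidate models:

```python
def make_model(obs_scale):
    # Candidate models differing only in their fixed observation scale.
    def model(data):
        loc = pyro.sample("loc", dist.Normal(0.0, 1.0))
        with pyro.plate("data", len(data)):
            pyro.sample("obs", dist.Normal(loc, obs_scale), obs=data)
    return model

def estimate_elbo(model, guide, data, num_steps=2000, num_avg=100):
    pyro.clear_param_store()  # fresh parameters for each candidate
    svi = SVI(model, guide, Adam({"lr": 0.01}), loss=Trace_ELBO())
    losses = [svi.step(data) for _ in range(num_steps)]
    # svi.step() returns -ELBO; average the tail to smooth Monte Carlo noise.
    return -sum(losses[-num_avg:]) / num_avg

elbo_a = estimate_elbo(make_model(0.5), guide, data)
elbo_b = estimate_elbo(make_model(1.0), guide, data)
print(f"obs scale 0.5: ELBO = {elbo_a:.2f}")
print(f"obs scale 1.0: ELBO = {elbo_b:.2f}")
# The higher (less negative) ELBO is the usual pick, but remember the ELBO
# is only a lower bound on log(p(x)), and its tightness can differ per model.
```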