DiagonalNormal Guide vs. Delta Guide

I keep getting the impression that the diagonal normal guide would perform better than the delta guide for most datasets. Under what circumstances is the Delta guide preferred over the DiagonalNormal guide?

Thank you,

this is very problem specific. a delta guide doesn’t give you any parameter uncertainty (you just get a point estimate). the AutoDiagonalNormal guide will give you some parameter uncertainty, but it also potentially makes the optimization problem more difficult. so there’s no simple answer here.

Is there any website or other resource that discusses this topic? Thank you,

i don’t know of any good resource for these kinds of things. you could read something like “Pattern Recognition and Machine Learning” by Christopher M. Bishop for an introduction to probabilistic machine learning and try to build intuition that way


Thank you very much for your reply.
I hate to keep bugging you about this, but is there any publication linked to the AutoLaplaceApproximation guide that comes with Pyro?


Thank you,

The “Statistical Rethinking” textbook has a nice introduction to this (a.k.a. quadratic approximation; see e.g. chapter 2) and to other basic methods (with code also available in NumPyro).
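For intuition, the quadratic (Laplace) approximation finds the posterior mode and fits a Gaussian whose variance is the inverse of the negative second derivative of the log posterior at that mode. The sketch below does this by hand for a Binomial likelihood with a flat prior, using only the standard library. The data (6 successes in 9 trials) and the function name are illustrative assumptions, and the plain Newton iteration is just one simple way to find the mode, not how Pyro does it internally.

```python
import math


def quadratic_approximation(successes, trials, iterations=50):
    """Return (mode, std) of a Gaussian fitted at the posterior mode.

    Log posterior up to a constant, with a flat prior on p:
        successes * log(p) + (trials - successes) * log(1 - p)
    """
    failures = trials - successes
    p = 0.5  # starting point for Newton's method
    for _ in range(iterations):
        grad = successes / p - failures / (1.0 - p)
        hess = -successes / p**2 - failures / (1.0 - p) ** 2
        p -= grad / hess
    # Curvature at the final mode estimate; variance = -1 / second derivative.
    hess = -successes / p**2 - failures / (1.0 - p) ** 2
    return p, math.sqrt(-1.0 / hess)


mode, std = quadratic_approximation(successes=6, trials=9)
print(f"mode = {mode:.3f}, std = {std:.3f}")  # mode = 0.667, std = 0.157
```

The mode alone is what a delta guide would report; the curvature at the mode is the extra piece that turns the point estimate into an approximate Gaussian posterior.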

In practice, when I am creating new models, I often implement both Delta and AutoNormal guides. The Delta guides tend to converge more quickly and more robustly. Once I can get a Delta guide to train, I’ll switch to an AutoNormal guide with more training steps and a lower learning rate. After AutoNormal, I’ll often switch again to an AutoLowRankMultivariateNormal guide with an even lower learning rate and more steps. I find Delta is good for fast model iteration and a good sanity check that I can learn a decent point estimate, before I start modeling uncertainty.

When you implement both the AutoNormal and Delta guides, does the model with an AutoNormal guide usually perform considerably better than the model with a Delta guide? This is what is happening to me right now, and I am assuming this is because the Delta guide assigns all probability mass to a single value?

@h56cho it depends what you mean by “better”. Indeed the AutoDelta guide provides simply a single point estimate (corresponding to MAP inference). If you want any sort of uncertainty estimate at all, you’ll need to use something like AutoNormal. If all you want is a point estimate, then AutoDelta can sometimes “perform better” in the sense that it is more robust and converges more quickly.