How long should you run SVI for?

Hi everyone,

I’ve built a multi-layered hierarchical Bayesian model in NumPyro to estimate ~1M parameters from ~10M observations. I fit the parameters with SVI.

The key hierarchical parameter (A) is built up from a global → big category → smaller category → unit (i) level, and there are ~200k units.
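For concreteness, the skeleton looks roughly like this (a toy sketch: the names, plate sizes, and parent-index arrays below are made up, and the likelihood over the ~10M observations is omitted):

```python
import numpyro
import numpyro.distributions as dist

def model(cat_of_subcat, subcat_of_unit):
    # global -> big category -> smaller category -> unit hierarchy;
    # cat_of_subcat and subcat_of_unit are integer parent-index arrays.
    g = numpyro.sample("global", dist.Normal(0.0, 1.0))
    with numpyro.plate("big_cats", 10):
        c = numpyro.sample("big_cat", dist.Normal(g, 1.0))
    with numpyro.plate("small_cats", 100):
        s = numpyro.sample("small_cat", dist.Normal(c[cat_of_subcat], 1.0))
    with numpyro.plate("units", subcat_of_unit.shape[0]):
        A = numpyro.sample("A", dist.Normal(s[subcat_of_unit], 1.0))
    # ... likelihood on the observations would go here ...
```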

After a certain number of iterations the loss stabilizes around a value (call it X); however, I can see that each unit-level parameter is still changing. For example, at 100k iterations A_i = -1, at 200k iterations A_i = -1.5, and at 500k iterations A_i = -2.5, even though the loss stays around X.
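This is roughly how I check the drift: run SVI in chunks, snapshot the unit-level variational locations after each chunk, and compare. (A toy model and AutoNormal guide stand in for my real setup here; "A_auto_loc" is just AutoNormal's naming convention for the loc parameter of site "A".)

```python
import jax.numpy as jnp
from jax import random
import numpyro
import numpyro.distributions as dist
from numpyro.infer import SVI, Trace_ELBO
from numpyro.infer.autoguide import AutoNormal
import optax

# Toy stand-in: "A" plays the role of the unit-level parameters.
def model(y):
    with numpyro.plate("units", y.shape[0]):
        A = numpyro.sample("A", dist.Normal(0.0, 1.0))
        numpyro.sample("obs", dist.Normal(A, 1.0), obs=y)

y = random.normal(random.PRNGKey(1), (1_000,))
guide = AutoNormal(model)
svi = SVI(model, guide, optax.adam(1e-2), loss=Trace_ELBO())
state = svi.init(random.PRNGKey(0), y)

snapshots = []
for chunk in range(5):                        # e.g. 5 chunks of 2k steps
    for _ in range(2_000):
        state, loss = svi.update(state, y)
    params = svi.get_params(state)
    snapshots.append(params["A_auto_loc"])    # unit-level variational means
    print(f"chunk {chunk}: loss={float(loss):.1f}")

# Max change in the unit-level estimates between consecutive snapshots:
drift = [float(jnp.max(jnp.abs(b - a))) for a, b in zip(snapshots, snapshots[1:])]
print(drift)
```

In my real run the loss is flat across chunks while the drift stays large.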

I know I could evaluate the model on a held-out test set; however, prediction isn’t my goal. I’m trying to get a causal estimate of A_i for each unit, so I’m not sure the ‘best prediction’ parameters are the right ones to recover.

Are there any criteria that you guys use to determine the stopping point?

Thanks!

are you progressively lowering the learning rate? that’s generally important.
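e.g. via an optax schedule, which numpyro’s SVI accepts directly. a minimal sketch (the decay numbers are illustrative, and the one-line model/guide is a placeholder for yours):

```python
import numpyro
import numpyro.distributions as dist
from numpyro.infer import SVI, Trace_ELBO
from numpyro.infer.autoguide import AutoNormal
import optax

def model():
    numpyro.sample("a", dist.Normal(0.0, 1.0))  # placeholder model

guide = AutoNormal(model)

# lr = 1e-2 * 0.5 ** (step / 50_000): smooth geometric decay
schedule = optax.exponential_decay(
    init_value=1e-2,
    transition_steps=50_000,
    decay_rate=0.5,
)
svi = SVI(model, guide, optax.adam(learning_rate=schedule), loss=Trace_ELBO())
```

with a fixed learning rate, the iterates keep bouncing around the optimum at a scale set by the step size, which looks exactly like a flat loss with wandering parameters.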

monitoring model/variational parameter changes makes sense.
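one concrete version (a sketch, not a built-in numpyro criterion): checkpoint the variational params every k steps and stop once the largest relative change falls below a tolerance.

```python
import jax
import jax.numpy as jnp

def max_rel_change(new_params, old_params, eps=1e-8):
    """largest relative change across the variational-parameter pytree."""
    per_leaf = jax.tree_util.tree_map(
        lambda n, o: jnp.max(jnp.abs(n - o) / (jnp.abs(o) + eps)),
        new_params, old_params,
    )
    return max(float(x) for x in jax.tree_util.tree_leaves(per_leaf))

# e.g. every 10k steps:
#   params = svi.get_params(state)
#   if max_rel_change(params, prev_params) < 1e-3: stop
#   prev_params = params
```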

are you doing mini-batch training? you may want to increase the batch size as you get closer to convergence; smaller batches mean noisier gradients, which can keep the parameters wandering even when the average loss looks flat.
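a rough sketch of what that can look like with numpyro’s subsampling plate (toy model; the sizes and the two-phase batch schedule are made up):

```python
from jax import random
import numpyro
import numpyro.distributions as dist
from numpyro.infer import SVI, Trace_ELBO
from numpyro.infer.autoguide import AutoNormal
import optax

# batch_size is passed per step, so it can grow over the course of training.
def model(y, batch_size):
    mu = numpyro.sample("mu", dist.Normal(0.0, 10.0))
    with numpyro.plate("data", y.shape[0], subsample_size=batch_size):
        y_batch = numpyro.subsample(y, event_dim=0)
        numpyro.sample("obs", dist.Normal(mu, 1.0), obs=y_batch)

y = 3.0 + random.normal(random.PRNGKey(1), (100_000,))
guide = AutoNormal(model)
svi = SVI(model, guide, optax.adam(1e-2), loss=Trace_ELBO())
state = svi.init(random.PRNGKey(0), y, batch_size=500)

for _ in range(2_000):    # cheap, noisy early phase
    state, _ = svi.update(state, y, batch_size=500)
for _ in range(2_000):    # lower-variance phase near convergence
    state, loss = svi.update(state, y, batch_size=5_000)
```

larger batches shrink the gradient noise, so together with a decaying learning rate the parameter trajectories settle down instead of diffusing.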