Moving from MCMC to SVI

Hi all,

I’m looking to try out SVI on my model. I come from the land of MCMC where we abide by “just use NUTS”, but there seem to be so many choices for SVI. I’ve looked around the forum for advice, but I still have some simple questions:

  • I’d like to use an AutoGuide (probably MVN or a normalizing flow). I’ve written custom guides in the past and I had to add my own constraints e.g. constraints.positive for sd parameters. Do AutoGuides automatically add the constraints by looking at the model priors or does the user have to specify these somehow?
  • For the NUTS version of my model, I used LocScaleReparam to use a non-centred parametrisation for random effects. Is this necessary/beneficial for SVI?
  • Is there a go-to (like “just use NUTS”) SVI strategy (everybody’s favourite ELBO + everybody’s favourite autoguide + everybody’s favourite learning rate) that just works or is it a case of trying many things out for a particular model?
  • I have access to 2 GPUs. I assume it’s almost always beneficial to run SVI on GPU. Do I just use `numpyro.set_platform("gpu")`? Will this efficiently make use of both GPUs?

Cheers,

Theo

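constraints: these are automatic. autoguides read each latent site’s support from the model and handle the transform to unconstrained space for you, so you don’t need to specify constraints yourself.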

reparameterisation: non-centred parametrisations can be beneficial for SVI too, yes. in pyro there is AutoReparam, although i don’t think this exists in numpyro

go-to strategy: not really, there’s no single recipe that just works. adam with an initial learning rate of 0.001 + vanilla ELBO + AutoNormal is usually a good place to start. please also refer to our tips and tricks for additional advice.

GPUs: yes, `numpyro.set_platform("gpu")` is what you use. no, it won’t automatically make use of both GPUs.

btw i expect it will be difficult to get good results with SVI because your random walk components are presumably highly correlated in the posterior and highly correlated posteriors can be difficult in the SVI context, especially in the high-dimensional regime

Is there a way of using both GPUs or is it best (and easier) just to use one?

Is there a good inference method (VI or otherwise) for correlated problems in numpyro? I know it’s a tricky problem for MCMC, but spatial/temporal problems are unavoidably correlated and very common

using 2 GPUs is probably possible but i’m not exactly sure how to do it and at most it’ll give you a factor of 2 speed-up so is it really worth the trouble?

it’s hard to say. you might try AutoDAIS


@theo, a couple of things. First, it’s not true that the “default” routine for MCMC is (or should be) NUTS. NUTS is a great all-round routine when the joint distribution contains only continuous rvs and is differentiable, but it isn’t applicable to other problems. Pyro has very clever built-in exact discrete inference machinery that can make NUTS work with a restricted class of discrete latents, but even that doesn’t cover some common cases of unbounded discrete latent rvs. For those use cases, which are very common in some fields of study, one must be more creative with one’s MCMC algorithm (e.g., use some form of reversible jump MCMC) or use a lazy inference method (e.g., lazy factored inference). I am writing this because I think it’s really important information that could be useful to you in the future.

Second, and more to your applied point, in Pyro there are ways to reduce the deleterious effects of correlated (spatially or temporally) rvs in your model. You should check out the DCT and Haar reparameterizers (`DiscreteCosineReparam` and `HaarReparam`, both described on the Reparameterizers page of the Pyro documentation). However, these exist only in Pyro (not numpyro), so you may want to explore the viability of converting your model to Pyro.

Of course. NUTS is a fair choice for this model as all of its latent variables are continuous, although HMCGibbs sampling is a good idea too, as there is some conjugacy to exploit with the nested Gaussians.

Thanks for the heads up on those reparameterisers. I’ll see how the LocScale reparam gets on and then check those out. I can probably use Pyro, but if not, I can rewrite it for numpyro.