Large variance in SVI losses to be expected?

I’m learning how to use the NumPyro SVI API. I’ve followed the Pyro tutorial (I couldn’t find a NumPyro one) and have been trying to fit a simple linear regression example using SVI. I seem to get a good parameter fit, but I was a bit surprised by the variance of the losses:

I’m fairly new to using SVI. I expected some variance in the losses, but I was surprised by how large it was. Is this huge variance to be expected, or am I doing something wrong?

My full notebook with outputs can be found here

yes the variance can be quite large if you’re using a single monte carlo sample.

see here for tips and tricks for doing SVI successfully

@martinjankowiak I’m assuming I need to be using the num_particles parameter of the ELBO objective, as is documented in “8. Explore trade-offs controlled by num_particles , mini-batch size, etc.”?

I tried this via TraceMeanField_ELBO(num_particles=10) and did not see any reduction in variance:

What’s more, the variance even seems to go up if I increase the number of samples.

I tried most other tricks from “SVI Part IV: Tips and Tricks”, seemingly to no avail.

Apparently it’s a bad idea to use dist.Exponential in the guide. I switched to dist.LogNormal as the guide distribution for my noise parameter (the noise prior in the model is still an Exponential), which gives a much lower loss variance and a better fit:

I was wondering: is this because the Exponential distribution is non-reparameterizable, or is there something else going on?
Is there anywhere to check which NumPyro distributions are non-reparameterizable?

See the full updated notebook

just because you don’t see a visual reduction in variance doesn’t mean there wasn’t a reduction in gradient variance (and therefore probably also loss variance)

most continuous distributions, including the exponential, are reparameterizable.

i’m guessing you need to do this

I tried to initialize the guide distributions to have low variance (Normal std=0.01 and lower, Exponential rate=100 and higher) and this does not seem to alleviate the huge loss variance.

However, using LogNormal as a guide does seem to solve the issue (as I mentioned in my previous comment)

the exponential distribution is a terrible choice because it links the mean to the variance: it’s simply too inflexible
