Overfitting issue when using SVI for inference

SkyWind · October 16, 2022, 6:17am

Hi there,

I’m using NumPyro’s SVI to fit a multinomial logit model, with the autoguide AutoNormal() generating the guide.

When evaluating the fitted model, I found that the SVI result overfits: with a holdout split of 80% of the data for training and 20% for validation, the training hit rate (i.e., accuracy) is 92.57% while the validation hit rate is only 42.00%.

For comparison, I have an R counterpart with exactly the same setup, based on bayesm, which runs an MCMC chain with random-walk Metropolis-Hastings sampling. It gets a 72.05% training hit rate and a 59.51% validation hit rate, so it generalizes much better.

Currently I’m using SVI mainly for speed: training the full model with R or NumPyro’s MCMC samplers takes a long time. But I also want robust performance, and the overfitting I observe with SVI is undesirable.

Are there any suggestions for dealing with this kind of overfitting in SVI? Or is there another way in NumPyro to get robust results efficiently?
Many thanks for any feedback!

I can add more information about my use case if needed.

martinjankowiak · October 16, 2022, 6:07pm

Hard to make any concrete suggestions without more information about your particular use case (number of data points, etc.).

SkyWind · October 17, 2022, 8:13am

Hi Martin, thank you for your response.

I will investigate further first and then post more information here.