I’m using NumPyro’s SVI to fit a multinomial logit model, with the AutoNormal() autoguide utility generating the guide.
When evaluating the fitted model, I found that the SVI result overfits: with a holdout split of 80% of the data for training and 20% for validation, the training hit rate (i.e., accuracy) is 92.57% while the validation hit rate is only 42.00%.
I also have an R counterpart with exactly the same setup, built on bayesm and using an MCMC chain with random-walk Metropolis-Hastings sampling. It gets a 72.05% training hit rate and a 59.51% validation hit rate, so it generalizes much better.
I’m using SVI mainly for speed: training the full model with R or with NumPyro’s MCMC samplers takes a long time. But I also want robust performance, and the overfitting I see with SVI is not acceptable.
Are there any suggestions for dealing with this kind of overfitting in SVI? Or is there another way in NumPyro to get robust results efficiently?
Many thanks for any feedback!
I can add more information about my use case if needed.