Hi, thanks to the Pyro team. It is awesome. This might be a research question, but I don’t have a clear idea, so I would like to ask for your opinions.
Q. What is a practical approach to model selection in variational inference?
Let’s say there is a problem of data={train, test} and model={m1,m2}.
For simplicity, m1 is a neural network with a hidden layer of 10 units, and m2 is exactly the same as m1 except that its hidden layer has 20 units. All parameters get the same priors. The network structure and activations are otherwise identical.
Here are my approaches. They are not 100% theoretically correct, but I think these steps can be used in some practical engineering applications (please correct me if I am wrong).
[A] In a frequentist way,
- Train `m1` with `train` and calculate the loss `loss` (e.g., l2 norm) on `test`. Do the same for `m2`.
- Compare `loss_m1(test)` and `loss_m2(test)`. Choose whichever of `m1` or `m2` shows the lower `loss`.
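
To make [A] concrete, here is a minimal sketch of the hold-out comparison I have in mind. The names `mean_squared_error` and `select_by_test_loss` are hypothetical helpers I made up for illustration, and each model is assumed to be just a callable that maps an input to a prediction.

```python
from typing import Callable, Dict, Sequence, Tuple


def mean_squared_error(
    predict: Callable[[float], float],
    xs: Sequence[float],
    ys: Sequence[float],
) -> float:
    """Average squared error of `predict` over paired inputs and targets."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def select_by_test_loss(
    models: Dict[str, Callable[[float], float]],
    xs_test: Sequence[float],
    ys_test: Sequence[float],
) -> Tuple[str, Dict[str, float]]:
    """Return the name of the model with the lowest test loss, plus all losses."""
    losses = {
        name: mean_squared_error(predict, xs_test, ys_test)
        for name, predict in models.items()
    }
    best = min(losses, key=losses.get)
    return best, losses
```

Here `models` would be something like `{"m1": trained_m1, "m2": trained_m2}`, where both entries are the already-trained predictors.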
[B] In a Full Bayesian way (assuming same prior distributions),
- MCMC for `m1` and `m2` with `train`.
- Posterior predictive check with `m1` and `m2` on `test` or `train` data.
- If both are acceptable, calculate some metrics such as `loo`, `WAIC`, or `Bayes factor (evidence)` on `test` or `train` data (via bridge sampling).
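
For reference, these are the standard definitions of the evidence and the Bayes factor that I mean in [B]. The symbols are my own notation: $D$ is whichever dataset the evidence is computed on, and $\theta_k$ are the parameters of model $m_k$.

```latex
p(D \mid m_k) = \int p(D \mid \theta_k, m_k)\, p(\theta_k \mid m_k)\, d\theta_k,
\qquad
\mathrm{BF}_{12} = \frac{p(D \mid m_1)}{p(D \mid m_2)}
```

A value $\mathrm{BF}_{12} > 1$ favors `m1`, and $\mathrm{BF}_{12} < 1$ favors `m2`.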
[C] In Variational inference (with Pyro)
- Train `m1` and `m2` with `train` via SVI.
- Calculate the ELBO via `SVI.evaluate_loss()` on `test` data for `m1` and `m2`. (But it gives a stochastic ELBO estimate, so we may repeat it several times or use a large value for the `num_particles` argument.)
- Choose the one that shows the lower loss (because the loss is the negative ELBO, and the ELBO is a lower bound on, and so an approximation of, the log evidence).
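
The "repeat it several times" step in [C] could look roughly like the sketch below. `averaged_loss` is a hypothetical helper, not part of Pyro. It takes any zero-argument callable that returns one stochastic loss value, so it does not depend on Pyro itself. The names in the commented usage (`model_m1`, `guide_m1`, `optimizer`, `x_test`, `y_test`) are placeholders I am assuming for illustration.

```python
import statistics
from typing import Callable, Tuple


def averaged_loss(
    evaluate: Callable[[], float],
    n_repeats: int = 20,
) -> Tuple[float, float]:
    """Call a stochastic loss `n_repeats` times; return (mean, standard error)."""
    values = [float(evaluate()) for _ in range(n_repeats)]
    mean = statistics.fmean(values)
    # The standard error shows how noisy the averaged estimate still is.
    sem = statistics.stdev(values) / n_repeats ** 0.5 if n_repeats > 1 else 0.0
    return mean, sem


# Intended usage with Pyro (assumed setup, not run here):
#   svi_m1 = pyro.infer.SVI(model_m1, guide_m1, optimizer,
#                           loss=pyro.infer.Trace_ELBO(num_particles=10))
#   ... train svi_m1 on the train data ...
#   loss_m1, sem_m1 = averaged_loss(lambda: svi_m1.evaluate_loss(x_test, y_test))
# and the same for m2, then compare loss_m1 with loss_m2.
```

If the difference between the two averaged losses is small relative to their standard errors, I would not trust the ranking without more repeats or more particles.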
I am particularly interested in [C]. I think it is viable both theoretically and practically, at least for practical engineering applications, because approximating the evidence is exactly what the ELBO is for.
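
The identity I am relying on is the usual ELBO decomposition, written here in my own notation, with $q(\theta)$ the variational distribution (the guide) for a model $m$ and $D$ the dataset being evaluated:

```latex
\log p(D \mid m)
  = \underbrace{\mathbb{E}_{q(\theta)}\!\left[\log p(D, \theta \mid m) - \log q(\theta)\right]}_{\mathrm{ELBO}(q)}
  + \mathrm{KL}\!\left(q(\theta)\,\|\,p(\theta \mid D, m)\right)
  \;\ge\; \mathrm{ELBO}(q)
```

So comparing ELBOs compares lower bounds on the log evidence, not the log evidence itself. The gap is the KL term, which may differ between `m1` and `m2`, and this is the part of [C] I am least sure about.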
I am aware of alternatives such as Bayesian dropout in VI or spike-and-slab priors in the fully Bayesian setting, but I would like to know whether [C] is acceptable in a simple situation such as a two-model comparison.