Hi, thanks to the Pyro team; it is awesome. This might be a research question, but I don’t have a clear idea, so I would like to ask for your opinions.
Q. What is a practical approach for model selection in variational inference?
Let’s say there is a problem with `data = {train, test}` and `model = {m1, m2}`.
For simplicity, `m1` is a neural network with a hidden layer of 10 units, and `m2` is exactly the same as `m1` except that its hidden layer has 20 units. All parameters get the same priors, and the network structure and activations are otherwise identical.
Here are my approaches. They are not 100% theoretically correct, but I think these steps can be used in practical engineering applications (please correct me if I am wrong).
[A] In a frequentist way,
- Train `m1` with `train` and calculate the loss `loss` (e.g., L2 norm) on `test`; do the same for `m2`.
- Compare `loss_m1(test)` and `loss_m2(test)`. Choose whichever of `m1` or `m2` shows the lower `loss`.
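To make [A] concrete, here is a minimal sketch of the comparison step in plain Python. The names `l2_loss` and `select_by_test_loss`, and the two lambda "models", are hypothetical stand-ins for the trained networks; only the selection logic is the point.

```python
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical type: a trained model reduced to a function from input to prediction.
Predictor = Callable[[float], float]


def l2_loss(predict: Predictor, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Mean squared error of `predict` on held-out pairs (xs, ys)."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def select_by_test_loss(
    models: Dict[str, Predictor], xs: Sequence[float], ys: Sequence[float]
) -> Tuple[str, Dict[str, float]]:
    """Return the name of the model with the lowest held-out loss, plus all losses."""
    losses = {name: l2_loss(f, xs, ys) for name, f in models.items()}
    best = min(losses, key=losses.get)
    return best, losses


# Toy stand-ins for the trained m1 and m2 (assumption, not real networks).
test_x = [0.0, 1.0, 2.0, 3.0]
test_y = [0.1, 2.1, 3.9, 6.2]
models = {"m1": lambda x: 2.0 * x, "m2": lambda x: 1.5 * x + 1.0}

best, losses = select_by_test_loss(models, test_x, test_y)
print(best, losses)
```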
[B] In a full Bayesian way (assuming the same prior distributions),
- Run MCMC for `m1` and `m2` with `train`.
- Do a posterior predictive check with `m1` and `m2` on `test` or `train` data.
- If both are acceptable, calculate some metrics such as `loo`, `WAIC`, or the Bayes factor (evidence, via bridge sampling) on `test` or `train` data.
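As an illustration of the metric step in [B], here is a rough sketch of the WAIC estimate of expected log pointwise predictive density, computed from a matrix of pointwise log-likelihoods over posterior draws. The function names and the tiny input matrices are my own assumptions for illustration; in practice the matrix would come from the MCMC samples, and a library implementation is probably the safer choice.

```python
import math
from statistics import variance
from typing import List, Sequence


def log_mean_exp(values: Sequence[float]) -> float:
    """Numerically stable log of the mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def waic_elpd(log_lik: List[List[float]]) -> float:
    """elpd_waic = sum_i [lppd_i - p_waic_i].

    log_lik[s][i] is log p(y_i | theta_s) for posterior draw s (needs >= 2 draws).
    Higher is better; WAIC on the deviance scale is -2 times this value.
    """
    n_points = len(log_lik[0])
    elpd = 0.0
    for i in range(n_points):
        column = [draw[i] for draw in log_lik]
        lppd_i = log_mean_exp(column)   # log pointwise predictive density
        p_waic_i = variance(column)     # sample variance over draws (complexity penalty)
        elpd += lppd_i - p_waic_i
    return elpd


# Made-up log-likelihood matrices (2 draws x 2 data points), for illustration only.
log_lik_m1 = [[-1.0, -2.0], [-1.0, -2.0]]
log_lik_m2 = [[-1.0, -2.0], [-3.0, -2.0]]
print(waic_elpd(log_lik_m1), waic_elpd(log_lik_m2))
```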
[C] In variational inference (with Pyro),
- Train `m1` and `m2` with `train` via SVI.
- Calculate the ELBO via `SVI.evaluate_loss()` on `test` data for `m1` and `m2`. (But it gives a stochastic ELBO estimate, so we may repeat it several times or use a large number for the `num_particles` argument.)
- Choose the one that shows the lower loss (because the loss is the negative ELBO, and the ELBO is a lower bound on the log evidence).
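For the "repeat it several times" part of [C], this is roughly what I have in mind: average repeated stochastic loss evaluations and report a standard error, so the two models are compared on means rather than on single noisy draws. The helper `mean_loss` and the fake evaluators are hypothetical; with Pyro I would pass something like `lambda: svi.evaluate_loss(test_x, test_y)`, assuming `svi` is a trained `SVI` object and the arguments match the model's signature.

```python
import random
import statistics
from typing import Callable, Tuple


def mean_loss(evaluate_loss: Callable[[], float], n_repeats: int = 50) -> Tuple[float, float]:
    """Average a stochastic loss estimate; return (mean, standard error of the mean)."""
    samples = [evaluate_loss() for _ in range(n_repeats)]
    mean = statistics.mean(samples)
    sem = statistics.stdev(samples) / n_repeats ** 0.5
    return mean, sem


# Fake noisy evaluators standing in for svi_m1.evaluate_loss(...) and
# svi_m2.evaluate_loss(...); the numbers are made up for illustration.
rng = random.Random(0)
fake_eval_m1 = lambda: 120.0 + rng.gauss(0.0, 5.0)
fake_eval_m2 = lambda: 100.0 + rng.gauss(0.0, 5.0)

mean_m1, sem_m1 = mean_loss(fake_eval_m1)
mean_m2, sem_m2 = mean_loss(fake_eval_m2)

# Prefer the lower mean loss, but only trust the ranking if the gap is
# large compared with the standard errors.
best = "m1" if mean_m1 < mean_m2 else "m2"
print(best, (mean_m1, sem_m1), (mean_m2, sem_m2))
```

If the difference in means is within a couple of standard errors, I would treat the two models as indistinguishable by this criterion.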
I am particularly interested in [C]. I think it is viable both theoretically and practically, because that is what the ELBO means, at least for some practical engineering applications.
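For reference, this is the standard identity I am relying on, written for a generic dataset $\mathcal{D}$, model $m$, and variational distribution $q$ (the notation is mine). If I read it correctly, comparing ELBOs matches comparing log evidences only to the extent that the KL gaps of the two models are similar.

```latex
\log p(\mathcal{D} \mid m)
  = \underbrace{\mathbb{E}_{q(\theta)}\big[\log p(\mathcal{D}, \theta \mid m) - \log q(\theta)\big]}_{\mathrm{ELBO}_m(q)}
  + \mathrm{KL}\big(q(\theta) \,\|\, p(\theta \mid \mathcal{D}, m)\big)

\log p(\mathcal{D} \mid m_1) - \log p(\mathcal{D} \mid m_2)
  = \big(\mathrm{ELBO}_{m_1}(q_1) - \mathrm{ELBO}_{m_2}(q_2)\big)
  + \big(\mathrm{KL}_{m_1} - \mathrm{KL}_{m_2}\big)
```

Here $\mathrm{KL}_{m_k}$ is shorthand for $\mathrm{KL}\big(q_k(\theta) \,\|\, p(\theta \mid \mathcal{D}, m_k)\big)$, which is non-negative, so each ELBO is a lower bound on its own model's log evidence.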
I recall Bayesian dropout in VI and spike-and-slab priors in the Bayesian setting, but I would like to know whether [C] is acceptable in a simple situation such as a two-model comparison.