Hi, thanks to the Pyro team. It is awesome. This might be a research question, but I don't have a clear idea, so I would like to ask for your opinions.

**Q. What is a practical approach to model selection in variational inference?**

Let's say there is a problem with `data={train, test}` and `model={m1,m2}`.

For simplicity, `m1` is a neural network with a hidden layer of size 10, and `m2` is exactly the same as `m1` except that its hidden layer has size 20. All parameters get the same priors. The neural network structure and activations are all the same.

Here are my approaches. They are not 100% theoretically correct, but I think these steps can be used in practical engineering applications (please correct me if I am wrong).

[A] In a frequentist way,

- Train `m1` and `m2` with `train` and calculate the loss `loss` (e.g., L2 norm) on `test`.
- Compare `loss_m1(test)` and `loss_m2(test)`. Choose whichever of `m1` or `m2` shows the lower `loss`.

[B] In a fully Bayesian way (assuming the same prior distributions),

- Run MCMC for `m1` and `m2` with `train`.
- Do a posterior predictive check with `m1` and `m2` on `test` or `train` data.
- If both are acceptable, calculate some metric such as `loo`, `WAIC`, or `Bayes factor (evidence)` (the latter via bridge sampling) on `test` or `train` data.
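To make the metric step in [B] concrete, here is a minimal sketch of WAIC computed from pointwise log-likelihoods. It assumes you already have a matrix `log_lik[s][i]` holding `log p(y_i | theta_s)` for each posterior draw `s` and data point `i`; the function name `waic` and the toy numbers are only for illustration, and in practice a library such as ArviZ would probably be a better choice.

```python
import math
from statistics import variance


def waic(log_lik):
    """WAIC on the deviance scale (lower is better).

    log_lik[s][i] is log p(y_i | theta_s) for posterior draw s and
    data point i. Needs at least two draws for the variance term.
    """
    n_draws = len(log_lik)
    n_points = len(log_lik[0])
    elpd = 0.0
    for i in range(n_points):
        col = [log_lik[s][i] for s in range(n_draws)]
        # Log pointwise predictive density, via a stable log-mean-exp.
        m = max(col)
        lppd_i = m + math.log(sum(math.exp(v - m) for v in col) / n_draws)
        # Effective-parameter penalty: sample variance across draws.
        p_waic_i = variance(col)
        elpd += lppd_i - p_waic_i
    return -2.0 * elpd


# Toy (made-up) log-likelihoods: 3 posterior draws, 2 data points.
toy_log_lik = [[-1.0, -2.1], [-1.2, -1.9], [-0.9, -2.0]]
print(waic(toy_log_lik))
```

The model with the lower value would be preferred, under the usual caveats about WAIC for models with many parameters.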

[C] In variational inference (with Pyro),

- Train `m1` and `m2` with `train` via SVI.
- Calculate the ELBO via `SVI.evaluate_loss()` on `test` data for `m1` and `m2`. (This gives a stochastic ELBO estimate, so we may repeat it several times or use a large value for the `num_particles` argument.)
- Choose the one that shows the lower loss (because the loss is the negative ELBO, and the ELBO, a lower bound on the log evidence, serves as an approximation of it).
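The "repeat it several times" step in [C] could be sketched roughly like this. It is plain Python and only assumes that each model exposes a zero-argument function returning one noisy loss estimate. The names `summarize_loss`, `compare_models`, and the stand-in estimators are hypothetical, and the two-standard-error rule is just one possible way to decide that a difference is not noise.

```python
import math
import random
from statistics import mean, stdev


def summarize_loss(estimate_loss, num_repeats=50):
    """Call a stochastic loss estimator repeatedly; return (mean, standard error)."""
    samples = [estimate_loss() for _ in range(num_repeats)]
    return mean(samples), stdev(samples) / math.sqrt(num_repeats)


def compare_models(estimate_loss_m1, estimate_loss_m2, num_repeats=50, z=2.0):
    """Return 'm1', 'm2', or 'undecided' from mean losses and their standard errors."""
    mean1, se1 = summarize_loss(estimate_loss_m1, num_repeats)
    mean2, se2 = summarize_loss(estimate_loss_m2, num_repeats)
    diff = mean1 - mean2
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2)
    # Treat differences within z standard errors as indistinguishable.
    if abs(diff) <= z * se_diff:
        return "undecided"
    return "m1" if diff < 0 else "m2"


# Stand-in estimators with made-up numbers, only to exercise the helper.
rng = random.Random(0)
noisy_m1 = lambda: 105.0 + rng.gauss(0.0, 1.0)
noisy_m2 = lambda: 100.0 + rng.gauss(0.0, 1.0)
print(compare_models(noisy_m1, noisy_m2))
```

With Pyro, each estimator could be something like `lambda: svi_m1.evaluate_loss(x_test, y_test)`, where `svi_m1`, `x_test`, and `y_test` are hypothetical names for a trained `SVI` object and the held-out tensors.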

I am particularly interested in [C]. I think it is viable both theoretically and practically, because that is what the ELBO means, at least for practical engineering applications.
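For reference, this is the standard identity behind that reasoning, written for a model $m$ with data $x$, latent variables $z$, and guide $q_\phi$ (this notation is assumed here for illustration):

```latex
\log p(x \mid m)
  = \underbrace{\mathbb{E}_{q_\phi(z)}\!\left[\log p(x, z \mid m) - \log q_\phi(z)\right]}_{\mathrm{ELBO}(m)}
  + \mathrm{KL}\!\left(q_\phi(z) \,\|\, p(z \mid x, m)\right)
  \;\ge\; \mathrm{ELBO}(m)
```

The gap is the KL term, which can differ between `m1` and `m2`, so ranking by ELBO agrees with ranking by evidence only if the two guides are about equally tight.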

I recall Bayesian dropout in VI and spike-and-slab priors in the Bayesian setting, but I would like to know whether [C] is acceptable in a simple situation such as a two-model comparison.