I want to put priors on some of the layers in a network and learn their parameters, which Pyro allows. However, I would like to train the network in the following manner, and I am wondering whether this is possible and whether the training strategy has a flaw that I am overlooking (I think it might help me train faster):
Say for 5 iterations I sample a set of parameters and optimize the ELBO using the guide and model functions,
then for the next 2 iterations I sample a set of parameters and optimize MSE, cross-entropy, etc.
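The alternating schedule described above can be sketched as a small helper that picks the objective for each step. This is only an illustration of the 5-then-2 pattern: the function name `objective_for_step` and the `"elbo"`/`"mse"` labels are hypothetical placeholders, not part of Pyro.

```python
def objective_for_step(step, elbo_steps=5, mse_steps=2):
    """Return which objective to use at a given 0-based training step."""
    cycle = elbo_steps + mse_steps
    # The first `elbo_steps` steps of each cycle use the ELBO, the rest use MSE
    return "elbo" if step % cycle < elbo_steps else "mse"

# One full cycle: five ELBO steps followed by two MSE steps
schedule = [objective_for_step(s) for s in range(7)]
```

In a real training loop, the returned label would decide whether to take an SVI step or a plain gradient step on the deterministic loss.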
Edit:
I realize this might be more helpful than the above: can I optimize ELBO + K*MSE (etc.), where K is some hyperparameter?
It seems that what you want to optimize is the ELBO of a Bayesian network whose loss function is loss + K*MSE, where loss is the loss function of your original network.
Thus it should be OK as long as loss + K*MSE makes sense for an ordinary network.
The easiest way to do this would be to add an additional observe statement using a normal distribution in your model. Note that the log prob of a normal distribution gives you a quadratic term, so the constant K is effectively determined by the variance of the normal distribution: each squared error is scaled by 1/(2σ²), so a smaller variance means a larger K.
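As a quick sanity check of that relationship, here is a minimal plain-Python sketch (no Pyro required) showing that the negative log-likelihood of a normal distribution equals K*MSE plus a constant. The data, the value of `sigma`, and the helper names are made up for illustration. In Pyro itself, the observe statement would typically be written as a `pyro.sample` call with the `obs=` argument.

```python
import math

def normal_log_prob(y, mean, sigma):
    """Log density of Normal(mean, sigma) evaluated at y."""
    return (-0.5 * ((y - mean) / sigma) ** 2
            - math.log(sigma)
            - 0.5 * math.log(2 * math.pi))

def implied_k(sigma, n):
    """Weight K such that -sum(log_prob) = K * MSE + const for n data points."""
    return n / (2 * sigma ** 2)

# Hypothetical targets and network predictions, for illustration only
targets = [1.0, 2.0, 3.0, 4.0]
preds = [1.5, 1.5, 3.5, 3.0]
sigma = 0.5
n = len(targets)

mse = sum((y - p) ** 2 for y, p in zip(targets, preds)) / n
neg_log_lik = -sum(normal_log_prob(y, p, sigma) for y, p in zip(targets, preds))

# Term that does not depend on the predictions
const = n * (math.log(sigma) + 0.5 * math.log(2 * math.pi))
k = implied_k(sigma, n)

# Both sides agree up to floating-point error
assert abs(neg_log_lik - (k * mse + const)) < 1e-9
```

Because MSE here is a mean over n points while the log-likelihood is a sum, the implied weight is K = n / (2σ²). If you use a summed squared error instead, it is simply 1 / (2σ²).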