I want to put priors on some of the layers in my network and learn their parameters, which Pyro allows. However, I would like to train the network in the following way, and I am wondering whether this is possible and whether the training strategy has a flaw I am overlooking (I think it might help the network train faster):
Say that for 5 iterations I sample a set of parameters and optimize the ELBO,
then for the next 2 iterations I sample a set of parameters and optimize
MSE / cross-entropy / etc.
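A minimal, framework-agnostic sketch of that alternating schedule, assuming the 5-ELBO / 2-task cycle simply repeats for the whole run. `alternating_train`, `elbo_step` and `task_step` are hypothetical names; in Pyro, `elbo_step` could be something like `lambda: svi.step(x, y)` for an `SVI` object built with `Trace_ELBO()`, and `task_step` an update on the MSE of a prediction made with weights drawn from the guide.

```python
from typing import Callable, List, Tuple


def alternating_train(
    n_steps: int,
    elbo_step: Callable[[], float],
    task_step: Callable[[], float],
    n_elbo: int = 5,
    n_task: int = 2,
) -> List[Tuple[str, float]]:
    """Hypothetical helper: cycle n_elbo ELBO updates, then n_task task-loss updates.

    Returns (phase, loss) for every step so the schedule can be inspected.
    """
    cycle = n_elbo + n_task
    history = []
    for step in range(n_steps):
        # The first n_elbo steps of each cycle optimize the ELBO,
        # the remaining n_task steps optimize MSE / cross-entropy / etc.
        if step % cycle < n_elbo:
            history.append(("elbo", elbo_step()))
        else:
            history.append(("task", task_step()))
    return history


# Usage with dummy step functions standing in for real optimizer updates.
history = alternating_train(14, elbo_step=lambda: 1.0, task_step=lambda: 2.0)
print([phase for phase, _ in history])
```

One thing to watch with this design: the task-loss steps ignore the prior/KL term, so they may pull the guide away from the posterior that the ELBO steps are fitting; averaged over a cycle, the alternation plausibly behaves like a weighted sum of the two losses.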
Alternatively (and I suspect this might be more useful than the above), can I minimize
(negative ELBO) + K * MSE (or another task loss), where K is a hyperparameter?
Any hints will be appreciated.