I am currently implementing the model from [1110.4411] Gaussian Process Regression Networks.
It is basically something akin to a Gaussian process combined with a latent linear regression. My model is implemented and works well. However, I am having some issues with choosing the hyper-parameters. I tried the standard formulation where we register them with a pyro.param statement in the model so they get optimized along with the ELBO. However, that didn't work very well, as it is super sensitive to initialization. Is there any way in Pyro to first maximize the marginal likelihood as a prior step and then start doing inference on the posterior?
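To clarify what I mean by maximizing the marginal: for a vanilla GP the log marginal likelihood is available in closed form, so the hyper-parameters can be fit by type-II maximum likelihood as a first stage. A minimal numpy/scipy sketch on toy data (everything here is illustrative, not my actual model):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

def rbf(x1, x2, lengthscale, variance):
    # squared-exponential kernel on 1-d inputs
    d2 = (x1[:, None] - x2[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

def neg_log_marginal(log_theta, x, y):
    # log_theta = (log lengthscale, log signal variance, log noise variance)
    ls, var, noise = np.exp(log_theta)
    K = rbf(x, x, ls, var) + (noise + 1e-6) * np.eye(len(x))  # jitter for stability
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    # negative log p(y | x, theta) for a zero-mean GP
    return 0.5 * y @ alpha + np.log(np.diag(L)).sum() + 0.5 * len(x) * np.log(2 * np.pi)

# toy data standing in for the real problem
x = np.linspace(0.0, 1.0, 40)
y = np.sin(6.0 * x) + 0.1 * rng.standard_normal(40)

theta0 = np.log([1.0, 1.0, 1.0])
res = minimize(neg_log_marginal, theta0, args=(x, y), method="L-BFGS-B")
lengthscale, variance, noise = np.exp(res.x)  # fix these, then run SVI on the rest
```

The idea would be to freeze the values found this way and only infer the latent functions afterwards.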
do you mean kernel hyperparameters or something else?
Just the kernel hyper-parameters of W and f, and the noise.
not really sure what issue you might be facing. i trained one of these a few years ago and don’t recall having any problems or needing to do anything unusual to get it to work. are you doing mini-batch training?
not sure what this means. the elbo is a lower bound to the log evidence, the computation of which requires a variational approximation.
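for reference, the identity behind that statement is

```latex
\log p(y)
\;=\;
\underbrace{\mathbb{E}_{q(z)}\!\left[\log \frac{p(y, z)}{q(z)}\right]}_{\text{ELBO}}
\;+\;
\mathrm{KL}\!\left(q(z)\,\|\,p(z \mid y)\right)
\;\ge\;
\text{ELBO}
```

since the KL term is non-negative; the true marginal only appears through this bound, which is why a variational approximation q is needed to compute anything.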
Thank you for answering so quickly! I apologize; I think I didn’t formulate my question correctly.
I am not doing minibatch training.
The model is currently working, but the hyper-parameters it finds are not great, and the fit is super sensitive to the initial values I set for them. At the moment I am only optimizing the noise of the model (the sigma_y in the paper) and the length-scale of the W kernel, following the standard approach of SVI maximizing the ELBO, in the hope that the parameters learned in the model and the guide are good. However, depending on the initial values I choose, the fit is either good or quite bad.

I know it is normal for the model to be sensitive to initial values, but it seems to me (though maybe I am wrong) that if I could first optimize the hyper-parameters in some principled way, without relying on the variational approximation, and then keep them fixed, the results might be better and less sensitive. With the current setup the model sometimes gets stuck in a local optimum. Is there a way to do something like this?
Alternatively, are there any better ways of choosing hyper-parameters in this scenario, other than optimizing them with the ELBO or something super expensive like cross-validation? Thank you again!
i suggest doing mini-batch training. when you do batch training (i.e. all the data enters into each gradient step) there is a much stronger tendency to get stuck in bad local optima.
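as a generic illustration of the pattern (a plain-numpy stand-in on a toy least-squares problem, not the GPRN itself — in pyro you would typically get this via pyro.plate with a subsample_size argument, which rescales the likelihood for you):

```python
import numpy as np

rng = np.random.default_rng(0)

# toy regression problem standing in for the real model
N, D = 500, 3
X = rng.standard_normal((N, D))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.standard_normal(N)

w = np.zeros(D)
lr, batch_size = 0.05, 32
for step in range(2000):
    idx = rng.choice(N, size=batch_size, replace=False)  # fresh mini-batch each step
    Xb, yb = X[idx], y[idx]
    grad = 2.0 / batch_size * Xb.T @ (Xb @ w - yb)  # gradient of the batch MSE
    w -= lr * grad

full_mse = np.mean((X @ w - y) ** 2)  # should approach the 0.01 noise floor
```

the gradient noise injected by the random subsampling is part of what helps the optimizer escape shallow local optima.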
this should be fine if you do the optimization carefully (reasonable learning rate, etc.)
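to see why the learning rate matters, here is a toy ill-conditioned quadratic (plain numpy, nothing pyro-specific): a step size past the stability threshold of the stiffest direction diverges, while a smaller one converges.

```python
import numpy as np

# ill-conditioned quadratic loss 0.5 * x' A x, with curvatures 1 and 100
A = np.diag([1.0, 100.0])

def run_gd(lr, steps=200):
    x = np.array([1.0, 1.0])
    for _ in range(steps):
        x = x - lr * (A @ x)  # gradient of the quadratic is A @ x
    return x

x_good = run_gd(0.019)  # below the 2/100 stability limit: converges
x_bad = run_gd(0.03)    # above it: the stiff coordinate blows up
```

with many hyper-parameters on very different scales (lengthscales vs noise), an adaptive optimizer and a conservative learning rate play the role of the "good" setting here.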
This makes a lot of sense. Thank you, I will try it!