Empirical Bayes (useful?)


I have a model p(y|\theta)p(\theta|\nu), where \nu are hyperparameters and \theta are latent (unobserved) variables. Although I have some idea of how to set the hyperparameters, it seems reasonable to take the empirical Bayes approach and maximize the quantity:

p(y|\nu) = E_{p(\theta|\nu)}[p(y|\theta)] = \int p(y|\theta) p(\theta|\nu) d\theta

This should be possible with SGD on \nu. I think this shouldn't be too difficult to implement in Pyro, but I have not seen it done in practice, and I was wondering whether there is a reason for that. It seems to me that using REINFORCE should work fine for computing the gradient of this quantity, no?
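
For reference, here is the score-function (REINFORCE) identity this refers to, written out as a sketch. It assumes the objective is the marginal likelihood p(y|\nu) = E_{p(\theta|\nu)}[p(y|\theta)] and that the gradient and the integral can be interchanged; S is just an illustrative number of Monte Carlo samples.

```latex
\begin{aligned}
\nabla_\nu \, p(y \mid \nu)
  &= \nabla_\nu \int p(y \mid \theta)\, p(\theta \mid \nu)\, d\theta \\
  &= \int p(y \mid \theta)\, p(\theta \mid \nu)\, \nabla_\nu \log p(\theta \mid \nu)\, d\theta \\
  &= \mathbb{E}_{p(\theta \mid \nu)}\!\left[ p(y \mid \theta)\, \nabla_\nu \log p(\theta \mid \nu) \right] \\
  &\approx \frac{1}{S} \sum_{s=1}^{S} p(y \mid \theta_s)\, \nabla_\nu \log p(\theta_s \mid \nu),
  \qquad \theta_s \sim p(\theta \mid \nu).
\end{aligned}
```

The estimator is unbiased for the gradient of p(y|\nu) itself. The gradient of log p(y|\nu) is this quantity divided by p(y|\nu), which also has to be estimated, so a plain Monte Carlo version of the log-gradient is biased and can be noisy when samples from the prior rarely explain the data well.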

If it is reasonable, do you think it should be done in two steps, or should I just stick a pyro.param statement in the model for the hyperparameters and optimize everything at once?

If you make \nu a pyro.param and do variational inference, you will be maximizing a lower bound (the ELBO) on the log of the expectation you included in your post, jointly over \nu and the guide parameters.