My point isn’t that marginalizing local variables is bad; it’s that attempting to do so by importance sampling from the prior will not be very effective except in particularly simple cases, because the importance weights degenerate as the number of local latent variables grows. If you can do the integrals exactly, e.g. using conjugacy (as in my BetaBinomial suggestion) or enumeration (as with infer_discrete or my quadrature suggestion), then you should. Otherwise, the best approach depends on your problem; one strategy is amortized variational inference, in which the parameters of the approximate posterior distributions for local latent variables are functions of the data.
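To make the first point concrete, here is a minimal pure-Python sketch (plain standard library, not Pyro code) under assumed settings: a hypothetical model with J groups, each with its own local success probability p_j ~ Beta(a, b) and an observed count k_j ~ Binomial(n, p_j). `exact_log_marginal` integrates each p_j out in closed form, which is what using a BetaBinomial likelihood amounts to, while `prior_importance_sampling` draws all the p_j jointly from the prior and averages the likelihood. The function names, counts, and sample sizes are illustrative assumptions only.

```python
import math
import random


def log_choose(n, k):
    """log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def log_beta(a, b):
    """log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def exact_log_marginal(counts, n, a, b):
    """Sum of BetaBinomial log-pmfs: each local p_j is integrated out analytically."""
    return sum(
        log_choose(n, k) + log_beta(k + a, n - k + b) - log_beta(a, b)
        for k in counts
    )


def binomial_log_lik(k, n, p):
    """Binomial log-likelihood of k successes in n trials at probability p."""
    # Clamp p away from 0 and 1 so the logarithms stay finite.
    p = min(max(p, 1e-12), 1.0 - 1e-12)
    return log_choose(n, k) + k * math.log(p) + (n - k) * math.log1p(-p)


def prior_importance_sampling(counts, n, a, b, num_samples, rng):
    """Estimate the log marginal likelihood by drawing every local p_j from its
    Beta(a, b) prior and weighting each joint draw by the likelihood of all groups.
    Returns (log marginal estimate, effective sample size)."""
    log_w = []
    for _ in range(num_samples):
        # One joint prior draw: a fresh p_j for every group.
        log_w.append(
            sum(binomial_log_lik(k, n, rng.betavariate(a, b)) for k in counts)
        )
    # Log-mean-exp, shifting by the max log weight for numerical stability.
    m = max(log_w)
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    log_marginal = m + math.log(total / num_samples)
    # Kish effective sample size; it is unaffected by the exp(-m) rescaling.
    ess = total ** 2 / sum(x * x for x in w)
    return log_marginal, ess


if __name__ == "__main__":
    rng = random.Random(0)
    n, a, b, num_samples = 10, 1.0, 1.0, 20000
    for num_groups in (1, 20):
        counts = [7] * num_groups  # illustrative data: 7 successes out of 10 per group
        exact = exact_log_marginal(counts, n, a, b)
        est, ess = prior_importance_sampling(counts, n, a, b, num_samples, rng)
        print(f"J={num_groups}: exact={exact:.3f} IS={est:.3f} ESS={ess:.1f}/{num_samples}")
```

With a single group the importance-sampling estimate should land close to the exact value, but with 20 groups the effective sample size typically collapses to a handful of draws out of 20,000, because each joint weight is a product of per-group likelihoods. The exact version has no such problem, since it never samples the local variables at all.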