This question is inspired by a chapter of the Stan documentation about measurement error models and a chapter of Statistical Rethinking about missing data.

Given an observed response Y, random variables X \sim p(\theta) as inputs, and a regression model

E(Y) = link^{-1}(\beta X), where the link function could be the identity, logit, etc.: is there an efficient way to perform inference on \beta without also performing inference on X, since the X are nuisance parameters here?
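To make the setup concrete, here is a minimal sketch of the generative model, assuming a logit link, a standard-normal p(\theta) for X, and an illustrative value of \beta (all of these choices are hypothetical, not from the question):

```python
import math
import random

random.seed(0)

def inv_logit(z):
    # Inverse of the logit link: maps the linear predictor to (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

beta_true = 1.5  # illustrative coefficient we would want to infer
n = 200          # number of observations

# Latent inputs X ~ p(theta); here p(theta) is taken to be N(0, 1).
x = [random.gauss(0.0, 1.0) for _ in range(n)]

# Observed responses with E(Y) = inv_logit(beta * X); Y is Bernoulli.
y = [1 if random.random() < inv_logit(beta_true * xi) else 0 for xi in x]
```

The inference problem is then: given only y (the x are never observed), recover beta_true.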

One chapter of BDA suggests one solution: considering the identity

p(\beta|Y) = \int p(\beta|Y,X)\,p(X|Y)\,dX, first draw samples from the marginal posterior p(X|Y), and then draw \beta conditional on X and Y. However, this only works when the model is simple and X|Y is easy to obtain, for example when X is normally distributed and the link is the identity, which is not the general case.
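The identity above corresponds to composition sampling: \beta^{(s)} \sim p(\beta|Y) is obtained by drawing X^{(s)} \sim p(X|Y) and then \beta^{(s)} \sim p(\beta|Y, X^{(s)}). A schematic sketch, where both conditional samplers are placeholder stubs standing in for the tractable normal/identity-link case:

```python
import random

random.seed(1)

def sample_x_given_y():
    # Placeholder for a draw from the marginal posterior p(X | Y).
    # In the normal / identity-link case this is a known normal
    # distribution; the stub here is purely illustrative.
    return random.gauss(0.0, 1.0)

def sample_beta_given_xy(x_s):
    # Placeholder for a draw from p(beta | X, Y): once X is fixed,
    # this is ordinary (conjugate) regression. Illustrative stub.
    return random.gauss(0.5 * x_s, 1.0)

# Composition sampling: each beta draw is marginal over X by construction.
beta_draws = []
for _ in range(1000):
    x_s = sample_x_given_y()
    beta_draws.append(sample_beta_given_xy(x_s))
```

The difficulty the question points out is exactly that `sample_x_given_y` has no closed form outside such simple cases.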

Another solution is to treat X as parameters, obtain the joint posterior p(\beta, X|Y), and then take the marginal posterior of \beta. But I suspect this is costly when the number of observations is large. Also, by construction the number of parameters grows with the sample size, so it is no longer a parametric model.
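This joint approach can be sketched with a random-walk Metropolis sampler on the (1 + n)-dimensional state (\beta, X_1, \dots, X_n), for a toy identity-link model with hypothetical N(0,1) priors and noise scale 0.5 (all values illustrative). The point is visible in the code: the chain's state dimension grows with n.

```python
import math
import random

random.seed(2)

# Toy data: Y = beta * X + noise, with X latent (X_i ~ N(0, 1) prior).
n = 5
beta_true = 2.0
x_true = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [beta_true * xi + random.gauss(0.0, 0.5) for xi in x_true]

def log_joint(beta, x):
    # log p(Y | X, beta) + log p(X) + log p(beta), up to constants.
    lp = -0.5 * beta ** 2                             # beta ~ N(0, 1) prior
    for xi, yi in zip(x, y):
        lp += -0.5 * xi ** 2                          # X_i ~ N(0, 1) prior
        lp += -0.5 * ((yi - beta * xi) / 0.5) ** 2    # likelihood, sigma = 0.5
    return lp

# Random-walk Metropolis on the joint (beta, X) state: 1 + n dimensions,
# so the sampling problem grows with the number of observations.
beta, x = 0.0, [0.0] * n
cur = log_joint(beta, x)
beta_draws = []
for _ in range(2000):
    beta_prop = beta + random.gauss(0.0, 0.2)
    x_prop = [xi + random.gauss(0.0, 0.2) for xi in x]
    prop = log_joint(beta_prop, x_prop)
    if math.log(random.random()) < prop - cur:
        beta, x, cur = beta_prop, x_prop, prop
    beta_draws.append(beta)
```

In practice one would use gradient-based samplers (e.g. NUTS) rather than this naive random walk, but the per-iteration cost and dimensionality scaling are the same issue.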

One possible solution is to use Monte Carlo to estimate the log density terms involving X using samples from p(\theta), and then apply MCMC or SVI on \beta alone. It sounds feasible, but I am not an expert and don't know whether anyone knows how to do this in numpyro, or has other suggestions.
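For what it's worth, here is a stdlib-only sketch of that idea (not numpyro-specific): estimate log p(y_i | \beta) = log E_{X \sim p(\theta)}[p(y_i | X, \beta)] by averaging the likelihood over prior draws of X, and run a Metropolis chain on \beta alone, in the spirit of pseudo-marginal MCMC. The toy model, priors, and noise scale are hypothetical, matching nothing in the question beyond its structure:

```python
import math
import random

random.seed(3)

# Toy data: Y = beta * X + noise, X latent with X ~ N(0, 1) = p(theta).
n = 5
x_true = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [2.0 * xi + random.gauss(0.0, 0.5) for xi in x_true]

S = 500  # Monte Carlo sample size for the marginal-likelihood estimate

def log_marginal_lik(beta):
    # log p(y_i | beta) ~= log( (1/S) * sum_s p(y_i | X^(s), beta) )
    # with X^(s) ~ p(theta); log-sum-exp keeps the average stable.
    total = 0.0
    for yi in y:
        logs = []
        for _ in range(S):
            x_s = random.gauss(0.0, 1.0)  # draw X from its prior p(theta)
            logs.append(-0.5 * ((yi - beta * x_s) / 0.5) ** 2)
        m = max(logs)
        total += m + math.log(sum(math.exp(l - m) for l in logs) / S)
    return total

# Metropolis on beta alone: X never enters the chain's state.
beta = 0.0
cur = log_marginal_lik(beta) - 0.5 * beta ** 2     # N(0, 1) prior on beta
draws = []
for _ in range(200):
    beta_prop = beta + random.gauss(0.0, 0.3)
    prop = log_marginal_lik(beta_prop) - 0.5 * beta_prop ** 2
    if math.log(random.random()) < prop - cur:
        beta, cur = beta_prop, prop
    draws.append(beta)
```

Note the cost trade-off: each iteration now does S prior draws per data point, and the estimator's noise affects mixing, which is exactly the question of whether this can be done efficiently (or natively) in numpyro.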