Hello. I am following tutorial from Bayesian Regression Using NumPyro
in the [11]
code block that has
rng_key, rng_key_ = random.split(rng_key)
predictive = Predictive(model, samples_1)
predictions = predictive(rng_key_, marriage=dset.MarriageScaled.values)["obs"]
df = dset.filter(["Location"])
df["Mean Predictions"] = jnp.mean(predictions, axis=0)
df.head()
I am trying to understand what are we calculating by this predictive
function: suppose, instead of using the existing marriage=dset.MarriageScaled.values
, we were to predict on a new MarriageRate
value x^*, so that, we have
predictions = predictive(rng_key_, marriage=x*)["obs"]
I want to confirm, are we calculating the posterior predictive defined as:
p(x* \mid (x, y)) = \int p(x * \mid \theta)\cdot p(\theta \mid (x, y)) \; d\theta
where (x, y) is the dataset used for fitting the model and obtaining posterior samples of \theta, and these posterior samples are present in samples_1
variable
My question is: How is this integral evaluated internally? To begin with, how is p(x * \mid \theta) determined? This is my understanding so far for the predictive
function:
-
The tutorial model uses \text{Normal}(\mu, \sigma) likelihood. \mu depends on both x^* and posterior samples of \theta. So, we just plug those in to get the likelihood distribution at x^* and posterior \theta. Now, we can just sample from this distribution get the predictions at x^*. By default, we only sample once per posterior sample draw. So, the resultant is a prediction y^* of the same size as the number of draws in our posterior samples. We can take the mean, and hpdi interval of this to summarize the predictions.
-
But I am not sure how this is similar to evaluating the above integral? Do we even evaluate this integral at all? The more I look at it, the more confusing this integral becomes for me. To me, the above step looks like only evaluating the p(x^*\mid \theta) part of the integral. Hence, we are not importance weighing by p(\theta \mid (x,y)) in the above step, ie, we are not accounting for the “likeliness” of \theta.
I am new to computational Bayesian methods and I would really appreciate any guidance I can get here. This will clarify a lot of things for me (and hopefully for others too). Thanks a lot!