Posterior predictive on a new observation in NumPyro

Hello. I am following the Bayesian Regression Using NumPyro tutorial.

In code block [11], which has

rng_key, rng_key_ = random.split(rng_key)
predictive = Predictive(model, samples_1)
predictions = predictive(rng_key_, marriage=dset.MarriageScaled.values)["obs"]
df = dset.filter(["Location"])
df["Mean Predictions"] = jnp.mean(predictions, axis=0)
df.head()

I am trying to understand what we are calculating with this predictive function. Suppose that, instead of using the existing marriage=dset.MarriageScaled.values, we were to predict at a new marriage rate value x^*, so that we have

predictions = predictive(rng_key_, marriage=x_star)["obs"]  # x_star holds the new value x^*

I want to confirm: are we calculating the posterior predictive, defined as

p(x^* \mid (x, y)) = \int p(x^* \mid \theta) \cdot p(\theta \mid (x, y)) \; d\theta

where (x, y) is the dataset used for fitting the model and obtaining posterior samples of \theta, and these posterior samples are stored in the samples_1 variable.

My question is: how is this integral evaluated internally? To begin with, how is p(x^* \mid \theta) determined? This is my understanding of the predictive function so far:

  • The tutorial model uses a \text{Normal}(\mu, \sigma) likelihood, where \mu depends on both x^* and the posterior samples of \theta. So we just plug those in to get the likelihood distribution at x^* for each posterior \theta, and then sample from this distribution to get predictions at x^*. By default, we sample only once per posterior draw, so the result is a vector of predictions y^* with the same length as the number of posterior draws. We can take the mean and hpdi interval of this vector to summarize the predictions at x^* (I give a rough code sketch of this step after this list).

  • But I am not sure how this corresponds to evaluating the above integral. Do we even evaluate this integral at all? The more I look at it, the more confusing it becomes for me. The step above looks like it only evaluates the p(x^* \mid \theta) part of the integrand, so we are not importance weighting by p(\theta \mid (x, y)), i.e., we are not accounting for the “likeliness” of \theta.
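To make my understanding concrete, here is a rough sketch of what I think the step in the first bullet amounts to for the marriage-only model. This assumes samples_1 holds the posterior draws with keys "a", "bM", and "sigma" (as in the tutorial), rng_key is the key from the code above, and x_star is a hypothetical new scaled marriage rate:

import jax.numpy as jnp
from jax import random
import numpyro.distributions as dist
from numpyro.diagnostics import hpdi

# One prediction per posterior draw at the hypothetical new input x_star.
# samples_1["a"], samples_1["bM"], samples_1["sigma"] each have shape (num_draws,).
rng_key, rng_key_ = random.split(rng_key)
mu_star = samples_1["a"] + samples_1["bM"] * x_star                  # mean at x_star, one per draw
y_star = dist.Normal(mu_star, samples_1["sigma"]).sample(rng_key_)   # one y^* per posterior draw
print(jnp.mean(y_star), hpdi(y_star, 0.9))                           # summarize the predictions

Is this roughly what Predictive is doing under the hood?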

I am new to computational Bayesian methods and I would really appreciate any guidance I can get here. This will clarify a lot of things for me (and hopefully for others too). Thanks a lot!

Hi @mathlad, this is a great question. Running predictive will give you a collection of samples (x^*, \theta) (conditioned on the data). Dropping those \theta will give you samples of x^* (again, conditioned on the data). So predictive will give you samples from p(x^* \mid x, y). If you want to evaluate the probability density of a particular x^*, I think you need to draw many \theta from the posterior and approximate the integral with a Monte Carlo estimate:

p(x^* \mid x, y) = E_{\theta \sim p(\theta \mid x, y)}\left[ p(x^* \mid \theta) \right] \approx \frac{1}{N} \sum_{i=1}^{N} p(x^* \mid \theta_i), \quad \theta_i \sim p(\theta \mid x, y).

You can compute this approximation using log-sum-exp, as in the Posterior Predictive Density section of that tutorial.
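For concreteness, a rough sketch of that Monte Carlo estimate for the tutorial's marriage-only model might look like the following, writing x_star for the new input and y_star for the candidate predicted value whose density you want (both placeholders), and again assuming samples_1 has keys "a", "bM", and "sigma":

import jax.numpy as jnp
from jax.scipy.special import logsumexp
import numpyro.distributions as dist

# Monte Carlo estimate of the log predictive density at (x_star, y_star):
# log p(y_star | x, y) ~= logsumexp_i[ log p(y_star | theta_i) ] - log N
mu_star = samples_1["a"] + samples_1["bM"] * x_star
log_lik = dist.Normal(mu_star, samples_1["sigma"]).log_prob(y_star)  # log p(y_star | theta_i), per draw
log_pred_density = logsumexp(log_lik) - jnp.log(log_lik.shape[0])    # log of the posterior average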

we are not accounting for the “likeliness” of \theta

The \theta samples used to compute the predictive density should be drawn from the posterior, so the “likeliness” of each \theta is already accounted for by how often it appears among the posterior draws in samples_1.
