# Posterior predictive on a new observation in NumPyro

Hello. I am following tutorial from Bayesian Regression Using NumPyro

in the [11] code block that has

rng_key, rng_key_ = random.split(rng_key)
predictive = Predictive(model, samples_1)
predictions = predictive(rng_key_, marriage=dset.MarriageScaled.values)["obs"]
df = dset.filter(["Location"])
df["Mean Predictions"] = jnp.mean(predictions, axis=0)


I am trying to understand what are we calculating by this predictive function: suppose, instead of using the existing marriage=dset.MarriageScaled.values, we were to predict on a new MarriageRate value x^*, so that, we have

predictions = predictive(rng_key_, marriage=x*)["obs"]


I want to confirm, are we calculating the posterior predictive defined as:

p(x* \mid (x, y)) = \int p(x * \mid \theta)\cdot p(\theta \mid (x, y)) \; d\theta

where (x, y) is the dataset used for fitting the model and obtaining posterior samples of \theta, and these posterior samples are present in samples_1 variable

My question is: How is this integral evaluated internally? To begin with, how is p(x * \mid \theta) determined? This is my understanding so far for the predictive function:

• The tutorial model uses \text{Normal}(\mu, \sigma) likelihood. \mu depends on both x^* and posterior samples of \theta. So, we just plug those in to get the likelihood distribution at x^* and posterior \theta. Now, we can just sample from this distribution get the predictions at x^*. By default, we only sample once per posterior sample draw. So, the resultant is a prediction y^* of the same size as the number of draws in our posterior samples. We can take the mean, and hpdi interval of this to summarize the predictions.

• But I am not sure how this is similar to evaluating the above integral? Do we even evaluate this integral at all? The more I look at it, the more confusing this integral becomes for me. To me, the above step looks like only evaluating the p(x^*\mid \theta) part of the integral. Hence, we are not importance weighing by p(\theta \mid (x,y)) in the above step, ie, we are not accounting for the â€ślikelinessâ€ť of \theta.

I am new to computational Bayesian methods and I would really appreciate any guidance I can get here. This will clarify a lot of things for me (and hopefully for others too). Thanks a lot!

Hi @mathlad, this is a great question. Running predictive will give you a collection of samples (x^*,\theta) (conditioned on the data). Dropping those \theta will give you samples for x^* (again, conditioned on the data). So predictive will give you samples from p(x^*|x,y). If you want to evaluate the probability density of a particular x^*, I think you need to draw many \theta and approximate the integral (using Monte Carlo approximation: E[q(x^*|\theta)] \approx \frac{1}{N} \sum_{\theta \sim q} p(x^* | \theta) ). You can compute such approximation using log-sum-exp like in the Posterior Predictive Density section in that tutorial.

we are not accounting for the â€ślikelinessâ€ť of \theta

\theta samples used to compute predictive density should be drawn from the posterior.

1 Like