# Bayesian Hierarchical Linear Regression: how to predict on a new patient having observations?

Hello there

I would like to understand how to extend the Bayesian Hierarchical Linear Regression tutorial with NumPyro so that:

• having a new (= not part of training) patient, with some observed data,
• we infer his “random effects” (i.e. posterior distribution of (\alpha, \beta) - or only offsets (\alpha - \mu_\alpha, \beta - \mu_\beta)), while retaining “fixed effects” of the once trained model (i.e. posterior distribution of (\mu_\alpha, \mu_\beta, \sigma_\alpha, \sigma_\beta, \sigma))
• and then eventually predict his outcome (FVC_{est}) at unknown weeks

To be precise:

• I’d be in a setup where I could not re-train a model with union of all data (old training data + new data)
• Since this new patient has observations, I do want to infer his random effects, not treat them as centered on the fixed effects (as we could for a new patient without any observations)

Related material found:

Hi @exhumea, when you perform Bayesian inference and get posterior, you can use that posterior (rather than prior) for new inference with new data.

Hi @fehiepsi! Thanks, so you’d estimate the univariate / multivariate joint posterior distribution of the “fixed effects” and set them as new priors for inference on new patients? But then, is there a way to prevent these “fixed effects” from being updated during this new inference?

To fix values of a site, you can use block + condition handlers.

set them as new priors for inference on new patients

Yes, I think it is a reasonable approach.

I may have misunderstood the use of (block +) condition, but doesn’t it only allow replacing some of the model’s latent-variable priors with point estimates?

For instance, if after the first inference the posterior distribution of \mu_\alpha seems to be well modeled by \mathcal{N}(\hat{\mu}_{\mu_\alpha}, \hat{\sigma}_{\mu_\alpha}), how could I sample this variable from this distribution in my later inferences, keeping it not updated / “blocked”?

Thanks!

Maybe there’s a more elegant way of doing it but my code usually looks like the code below. Basically, I have a custom predict function that handles the logic for unseen patients with a try-except-else. To decide if the patient is “seen” or “new” I use sklearn’s LabelEncoder(). The example below is minimal, only controlling for age, but you can easily extend it to control for sex or smoking_status.

# imports shown for completeness; the code for loading the data is not shown
import jax.numpy as jnp
from jax import random
import numpyro
import numpyro.distributions as dist
from numpyro.infer import MCMC, NUTS
from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder()
encoder.fit(train["Patient"])

N_PATIENTS = len(encoder.classes_)

patient_to_age = train[['Patient', 'Age']] \
    .drop_duplicates() \
    .set_index("Patient") \
    .loc[encoder.classes_, "Age"] \
    .values

data_dict = dict(
    PatientNum=jnp.array(encoder.transform(train['Patient'].values)),
    Weeks=jnp.array(train['Weeks'].values),
    PatientToAge=jnp.array(patient_to_age),
    FVC_obs=jnp.array(train['FVC'].values),
)

def model(PatientNum, Weeks, PatientToAge, FVC_obs=None):
    μ_α = numpyro.sample("μ_α", dist.Normal(0., 100.))
    σ_α = numpyro.sample("σ_α", dist.HalfNormal(100.))
    μ_β = numpyro.sample("μ_β", dist.Normal(0., 100.))
    σ_β = numpyro.sample("σ_β", dist.HalfNormal(100.))

    # Age's effect on α
    coef_age = numpyro.sample("coef_age", dist.Normal(0., 100.))

    with numpyro.plate("plate_i", N_PATIENTS) as patient_idx:
        α = numpyro.sample("α", dist.Normal(μ_α + coef_age * PatientToAge[patient_idx], σ_α))
        β = numpyro.sample("β", dist.Normal(μ_β, σ_β))

    σ = numpyro.sample("σ", dist.HalfNormal(100.))
    FVC_est = α[PatientNum] + β[PatientNum] * Weeks

    n_obs = PatientNum.shape[0]
    with numpyro.plate("data", n_obs):
        numpyro.sample("obs", dist.Normal(FVC_est, σ), obs=FVC_obs)

# run mcmc
nuts_kernel = NUTS(model)
mcmc = MCMC(nuts_kernel, num_samples=2000, num_warmup=2000)
rng_key = random.PRNGKey(0)

mcmc.run(rng_key, **data_dict)

# custom predict function
def predict_single_patient(patient_id, weeks, age, posterior_samples, patient_encoder):
    # logic to handle unseen patients:
    try:
        n = patient_encoder.transform([patient_id])[0]
    except ValueError:
        # new patient but known age
        μ_α = posterior_samples["μ_α"]
        σ_α = posterior_samples["σ_α"]
        α = dist.Normal(μ_α + age * posterior_samples["coef_age"], σ_α) \
            .sample(random.PRNGKey(0))

        β = dist.Normal(posterior_samples["μ_β"], posterior_samples["σ_β"]) \
            .sample(random.PRNGKey(0))
    else:
        α = posterior_samples["α"][:, n]
        β = posterior_samples["β"][:, n]

    mu = α[:, None] + β[:, None] * weeks
    # note: I'm not including the noise term σ in the prediction here,
    # but you could easily do that
    return mu

weeks = jnp.arange(-12, 133)

# try on seen patient
patient = "ID00007637202177411956430"
age = train.query("Patient == @patient")["Age"].iloc[0]
mu_pred = predict_single_patient(patient, weeks, age, mcmc.get_samples(), encoder)

# try on unseen patient but with known age
patient = "this is a random id"
age = 66
mu_pred = predict_single_patient(patient, weeks, age, mcmc.get_samples(), encoder)


Thank you @omarfsosa.
The difference in my setup is that new patients do have observed data that I want to take into account (i.e. to infer their random effects while the fixed effects are excluded from the new inference), so I don’t want to just return a population-level prediction.

Thanks to @fehiepsi I now understood:

• how to force my fixed effects to some point estimates (e.g. mean or mode of the posterior distribution from the training inference) for subsequent inferences on new patients, using block + condition
• that I could replace my initial priors on fixed effects (used during the training inference) with the posterior distributions obtained from the training inference
• but I still don’t understand how I could freeze these sites / exclude them from subsequent inferences so that only the random effects are inferred

I see. So perhaps you want to use something like numpyro.handlers.do and create a sort of “intervened” model? I’m not convinced there’s an easy way of achieving what you’re trying to do here without writing a new model altogether. Strictly speaking, the data on the new patient will affect the posterior distribution of the rest of the population (only a little, because you have a lot of patients, but still), so all inferences should be considered simultaneously. Inferring the effects of the new patient alone feels like a different model to me. Definitely interested in hearing if you find a solution to this!

I guess you want to perform inference for p(y) (y is a random effect) given the density p(x, y) (x is a fixed effect), by marginalizing out the x variable. I’m not sure what an easy way to perform such marginalization is (maybe using funsor here? I don’t know). One way is to approximate it by \sum_{i=0}^{n-1} p(x_i)p(y|x_i): you’ll need to create an auxiliary categorical variable over \{0, ..., n-1\} (whose probs parameter equals p(x_i)), something like:

def model():
    c = sample("c", Categorical(p_x))
    x_i = x[c]
    # use the original modeling code here
    # with x=x_i rather than a sample site
    sample("y", p(y|x_i))


If you assume that x and y are independent (a mean-field approximation), you can use HMCGibbs, where in the gibbs_fn you can simply draw a sample for your fixed effects.

Disclaimer: Please justify the above suggestions both theoretically and empirically.