Hi,
I’m trying to learn how to write my own guide()
in the case of hiearchical models. Guides for non-hierarchical models, e.g. the Bayes Regression tutorial makes sense to me, but I am having trouble translating that to a hierarchical setting. Information on how to write a guide for hiearchical models seem sparse, though I did look through the eight schools example.
To learn, I’m trying to reproduce the example shown here: https://twiecki.io/blog/2014/03/17/bayesian-glms-3/
which in essence is trying to predict the radon level in homes based on whether or not a house has a basement. Data originates from different counties and thus it makes sense to learn a hierarchical setting where intercept and slope can vary based on county. I.e.: y_i = α_j[i] + β_j[i]*x_i + ϵ_i, for county j, observation i.
Writing the model (prior) and using MCMC to fit, seems to work quite well and it clearly learns different alphas and betas for each county. I wrote my model as follows:
def model(self, X: pd.DataFrame, y: pd.Series):
floor = torch.from_numpy(X['floor'].to_numpy(dtype=np.float32))
county_idx = X['county_code'].values
num_counties = len(X['county'].unique())
log_radon = torch.tensor(y, dtype=torch.float)
# global hyperprior
mu_a = pyro.sample('mu_a', dist.Normal(0., 1.))
sigma_a = pyro.sample('sigma_a', dist.HalfCauchy(scale=1.))
mu_b = pyro.sample('mu_b', dist.Normal(0., 1.))
sigma_b = pyro.sample('sigma_b', dist.HalfCauchy(scale=1.))
# conditional independence between each county
# alpha, beta will have shape num_counties
with pyro.plate('counties', num_counties):
alpha = pyro.sample('alpha', dist.Normal(mu_a, sigma_a))
beta = pyro.sample('beta', dist.Normal(mu_b, sigma_b))
mean = alpha[county_idx] + beta[county_idx] * floor
eps = pyro.sample('eps', dist.HalfCauchy(1.0))
# conditional independence between each observation
with pyro.plate('data', floor.shape[0]):
obs = pyro.sample('obs', dist.Normal(mean, eps), obs=log_radon)
return obs
The problem arises when I try to use SVI and write my own guide (approximate posterior). I’m unsure if I wrote the guide correctly and was hoping someone could tell me what I’m doing wrong. It seems to converge solely towards an overall group disitribution. I suspect I need to specify more pyro.param
’s since I only have pyro.param
’s for the hyperprior distribution. But I’m unsure whether or not pyro does this implicitly?
My guide looks as follows:
def guide(self, X: pd.DataFrame):
floor = torch.from_numpy(X['floor'].to_numpy(dtype=np.float32))
county_idx = X['county_code'].to_numpy()
num_counties = len(X['county'].unique())
# Learnable parameters for hyperpriors
# mean and sd for mean of alpha
mu_mu_a = pyro.param('mu_mu_a', torch.randn(1))
sigma_mu_a = pyro.param('sigma_mu_a', torch.ones(1), constraint=constraints.positive)
# scale for sd of alpha
sigma_sigma_a = pyro.param('sigma_sigma_a', torch.ones(1), constraint=constraints.positive)
# mean and sd for mean of beta
mu_mu_b = pyro.param('mu_mu_b', torch.randn(1))
sigma_mu_b = pyro.param('sigma_mu_b', torch.ones(1), constraint=constraints.positive)
# scale for sd of beta
sigma_sigma_b = pyro.param('sigma_sigma_b', torch.ones(1), constraint=constraints.positive)
# learnable for the noise
eps_scale = pyro.param('eps_scale', torch.ones(1), constraint=constraints.positive)
# sample
mu_a = pyro.sample('mu_a', dist.Normal(mu_mu_a, sigma_mu_a))
sigma_a = pyro.sample('sigma_a', dist.HalfCauchy(sigma_sigma_a))
mu_b = pyro.sample('mu_b', dist.Normal(mu_mu_b, sigma_mu_b))
sigma_b = pyro.sample('sigma_b', dist.HalfCauchy(sigma_sigma_b))
with pyro.plate('counties', num_counties):
alpha = pyro.sample('alpha', dist.Normal(mu_a, sigma_a))
beta = pyro.sample('beta', dist.Normal(mu_b, sigma_b))
mean = alpha[county_idx] + beta[county_idx] * floor
eps = pyro.sample('eps', dist.HalfCauchy(eps_scale))
return mean
I will then fit using SVI:
optimizer = optim.Adam({"lr": 1e-2})
svi = SVI(model, guide, optimizer, loss=Trace_Elbo())
for epoch in range(epochs):
svi.step(X, y)
Any feedback on this would be great.