SVI and guide for Hierarchical model - Radon example

PeteAlps · June 24, 2020, 9:37am

Hi,

I’m trying to learn how to write my own guide() in the case of hiearchical models. Guides for non-hierarchical models, e.g. the Bayes Regression tutorial makes sense to me, but I am having trouble translating that to a hierarchical setting. Information on how to write a guide for hiearchical models seem sparse, though I did look through the eight schools example.

To learn, I’m trying to reproduce the example shown here: https://twiecki.io/blog/2014/03/17/bayesian-glms-3/
which in essence is trying to predict the radon level in homes based on whether or not a house has a basement. Data originates from different counties and thus it makes sense to learn a hierarchical setting where intercept and slope can vary based on county. I.e.: y_i = α_j[i] + β_j[i]*x_i + ϵ_i, for county j, observation i.

Writing the model (prior) and using MCMC to fit, seems to work quite well and it clearly learns different alphas and betas for each county. I wrote my model as follows:

    def model(self, X: pd.DataFrame, y: pd.Series):

        floor = torch.from_numpy(X['floor'].to_numpy(dtype=np.float32))
        county_idx = X['county_code'].values
        num_counties = len(X['county'].unique())

        log_radon = torch.tensor(y, dtype=torch.float)

        # global hyperprior
        mu_a = pyro.sample('mu_a', dist.Normal(0., 1.))
        sigma_a = pyro.sample('sigma_a', dist.HalfCauchy(scale=1.))
        mu_b = pyro.sample('mu_b', dist.Normal(0., 1.))
        sigma_b = pyro.sample('sigma_b', dist.HalfCauchy(scale=1.))

        # conditional independence between each county
        # alpha, beta will have shape num_counties
        with pyro.plate('counties', num_counties):
            alpha = pyro.sample('alpha', dist.Normal(mu_a, sigma_a))
            beta = pyro.sample('beta', dist.Normal(mu_b, sigma_b))

        mean = alpha[county_idx] + beta[county_idx] * floor
        eps = pyro.sample('eps', dist.HalfCauchy(1.0))

        # conditional independence between each observation
        with pyro.plate('data', floor.shape[0]):
            obs = pyro.sample('obs', dist.Normal(mean, eps), obs=log_radon)

        return obs

The problem arises when I try to use SVI and write my own guide (approximate posterior). I’m unsure if I wrote the guide correctly and was hoping someone could tell me what I’m doing wrong. It seems to converge solely towards an overall group disitribution. I suspect I need to specify more pyro.param’s since I only have pyro.param’s for the hyperprior distribution. But I’m unsure whether or not pyro does this implicitly?

My guide looks as follows:

    def guide(self, X: pd.DataFrame):

        floor = torch.from_numpy(X['floor'].to_numpy(dtype=np.float32))
        county_idx = X['county_code'].to_numpy()
        num_counties = len(X['county'].unique())

        # Learnable parameters for hyperpriors
        # mean and sd for mean of alpha
        mu_mu_a = pyro.param('mu_mu_a', torch.randn(1))
        sigma_mu_a = pyro.param('sigma_mu_a', torch.ones(1), constraint=constraints.positive)
        # scale for sd of alpha
        sigma_sigma_a = pyro.param('sigma_sigma_a', torch.ones(1), constraint=constraints.positive)

        # mean and sd for mean of beta
        mu_mu_b = pyro.param('mu_mu_b', torch.randn(1))
        sigma_mu_b = pyro.param('sigma_mu_b', torch.ones(1), constraint=constraints.positive)
        # scale for sd of beta
        sigma_sigma_b = pyro.param('sigma_sigma_b', torch.ones(1), constraint=constraints.positive)

        # learnable for the noise
        eps_scale = pyro.param('eps_scale', torch.ones(1), constraint=constraints.positive)

        # sample
        mu_a = pyro.sample('mu_a', dist.Normal(mu_mu_a, sigma_mu_a))
        sigma_a = pyro.sample('sigma_a', dist.HalfCauchy(sigma_sigma_a))

        mu_b = pyro.sample('mu_b', dist.Normal(mu_mu_b, sigma_mu_b))
        sigma_b = pyro.sample('sigma_b', dist.HalfCauchy(sigma_sigma_b))

        with pyro.plate('counties', num_counties):
            alpha = pyro.sample('alpha', dist.Normal(mu_a, sigma_a))
            beta = pyro.sample('beta', dist.Normal(mu_b, sigma_b))

        mean = alpha[county_idx] + beta[county_idx] * floor

        eps = pyro.sample('eps', dist.HalfCauchy(eps_scale))

        return mean

I will then fit using SVI:

optimizer = optim.Adam({"lr": 1e-2})
svi = SVI(model, guide, optimizer, loss=Trace_Elbo())
        for epoch in range(epochs):
        svi.step(X, y)

Any feedback on this would be great.

narasimha · June 28, 2020, 1:39pm

Yes, I too am looking for a neat example for hierarchical coefficients aka random coefficient model (Ideally a few fixed and a few random, or atleast all random). It is pretty straightforward to specify in MCMC3 package. It’d be great if pyro has an example. Many industrial applications…IMHO…