Hierarchical models and eight schools example

Hi. I’m trying to understand hierarchical models and their Pyro implementation, starting with the eight schools example in the GitHub repository (the SVI one).

At first I was confused by the formulation of the model, until I realised it was a “non-centred” implementation, as described here. Now I’m trying to make a “centred” version, which should be easy, but is really confusing me.

This is the original “non-centred” version:

import torch
import pyro
import pyro.distributions as dist

J = 8  # number of schools

def model(data):
    y = data[:, 0]      # observed treatment effects
    sigma = data[:, 1]  # standard errors of the effects
    eta = pyro.sample('eta', dist.Normal(torch.zeros(J), torch.ones(J)))
    mu = pyro.sample('mu', dist.Normal(torch.zeros(1), 10 * torch.ones(1)))
    tau = pyro.sample('tau', dist.HalfCauchy(scale=25 * torch.ones(1)))

    # non-centred: theta is a deterministic transform of the standardised eta
    theta = mu + tau * eta

    pyro.sample("obs", dist.Normal(theta, sigma), obs=y)

And this is my “centred” model:

def model_centred(data):
    y = data[:, 0]
    sigma = data[:, 1]

    # eta = pyro.sample('eta', dist.Normal(torch.zeros(J), torch.ones(J)))
    mu = pyro.sample('mu', dist.Normal(torch.zeros(J), 10 * torch.ones(J)))
    tau = pyro.sample('tau', dist.HalfCauchy(scale=25 * torch.ones(J)))

    # theta = mu + tau * eta
    # centred: sample theta directly from its conditional prior
    theta = pyro.sample('theta', dist.Normal(mu, tau))

    pyro.sample("obs", dist.Normal(theta, sigma), obs=y)

I’ve basically just removed eta, sampling theta from a Normal instead, and changed the dimensions of mu and tau to J. I made the matching changes in the guide: removed eta and changed the dimensions of mu and tau to J, as sketched below.
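Roughly, my guide now looks like this (simplified; the variational parameter names are just what I happened to call them):

from torch.distributions import constraints

def guide_centred(data):
    # variational parameters (names illustrative)
    mu_loc = pyro.param('mu_loc', torch.zeros(J))
    mu_scale = pyro.param('mu_scale', torch.ones(J), constraint=constraints.positive)
    tau_scale = pyro.param('tau_scale', torch.ones(J), constraint=constraints.positive)
    theta_loc = pyro.param('theta_loc', torch.zeros(J))
    theta_scale = pyro.param('theta_scale', torch.ones(J), constraint=constraints.positive)

    pyro.sample('mu', dist.Normal(mu_loc, mu_scale))
    pyro.sample('tau', dist.HalfCauchy(scale=tau_scale))
    pyro.sample('theta', dist.Normal(theta_loc, theta_scale))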

Do these changes make sense? Are there other changes I should make?
The model runs ok, but I’m not sure how to interpret the output.

It seems to me that both of these are using a non-centered parameterization, because the pyro.sample statement for dist.Normal(mu, tau) will call the distribution’s .rsample() method in the backend, which uses the reparameterized (i.e. non-centered) sampler. To do a centered parameterization, you could try the following, so that pyro.sample calls the distribution’s sample method instead of its rsample method. I would also suggest plotting the posterior to check that you are indeed getting the expected results:

d = dist.Normal(mu, tau)
d.has_rsample = False  # make pyro.sample use d.sample() rather than d.rsample()
theta = pyro.sample('theta', d)
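Folded into your centred model, that would look something like this (a sketch based on your code above):

def model_centred(data):
    y = data[:, 0]
    sigma = data[:, 1]
    mu = pyro.sample('mu', dist.Normal(torch.zeros(J), 10 * torch.ones(J)))
    tau = pyro.sample('tau', dist.HalfCauchy(scale=25 * torch.ones(J)))
    d = dist.Normal(mu, tau)
    d.has_rsample = False  # disable the reparameterized sampler for this site
    theta = pyro.sample('theta', d)
    pyro.sample("obs", dist.Normal(theta, sigma), obs=y)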

Ok, thanks. For plotting the posterior, I’m planning to have another function that is very similar to the model function, but using fixed parameters set to the values learned by the guide. Is there a better way to do it?

I’m also a little confused by this vectorised sampling:

eta = pyro.sample('eta', dist.Normal(torch.zeros(J), torch.ones(J)))

Is this interpreted as “J samples from a single N(0,1) distribution”, or as “J samples, each from a different N(0,1) distribution”? I know the samples will be the same either way, but will Pyro’s inference mechanism treat this as being a single distribution, or J distributions?

You can take a look at this example (not merged yet) to see how it plots the posterior distribution.
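In the meantime, a simple way to approximate the posterior after SVI is to replay the trained guide and collect the values at its sample sites. A rough sketch (assuming the guide takes the same data argument as the model, e.g. the guide_centred above):

from pyro import poutine

theta_samples = []
for _ in range(1000):
    # each trace of the trained guide records one draw at every sample site
    tr = poutine.trace(guide_centred).get_trace(data)
    theta_samples.append(tr.nodes['theta']['value'])
theta_samples = torch.stack(theta_samples)  # shape (1000, J)
# histogram each column to visualise the per-school posterior over theta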

Is this interpreted as “J samples from a single N(0,1) distribution”, or as “J samples, each from a different N(0,1) distribution”? I know the samples will be the same either way, but will Pyro’s inference mechanism treat this as being a single distribution, or J distributions?

You can interpret it as a batch of J independent draws from N(0, 1). You can also write this as dist.Normal(0., 1.).expand_by([J]). Refer to the tensor shapes tutorial for more details.

As a side note, HMC will not make use of conditional independence information in the model. If you are using this inside of SVI, you should wrap it in an iarange of size J (I just noticed that this example does not do so), which provides additional information that can be exploited when constructing the ELBO estimator.
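For example, the eta statement wrapped in an iarange might look like this (the site name 'schools' is just illustrative):

with pyro.iarange('schools', J):
    # declares the J draws as conditionally independent across this dimension,
    # which the ELBO estimator can exploit (e.g. for variance reduction)
    eta = pyro.sample('eta', dist.Normal(torch.zeros(J), torch.ones(J)))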