MCMC On Poisson Lognormal Model

Hi all,

I am trying to compute the posterior p(z|x) for this very simple model, and I cannot figure out what I am doing wrong (I am new to this package). Generating data from the model works fine, but MCMC sampling does not.

Here is my model:

def pyro_mdl(data):
    # with pyro.plate('data', len(data)):
    cell_type = pyro.sample('cell_type', dist.Categorical(dataset.probas))
    z = pyro.sample('z', dist.MultivariateNormal(dataset.mus[cell_type],
                                                 dataset.sigmas[cell_type]))
    exp_z = z.exp()
    x = pyro.sample('x', dist.Poisson(rate=exp_z), obs=data)
    return x

The parameters have already been fitted (I am mostly interested in MCMC right now). When I try to perform MCMC using:

kernel = HMC(pyro_mdl, adapt_step_size=True, jit_compile=True)
mcmc_run = MCMC(kernel, num_samples=500, warmup_steps=300).run(data=x)

I get the following error: “TypeError: only integer tensors of a single element can be converted to an index”

Do you have an idea of what I am doing wrong?

Could you paste the full stack trace? Also, note that HMC requires the latent variables in the model to be continuous, and will therefore try to integrate over all possible values of cell_type.

Hey @neerajprad

I am so sorry to dig up this thread. I think (or at least I thought) that everything worked fine with my MCMC protocol, but I am not so sure anymore. Let me show you:

# global parameters

# mus: tensor (n_components, n_latent) ==> constant
# sigmas: tensor (n_components, n_latent, n_latent)  ==> constant
# probas: tensor (n_components) ==> constant

def compute_rate(a_mat: torch.Tensor, b: torch.Tensor, z: torch.Tensor):
    n_latent = a_mat.shape[1]
    # (n_out, n_latent) @ (n_batch, n_latent, 1) -> (n_batch, n_out, 1)
    rate = (a_mat @ z.reshape(-1, n_latent, 1) + b.reshape(-1, 1)).squeeze()
    # clamp after exponentiating so the Poisson rate cannot overflow
    rate = torch.clamp(rate.exp(), max=1e5)
    return rate

@config_enumerate(default="sequential")
def pyro_mdl(data: torch.Tensor):
    n_batch = data.shape[0]    
    with pyro.plate("data", n_batch):
        cell_type = pyro.sample("cell_type", dist.Categorical(dataset.probas))
        print(cell_type)
        z = pyro.sample(
            "z",
            dist.MultivariateNormal(
                mus[cell_type], covariance_matrix=sigmas[cell_type]
            ),
        )
        rate = compute_rate(dataset.a_mat, dataset.b, z)
        pyro.sample("x", dist.Poisson(rate=rate).to_event(1), obs=data)
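As a sanity check on the shapes, here is a standalone run of compute_rate with random stand-ins for dataset.a_mat and dataset.b (sizes chosen to match the trace below: n_batch=10, n_latent=3, 50 output dimensions):

```python
import torch

def compute_rate(a_mat, b, z):
    n_latent = a_mat.shape[1]
    # (n_out, n_latent) @ (n_batch, n_latent, 1) -> (n_batch, n_out, 1)
    rate = (a_mat @ z.reshape(-1, n_latent, 1) + b.reshape(-1, 1)).squeeze()
    return torch.clamp(rate.exp(), max=1e5)

a_mat = torch.randn(50, 3)   # random stand-in for dataset.a_mat
b = torch.randn(50)          # random stand-in for dataset.b
z = torch.randn(10, 3)       # one latent vector per data point
rate = compute_rate(a_mat, b, z)
print(rate.shape)            # torch.Size([10, 50])
```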

Here is the associated trace, for n_batch=10, n_latent=3:

 Trace Shapes:        
  Param Sites:        
 Sample Sites:        
     data dist    |   
         value 10 |   
      log_prob    |   
cell_type dist 10 |   
         value 10 |   
      log_prob 10 |   
        z dist 10 |  3
         value 10 |  3
      log_prob 10 |   
        x dist 10 | 50
         value 10 | 50
      log_prob 10 |   

Based on that, I perform MCMC sampling using NUTS or HMC, but the printed values of cell_type are always the same! Isn’t this worrisome? I would expect the categorical variable to take all possible values during MCMC inference, right?

Do you have an idea of what I did wrong?

Can I explicitly tell Pyro that it should integrate over these values?

A few points:

  • You shouldn’t have to annotate your model with @config_enumerate(default="sequential") for MCMC: discrete sites are enumerated over in parallel by default.
  • That is also why cell_type always prints the same value: Pyro is enumerating, in parallel, all possible values the Categorical could take (remember, config_enumerate has no effect here) and aggregating over them to compute the potential energy, so the site is never actually sampled during MCMC.

You might find the tensor shapes tutorial handy if you would like to dig into the details of tensor shapes under parallel enumeration.
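For intuition, here is a rough sketch in plain torch (toy sizes, not Pyro internals) of what parallel enumeration does: the enumerated values of cell_type are laid out along a new leftmost dimension, broadcast against the batch, and the per-component log-probs are collapsed with a logsumexp when computing the potential energy.

```python
import torch

n_batch, n_components = 10, 4          # toy sizes, invented
probas = torch.full((n_components,), 1.0 / n_components)

# Instead of one sampled value per iteration, enumeration uses ALL values
# at once, shaped to broadcast against the batch dimension -- which is why
# the printed cell_type never changes between MCMC steps:
cell_type = torch.arange(n_components).view(-1, 1)   # shape (4, 1)

# A per-(component, data point) log-prob table, standing in for the
# downstream sites that depend on cell_type:
site_log_prob = torch.randn(n_components, n_batch)

# Marginalizing the discrete site: weight each row by its prior probability
# and sum over the enumeration dimension.
marginal = torch.logsumexp(probas.log().view(-1, 1) + site_log_prob, dim=0)
print(cell_type.shape, marginal.shape)  # torch.Size([4, 1]) torch.Size([10])
```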