Need more variation in HMC

Hello,
I’ve been experimenting with HMC inference on a nonlinear model of a biochemical network.

I’ve set up a toy model where each parameter is sampled from an exponential distribution whose mean is the ground-truth value. After conditioning on the response variables and running HMC, I get mean estimates that are in the right ballpark but still off, and, more worryingly, zero variance.

How can I coax more variation out of my MCMC?

More specifically:

  • the parameters of the model are small positive-valued decimals. For example, one parameter (of 12 total) is 6.351334e-04
  • the parameters are coefficients in a set of nonlinear regression lines representing a physical mechanism. An example is: Y = sample('y', Poisson(p1 * X1 / (p1 * X1 + p2 * X2))); this reflects the real-world physical mechanism relating the covariates X1 and X2 to the response Y
  • the parameters themselves are sampled from an exponential with mean equal to their ground-truth values, for example: p1 = sample('p1', Exponential(1/ground_truth['p1'])); a sketch of the full model is shown below
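
Putting those pieces together, here is a minimal sketch of the kind of model I mean (the function and site names are illustrative, not my actual network):

import pyro
import pyro.distributions as dist

def toy_model(X1, X2, ground_truth):
    # Exponential priors centered on the ground-truth values
    # (PyTorch's Exponential is parameterized by rate, so rate = 1 / mean).
    p1 = pyro.sample('p1', dist.Exponential(1.0 / ground_truth['p1']))
    p2 = pyro.sample('p2', dist.Exponential(1.0 / ground_truth['p2']))
    # The covariates combine nonlinearly to give the Poisson rate.
    rate = p1 * X1 / (p1 * X1 + p2 * X2)
    return pyro.sample('y', dist.Poisson(rate))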

I treat variables like X1, X2, and Y as observed in my model and the parameters as latent. I condition on simulated data and apply HMC. Here is the actual inference code:

from collections import OrderedDict
from math import sqrt
from pprint import pprint

from pyro.infer import EmpiricalMarginal
from pyro.infer.mcmc import HMC, MCMC

# Latent parameters whose posterior marginals I want to inspect.
targets = ['egf_sos', 'igf_sos', 'sos_ras', 'ras_pi3k', 'egf_pi3k', 'igf_pi3k']
hmc_kernel = HMC(conditioned_model, step_size=0.0001, num_steps=100)
mcmc_run = MCMC(hmc_kernel, num_samples=500, warmup_steps=100).run()

# Summarize each marginal: posterior mean, posterior sd, and the ground-truth value.
estimates = OrderedDict()
for target in targets:
    marginal = EmpiricalMarginal(mcmc_run, target)
    estimates[target] = {
        'mean': round(float(marginal.mean), 5),
        'sd': round(sqrt(float(marginal.variance)), 5),
        'true': round(ground_truth[target], 5)
    }
pprint(estimates)

The results:

OrderedDict([('egf_sos', {'mean': 0.00049, 'sd': 0.0, 'true': 0.00064}),
             ('igf_sos', {'mean': 0.03779, 'sd': 0.0, 'true': 0.02782}),
             ('sos_ras', {'mean': 0.27975, 'sd': 0.0, 'true': 0.85372}),
             ('ras_pi3k', {'mean': 6e-05, 'sd': 0.0, 'true': 6e-05}),
             ('egf_pi3k', {'mean': 0.00091, 'sd': 0.0, 'true': 0.01154}),
             ('igf_pi3k', {'mean': 0.00604, 'sd': 0.0, 'true': 0.01154})])

Why am I getting zero variance, and how do I fix it?

Without getting into the model details, one plausible reason is that HMC is rejecting most proposals, so all of your samples are effectively the single state you started from. HMC prints acceptance probabilities while running - can you check whether the acceptance probability looks reasonable so we can rule this out?

If you find that the acceptance probability is low, I would suggest setting adapt_step_size=True (you may need to increase the number of warmup steps as well) so that the algorithm finds a reasonable step size during the warmup phase. Alternatively, you could try NUTS with adapt_step_size=True, in which case you wouldn’t have to supply either the step size or the number of integrator steps. Let me know if any of this helps; if not, we may have to dig a bit deeper into the model.
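
For example, something along these lines (reusing your kernel setup; the warmup value here is just a placeholder):

from pyro.infer.mcmc import MCMC, NUTS

# NUTS adapts the step size during warmup and chooses the number of
# integrator steps automatically, so neither has to be supplied.
nuts_kernel = NUTS(conditioned_model, adapt_step_size=True)
mcmc_run = MCMC(nuts_kernel, num_samples=500, warmup_steps=300).run()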

Thank you, this was useful. However, I haven’t gotten any results yet: I’ve let it run for several hours and I’m not sure if it’s just stuck. For comparison, Stan’s HMC takes about 20 minutes with the same model and similar arguments. Is there a way to peek at a log or some kind of progress indicator while inference is running, so I can see what’s happening under the hood?

Stan’s HMC takes about 20 minutes with the same model and similar arguments.

Our HMC implementation is currently much slower than Stan’s. A big part of that is Python overhead, which we are looking to eliminate with the PyTorch JIT (among many other improvements), but even then I would expect Stan to be faster on models with small data tensors (where running on the GPU doesn’t confer any benefit). I will be making a change soon so that a progress bar is displayed as sampling progresses. You could also adjust the time period for logging the output here, i.e. increase 20 to a higher number.

If your model isn’t vectorized, that could be one reason for the slow run time, so I would suggest removing loops wherever possible. This should be straightforward in Pyro, since all PyTorch distributions are vectorized. You could also try a smaller dataset to check that things are at least working as you expect.
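
As a rough illustration of what I mean by vectorizing (a generic model, not yours), a per-datapoint Python loop like the first version below can usually be replaced by a single batched sample statement inside pyro.plate (called iarange in older Pyro releases):

import pyro
import pyro.distributions as dist

# Looped version: one sample statement per data point, which is slow in Python.
def model_looped(x, y):
    rate = pyro.sample('rate', dist.Exponential(1.0))
    for i in range(len(x)):
        pyro.sample('y_{}'.format(i), dist.Poisson(rate * x[i]), obs=y[i])

# Vectorized version: a single batched sample statement over all data points.
def model_vectorized(x, y):
    rate = pyro.sample('rate', dist.Exponential(1.0))
    with pyro.plate('data', len(x)):
        pyro.sample('y', dist.Poisson(rate * x), obs=y)

Here x and y are 1-D tensors of the same length.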
