How is the inverse mass matrix computed in HMC?

Alexander · June 12, 2023, 11:55am

Is the inverse mass matrix in HMC same as the covariance matrix of the corresponding samples obtained from it? If not, how is inverse mass matrix computed?

martinjankowiak · June 12, 2023, 3:40pm

no. it’s something that’s dynamically adjusted during warm-up and will not equal the covariance of the samples. see e.g. the relevant stan documentation or any discussion of nuts/hmc in the literature

Alexander · June 12, 2023, 7:28pm

Thank you for the quick response @martinjankowiak. But, the documentation that you have linked still does not mention how exactly is the inverse mass matrix computed. Can you please provide the appropriate document for reference.

Alexander · June 12, 2023, 7:32pm

I had raised this question based on Michael Betancourt 's article which mentioned this:

It will be really great if you could please help me understand the mistake that I am making. Thanks a lot!

martinjankowiak · June 12, 2023, 7:34pm

the mass matrix only uses warm-up samples: not all samples. also it’s usually regularized somehow as opposed to a raw covariance

see e.g. here or just generally google for strings like “mass matrix adaptation hmc”

Alexander · June 12, 2023, 9:08pm

So, based on the above conversation, I tried out the following code:

J = 8
y = np.array([28.0, 8.0, -3.0, 7.0, -1.0, 1.0, 18.0, 12.0])
sigma = np.array([15.0, 10.0, 16.0, 11.0, 9.0, 11.0, 10.0, 18.0])
rng_key = random.PRNGKey(0)

def eight_schools(J, sigma, y=None):
    mu = numpyro.sample('mu', dist.Normal(2, 5))
    tau = numpyro.sample('tau', dist.HalfCauchy(5))
    with numpyro.plate('J', J):
        theta = numpyro.sample('theta', dist.Normal(mu, tau))
        numpyro.sample('obs', dist.Normal(theta, sigma), obs=y)

mcmc = MCMC(NUTS(eight_schools, step_size = step, inverse_mass_matrix = matrix, regularize_mass_matrix = False), num_warmup = 50, num_samples = 0)
mcmc.warmup(rng_key,J,sigma,y=y, collect_warmup=True)
warmup_samples = mcmc.get_samples()['theta']

mu_std = jnp.std(mcmc.get_samples()['mu'], axis=0)
tau_std = jnp.std(mcmc.get_samples()['tau'], axis=0)
theta_std = jnp.std(warmup_samples, axis=0)

I compared the square of the above obtained values(mu_std, tau_std, theta_std) from the inverse_square_matrix obtained using the mcmc.last_state. Despite keeping regularize_mass_matrix = False, I am still getting quite varied values. Can you please help me figure out what else am I missing out? Thanks a lot!

martinjankowiak · June 12, 2023, 9:23pm

if you want to follow the adaptation routine in detail i suggest taking a look at the relevant code