Doing importance sampling for 8schools, log_density vs potential energy

Nicola · May 23, 2024, 1:34pm

Hi,

I read a few explanations in this forum about log_density vs potential energy but I am still confused. It would be helpful to understand this post if you have some familiarity with importance sampling for estimating integrals (in particular the marginal likelihood/evidence, in this case).

I am using the 8 schools model from the tutorial (Getting Started with NumPyro — NumPyro documentation)

        def model(J, sigma_obs=None, y=None):
            mu = numpyro.sample('mu', dist.Normal(0, 5))
            tau = numpyro.sample('tau', dist.HalfCauchy(5))

            with numpyro.plate('J', J):

                with numpyro.handlers.reparam(config={'theta': TransformReparam()}):
                    theta = numpyro.sample(
                        'theta',
                        dist.TransformedDistribution(dist.Normal(0., 1.),
                                                     dist.transforms.AffineTransform(mu, tau))
                    )
                sigma = numpyro.sample('sigma_obs', dist.HalfCauchy(5), obs=sigma_obs) # sigma obs is a vector of length J
                numpyro.sample('y_obs', dist.Normal(theta, sigma), obs=y)

My application is I want to do importance sampling, i.e., estimate the normalizing constant of this posterior. Clearly in this model for example we have 1 constrained parameter, tau.

To do IS, all we need is the unnormalized log_pdf of the posterior. In NumPyro, this is log_density from infer.util or potential_energy.

So, I understand that potential_energy takes unconstrained samples and log_density takes samples that live in the original space.

Still, I don’t understand what I should use with IS, note that I use a Student-t proposal.

To do IS, what we do is we sample from an arbitrary distribution q (these are almost always Student-t distributions or Gaussians, so always unconstrained samples), we weight the samples as
log_pdf_target(sample) - log_pdf_proposal(sample)

and the average of the weights is the estimate of the normalizing constant. For the special case where q is the prior of the model, the weights are just likelihoods and this doesn’t work well usually.

My problem is, I am not sure if I should use (-1)*potential_energy (because my samples are unconstrained), or if I should use log_density to do this, and include (how?) some Jacobian determinant in the computation of the IS weights. Or is it correct to just use constrain_fn(samples) and pass them to log_density ? Too many options, I am quite confused.

Aside: it is not clear to me also if the MCMC samples returned from NUTS are constrained or unconstrained.

Thanks