Hi,
I read a few explanations in this forum about log_density vs potential energy but I am still confused. It would be helpful to understand this post if you have some familiarity with importance sampling for estimating integrals (in particular the marginal likelihood/evidence, in this case).
I am using the 8 schools model from the tutorial (Getting Started with NumPyro — NumPyro documentation)
def model(J, sigma_obs=None, y=None):
mu = numpyro.sample('mu', dist.Normal(0, 5))
tau = numpyro.sample('tau', dist.HalfCauchy(5))
with numpyro.plate('J', J):
with numpyro.handlers.reparam(config={'theta': TransformReparam()}):
theta = numpyro.sample(
'theta',
dist.TransformedDistribution(dist.Normal(0., 1.),
dist.transforms.AffineTransform(mu, tau))
)
sigma = numpyro.sample('sigma_obs', dist.HalfCauchy(5), obs=sigma_obs) # sigma obs is a vector of length J
numpyro.sample('y_obs', dist.Normal(theta, sigma), obs=y)
My application is I want to do importance sampling, i.e., estimate the normalizing constant of this posterior. Clearly in this model for example we have 1 constrained parameter, tau.
To do IS, all we need is the unnormalized log_pdf of the posterior. In NumPyro, this is log_density from infer.util or potential_energy.
So, I understand that potential_energy takes unconstrained samples and log_density takes samples that live in the original space.
Still, I don’t understand what I should use with IS, note that I use a Student-t proposal.
To do IS, what we do is we sample from an arbitrary distribution q (these are almost always Student-t distributions or Gaussians, so always unconstrained samples), we weight the samples as
log_pdf_target(sample) - log_pdf_proposal(sample)
and the average of the weights is the estimate of the normalizing constant. For the special case where q is the prior of the model, the weights are just likelihoods and this doesn’t work well usually.
My problem is, I am not sure if I should use (-1)*potential_energy (because my samples are unconstrained), or if I should use log_density to do this, and include (how?) some Jacobian determinant in the computation of the IS weights. Or is it correct to just use constrain_fn(samples) and pass them to log_density ? Too many options, I am quite confused.
Aside: it is not clear to me also if the MCMC samples returned from NUTS are constrained or unconstrained.
Thanks