Potential_energy returns nan values

Using potential_energy (Inference Utilities — NumPyro 0.0 documentation) on my model (see below) gives some NaN values when passing samples obtained from NUTS. Why could that be the case?

Are the samples from NUTS that I am passing in the constrained or the unconstrained space? Perhaps that affects it.

The model (a regularized horseshoe logistic regression):

    def model(*, N, d, scale_icept, scale_global, nu_global, nu_local, slab_scale, slab_df, x=None, y=None):

        # Priors
        beta0_unscaled = numpyro.sample('beta0_unscaled', dist.Normal(0.0, 1.0))
        beta0 = numpyro.deterministic('beta0', beta0_unscaled * scale_icept)

        tau = numpyro.sample('tau', dist.StudentT(nu_global, 0.0, scale_global))

        lambda_ = numpyro.sample('lambda', dist.StudentT(nu_local, 0.0, 1.0).expand([d]))

        c_aux = numpyro.sample('caux', dist.InverseGamma(0.5 * slab_df, 0.5 * slab_df))

        z = numpyro.sample('z', dist.Normal(0.0, 1.0).expand([d]))

        # Data generation process
        c = numpyro.deterministic('c', slab_scale * jnp.sqrt(c_aux))

        lambda_tilde = numpyro.deterministic('lambda_tilde',
                                             jnp.sqrt(c ** 2 * lambda_ ** 2 / (c ** 2 + tau ** 2 * lambda_ ** 2)))

        beta = numpyro.deterministic('beta', z * lambda_tilde * tau)

        f = beta0 + jnp.matmul(x, beta) 

        # f = jnp.reshape(f, (-1, 1))

        with numpyro.plate('data', N):
            # Likelihood
            numpyro.sample('obs', dist.Bernoulli(logits=f), obs=y)

Thanks for the help.

Hi @Nicola, you can enable the jax_debug_nans flag for debugging. potential_energy accepts samples in the unconstrained space. You can use unconstrain_fn to convert NUTS samples to the unconstrained domain. But I think it’s better to use log_density instead of potential_energy.
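
For concreteness, a minimal sketch (assuming the model, the model_args dict, and the mcmc object from your script; you may need to drop deterministic sites such as beta0, beta, c, or lambda_tilde from the samples first):

    import jax
    from numpyro.infer.util import log_density, potential_energy, unconstrain_fn

    # Keep only the latent sample sites from the (constrained) NUTS samples.
    latent_names = ['beta0_unscaled', 'tau', 'lambda', 'caux', 'z']
    samples = {k: v for k, v in mcmc.get_samples().items() if k in latent_names}

    # Pick a single draw from the batched samples.
    one_sample = jax.tree_util.tree_map(lambda v: v[0], samples)

    # log_density works directly on the constrained sample.
    logp, _ = log_density(model, (), model_args, one_sample)

    # potential_energy expects unconstrained values, so transform first.
    unconstrained = unconstrain_fn(model, (), model_args, one_sample)
    pe = potential_energy(model, (), model_args, unconstrained)

Note that potential_energy returns the negative unnormalized log density in the unconstrained space, including the Jacobian terms of the transforms.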

Hi, it may be better to use log_density instead of potential_energy for a more accurate evaluation of the log probability density function.

Thanks a lot, both!

However, in my application I have to use samples in the unconstrained space.

The reason is that I am going to do importance sampling on this model, and the proposed samples will come from a mixture of Gaussians, so they lie in an unconstrained space.

I have to evaluate these samples with potential_energy and/or log_density to compute the IS weights. But as I understand it, I can only use potential_energy, because my samples are unconstrained?
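
Concretely, the weight computation I have in mind looks roughly like this (just a sketch; proposal_samples, log_q_values and log_is_weight are my own placeholder names, and I reuse model and model_args from my script below):

    import jax
    import jax.numpy as jnp
    from numpyro.infer.util import potential_energy

    def log_is_weight(unconstrained_sample, log_q_value):
        # potential_energy is the negative unnormalized log density on the
        # unconstrained space (Jacobian terms included), so
        # log w = -potential_energy - log q.
        log_p = -potential_energy(model, (), model_args, unconstrained_sample)
        return log_p - log_q_value

    # proposal_samples: dict of arrays with a leading batch dimension
    # (one entry per latent site, values in the unconstrained space);
    # log_q_values: the corresponding mixture-of-Gaussians log densities.
    log_w = jax.vmap(log_is_weight)(proposal_samples, log_q_values)
    weights = jnp.exp(log_w - jax.scipy.special.logsumexp(log_w))  # self-normalized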

And why is log_density better?

The reason I was generating MCMC samples with NUTS was for debugging purposes: to check that passing NUTS samples (after applying unconstrain_fn) to potential_energy returns reasonable values… or at least not NaNs.

Yet I keep getting some NaNs, even after using unconstrain_fn before passing the samples. Is that weird?

Could it be because NUTS is not working well on this model? Still, it seems strange to me: a log PDF may be very negative, but I don’t see why it would be NaN.

For example, I get bad r_hat values for some parameters. I am using:

    num_chains = 7 # I have 7 cores
    num_warmup = 20000
    num_samples = 1000
    tree_depth = 30
    dense_mass = True

Even with these settings, with the model above, and only considering 3 features (the first 3 dimensions of x; the actual data dimension is 1536), I get relatively bad MCMC statistics.

I have some code to reproduce this below. I am not very experienced with NUTS, so maybe I am missing something very obvious.

    import numpyro
    from numpyro.infer import MCMC, NUTS
    from jax import random
    import jax.numpy as jnp  # Use JAX's numpy
    import numpyro.distributions as dist

    numpyro.set_host_device_count(7)
    numpyro.enable_x64() # Use 64-bit precision

    def run_mcmc(model, model_args, model_hyperparameters, key):
        dimension = model_args['d']
        num_chains = 7
        num_warmup = 20000
        num_samples = 1000
        tree_depth = 30
        dense_mass = True
        find_heuristic_step_size = True

        nuts_kernel = NUTS(model, max_tree_depth=tree_depth, dense_mass=dense_mass, find_heuristic_step_size=find_heuristic_step_size)
        mcmc = MCMC(nuts_kernel, num_warmup=num_warmup, num_samples=num_samples, num_chains=num_chains)
        mcmc.run(key, **model_args)
        mcmc.print_summary()

        samples_mcmc = mcmc.get_samples()

        # temp = ('./results/mcmc_samples_numwarmup_{}_numsamples_{}_nchains_{}_treedepth_{}_dimension_{}_densemass_{}_heuristic_{}_modelhypers_scale_icept:{}_scale_global:{}_nu_global:{}_nu_local:{}_slab_scale:{}_slab_df:{}.npz').format(num_warmup, num_samples, num_chains, tree_depth, dimension, dense_mass, find_heuristic_step_size, model_hyperparameters['scale_icept'], model_hyperparameters['scale_global'], model_hyperparameters['nu_global'], model_hyperparameters['nu_local'], model_hyperparameters['slab_scale'], model_hyperparameters['slab_df'])
        # np.savez(temp, **samples_mcmc)

        # return temp

    if __name__ == '__main__':
        slab_scale = 1.
        scale_icept = 1.
        nu_global = 0.1
        nu_local = 0.1
        slab_df = 0.1
        scale_global = 0.01

        def model(*, N, d, scale_icept, scale_global, nu_global, nu_local, slab_scale, slab_df, x=None, y=None):
            beta0_unscaled = numpyro.sample('beta0_unscaled', dist.Normal(0.0, 1.0))
            beta0 = beta0_unscaled * scale_icept
            tau = numpyro.sample('tau', dist.StudentT(nu_global, 0.0, scale_global))
            lambda_ = numpyro.sample('lambda', dist.StudentT(nu_local, 0.0, 1.0).expand([d]))
            c_aux = numpyro.sample('caux', dist.InverseGamma(0.5 * slab_df, 0.5 * slab_df))
            z = numpyro.sample('z', dist.Normal(0.0, 1.0).expand([d]))
            c = slab_scale * jnp.sqrt(c_aux)
            lambda_tilde = jnp.sqrt(c ** 2 * lambda_ ** 2 / (c ** 2 + tau ** 2 * lambda_ ** 2))
            beta = z * lambda_tilde * tau
            f = beta0 + jnp.matmul(x, beta)

            with numpyro.plate('data', N):
                numpyro.sample('obs', dist.Bernoulli(logits=f), obs=y)

        # The true data to load, in this test we use random data
        # _, _, y, x = load_data_from_mat('./ovarian.mat')

        key = random.PRNGKey(0)
        key, subkey = random.split(key)

        x, y = random.normal(key, shape=(54, 1536)), random.binomial(subkey, 1, 0.5, shape=(54, ))

        assert scale_global > 0

        model_args = {
            'N': x.shape[0],
            'd': x.shape[1],
            'y': y,
            'x': x,
            'scale_icept': scale_icept,
            'scale_global': scale_global,
            'nu_global': nu_global,
            'nu_local': nu_local,
            'slab_scale': slab_scale,
            'slab_df': slab_df,
        }

        model_hyperparameters = {
            'scale_icept': scale_icept,
            'scale_global': scale_global,
            'nu_global': nu_global,
            'nu_local': nu_local,
            'slab_scale': slab_scale,
            'slab_df': slab_df,
        }

        key, subkey = random.split(key)

        run_mcmc(model, model_args, model_hyperparameters, subkey)
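
For the 3-feature case I mentioned above, I simply slice the data before building model_args (not shown in the script), e.g.:

    x = x[:, :3]  # keep only the first 3 features, so d = x.shape[1] == 3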

Could you include the code that uses potential_energy? Your code only shows the MCMC loop. I think you can look at the sample that generates the NaN, make sure that the sample looks reasonable after applying unconstrain_fn, and finally use potential_energy with that sample. It is easier to debug if you can post the model here together with the NaN sample. My strategy would be to enable the NaN-check flag or go line by line through the model to see which variable has a NaN log prob.
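
Something along these lines (just a sketch; suspect_sample is a placeholder for one constrained sample dict, and model / model_args come from your script):

    import jax
    import jax.numpy as jnp
    from numpyro import handlers

    # Option 1: ask JAX to raise an error as soon as a NaN is produced.
    jax.config.update("jax_debug_nans", True)

    # Option 2: substitute the suspect sample into the model, trace it,
    # and check the log prob of every sample site.
    traced = handlers.trace(
        handlers.substitute(handlers.seed(model, rng_seed=0), data=suspect_sample)
    ).get_trace(**model_args)
    for name, site in traced.items():
        if site['type'] == 'sample':
            lp = site['fn'].log_prob(site['value'])
            print(name, bool(jnp.isnan(lp).any()))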

Thank you! Actually, I solved that problem, thanks for the advice.

I have another question.
I have reparameterized my model as below, and now NUTS works excellently.
However, I now have a lot of variables: I introduced w_tau, v_tau, w_lambda, and v_lambda as intermediates to sample tau and lambda_. Is there a way I can “hide” the intermediate variables somehow without changing the meaning of the model? Or is it just a fact of reparameterization that it makes the posterior space higher dimensional?

    def model(*, N, d, scale_icept, scale_global, nu_global, nu_local, slab_scale, slab_df, x=None, y=None):
        # Priors
        beta0_unscaled = numpyro.sample('beta0_unscaled', dist.Normal(0.0, 1.0))

        # beta0 = numpyro.deterministic('beta0', beta0_unscaled * scale_icept)
        beta0 = beta0_unscaled * scale_icept

        # tau = numpyro.sample('tau', dist.StudentT(nu_global, 0.0, scale_global))
        w_tau = numpyro.sample('w_tau', dist.InverseGamma(nu_global / 2.0, nu_global / 2.0)) #### These are actually cauchys
        v_tau = numpyro.sample('v_tau', dist.Normal(0.0, 1.0))
        tau = numpyro.deterministic('tau', scale_global * v_tau * w_tau**(-0.5))

        # lambda_ = numpyro.sample('lambda', dist.StudentT(nu_local, 0.0, 1.0).expand([d]))
        w_lambda = numpyro.sample('w_lambda', dist.InverseGamma(nu_local / 2.0, nu_local / 2.0).expand([d])) #### These are actually cauchys
        v_lambda = numpyro.sample('v_lambda', dist.Normal(0.0, 1.0).expand([d]))
        lambda_ = numpyro.deterministic('lambda', v_lambda * w_lambda**(-0.5))

        c_aux = numpyro.sample('caux', dist.InverseGamma(0.5 * slab_df, 0.5 * slab_df))

        z = numpyro.sample('z', dist.Normal(0.0, 1.0).expand([d]))

        # Data generation process
        # c = numpyro.deterministic('c', slab_scale * jnp.sqrt(c_aux))
        c = slab_scale * jnp.sqrt(c_aux)

        # lambda_tilde = numpyro.deterministic('lambda_tilde',
        #                                      jnp.sqrt(c ** 2 * lambda_ ** 2 / (c ** 2 + tau ** 2 * lambda_ ** 2)))
        lambda_tilde = jnp.sqrt(c ** 2 * lambda_ ** 2 / (c ** 2 + tau ** 2 * lambda_ ** 2))

        # beta = numpyro.deterministic('beta', z * lambda_tilde * tau)
        beta = z * lambda_tilde * tau

        f = beta0 + jnp.matmul(x, beta)

        # f = jnp.reshape(f, (-1, 1))

        with numpyro.plate('data', N):
            # Likelihood
            numpyro.sample('obs', dist.Bernoulli(logits=f), obs=y)

okay great!