I’m finding that setting initial parameter values through the NUTS kernel versus through MCMC gives different results, even though they should be the same.
I’ve checked that the first sample taken in a run has the same parameter values whether I initialize using
nuts_kernel = NUTS(model,
                   dense_mass=dense_mass,
                   max_tree_depth=6,
                   init_strategy=init_to_value...
sampler = MCMC(nuts_kernel,
               num_warmup=10,
               num_samples=10,
               jit_model_args=True)
key, subkey = random.split(key)
sampler.warmup(subkey, X, collect_warmup=True)
or with
nuts_kernel = NUTS(model,
                   dense_mass=dense_mass,
                   max_tree_depth=6)
sampler = MCMC(nuts_kernel,
               num_warmup=10,
               num_samples=10,
               jit_model_args=True)
key, subkey = random.split(key)
sampler.warmup(subkey, X, collect_warmup=True, init_params=...)
In the first case, I’m using the init_to_value strategy provided by the NUTS interface. In the second, I’m passing the initial parameter values directly to the MCMC instance through init_params.
The chains are exactly the same for the first few steps but then they diverge.
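To make "the first few steps" concrete, here is a minimal sketch of how the step at which the two runs split could be located. It assumes the samples for one scalar parameter have been pulled out of each run as a plain sequence of floats; `first_divergence`, `tol`, and the toy values are hypothetical and not part of NumPyro.

```python
def first_divergence(chain_a, chain_b, tol=0.0):
    """Return the index of the first step where the two chains differ by
    more than `tol`, or None if they agree over their common length."""
    for step, (a, b) in enumerate(zip(chain_a, chain_b)):
        if abs(a - b) > tol:
            return step
    return None


# Toy values standing in for one parameter's samples from each run
# (made up for illustration, not output from the model above).
run_init_strategy = [0.50, 0.52, 0.47, 0.61, 0.58]
run_init_params = [0.50, 0.52, 0.47, 0.63, 0.40]

print(first_divergence(run_init_strategy, run_init_params))  # prints 3
```

Running the same check with a small nonzero `tol` would help separate floating-point noise from a genuine change in the trajectory.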
I’ve set the random keys to be the same, though without looking carefully at the source I suppose that alone is not guaranteed to make the chains identical. However, the statistics of the samples returned from the two initialization strategies are completely different (the Rhat values differ by an order of magnitude).
I did look quickly at the source, and there is no obvious reason in the initialization steps why these should lead to different results.
I would appreciate any insights. Thanks.