Convergence issue

Galex · April 19, 2023, 9:06am

Hello,
I am trying to estimate the distribution of the slope and intercept parameters in a simple linear regression model. The problem is that all chains seems to explore a very small fraction of the parameter space (perhaps they are stuck)! I set the “number of steps” and “step size” parameters to different values but that did not resolve the issue. Increasing the “number of samples” also does not make the model to converge to the correct value! Beta_0 should have an approximate Gaussian distribution with its mode around -2315 and Beta_1 should have its mode close to 32 but I get multimodal distributions! Finally, the “number of effective sample” diagnostic is around 10 which is quite small. Since this is a very simple model I am wondering what it is that I am doing wrong! I would be thankful if you could help me with this issue.

def flipper_mass_linear_model(observed_mass, observed_flipper_length):
    
    sigma  = pyro.sample("sigma", dist.HalfNormal(2000.))
    beta_0 = pyro.sample("beta_0", dist.Normal(0, 4000.))
    beta_1 = pyro.sample("beta_1", dist.Normal(0, 4000.))
    mu     = pyro.deterministic("mu", beta_0 + beta_1 * observed_flipper_length)

    mass   = pyro.sample("mass", dist.Normal(mu, sigma), obs = observed_mass)

    return mass

flipper_length_Adelie = torch.Tensor(penguins["flipper_length_mm"][penguins["species"] == "Adelie"].values)

mass_Adelie           = torch.Tensor(penguins["body_mass_g"][penguins["species"] == "Adelie"].values)

hmc_kernel_linear     = HMC(flipper_mass_linear_model, step_size= 1.87e-05, num_steps= 10, adapt_step_size = True)

mcmc_linear           = MCMC(hmc_kernel_linear, num_samples=15000, warmup_steps=5000, num_chains = 4)

mcmc_linear.run(mass_Adelie, flipper_length_Adelie)

martinjankowiak · April 19, 2023, 11:58am

Try units in which relevant quantities are O(1) not O(1000)

Galex · April 19, 2023, 1:16pm

I appreciate the reply but I’m not sure if understand. Are you referring to the range of values covered by the distributions of the parameters?

martinjankowiak · April 19, 2023, 3:54pm

yes for example express mass in terms of metric tons instead of kilograms (or what have you). very big or small numbers in your prior definitions are likely to make default MCMC settings work poorly

Galex · April 19, 2023, 5:36pm

The initial dimensions are gram and millimetre for mass and length. I divided both by 1000 to convert them to kg and meter. However, that still does not solve the issue! I could try standardising and normalising my data sets (as this approach worked before for another problem but the same issue) but the point is that the model I have here is very simple and HMC should converge just fine without standardisation and/or normalisation. (Scaling from g to kg and mm to m should be more than enough.) I should also add that I am trying to reproduce the result of the code block 3.13 from the book Bayesian Modelling and in that example the authors use TensorFlow Probability package and they use a “HalfStudentT(100, 2000)” distribution for their variance while I am using HalfNormal as Pyro does not support this distribution but given the simplicity of the model and similarity in distribution for HalfStudentT(100, 2000) and HalfNormal(2000) I doubt this is causing the issue (but I may be wrong).

martinjankowiak · April 19, 2023, 5:56pm

Are you aware that the 2nd argument to Normal is a square root variance and not a variance?

Have you tried 64 bit precision?

Have you tried NUTS? Your step size could be no good

Galex · April 20, 2023, 8:19am

I am aware that the 2nd argument in the Normal distribution is standard deviation but I admit that I should have been more clear about this. Earlier, when I said “variance”, I meant standard deviation (std). So when I write:

beta_0 = pyro.sample("beta_0", dist.Normal(0, 4000.))

I mean a normal distribution with mean zero and std 4000. The reasoning behind the choice of such large std, according to the book, is that we do not have any prior knowledge about the possible range of the parameter of interest so we go with a large std/variance. (Note that, in the book, they use gram and millimeter.) As for the second point, changing to 64 bit did not resolve the issue! And regarding your last point, I’d have to say no! I have not tried NUTS. I will try it and will share the result with you.

Galex · April 20, 2023, 7:55pm

I ran NUTS and it found the correct distribution! First, I used the original scale of the mean and std for the Normal and HalfNormal distributions to see if NUTS can capture the correct distribution. And it did! The only difficulty is that it took almost 80 minutes to run!

nuts_kernel_linear    = NUTS(flipper_mass_linear_model, step_size= 1.87e-05, adapt_step_size = True)

nuts_linear           = MCMC(nuts_kernel_linear, num_samples=15000, warmup_steps=5000, num_chains = 4)

I rescaled the data to kg and m units (i.e. division by 1000) and used the following distributions:

sigma  = pyro.sample("sigma", dist.HalfNormal(200.))
beta_0 = pyro.sample("beta_0", dist.Normal(0, 20.))
beta_1 = pyro.sample("beta_1", dist.Normal(0, 20.))

I noticed that it took NUTS about 90 mins to capture the correct distributions. (I did expect it to converge faster given the distributions now cover a much smaller range.) I can imagine the performance can be improved with some parameter tuning. @martinjankowiak I have one last question regarding your remark on my step size. Given the adaptive step size method is in action, does it matter what step size I give the model? The model converges to the optimal step size anyway (in this case 2.e-2).

martinjankowiak · April 21, 2023, 7:01am

there is usually no need to provide a step size. just use nuts with default settings.

Galex · April 21, 2023, 10:46am

Thank you.