I am trying to estimate the posterior distribution of the slope and intercept parameters in a simple linear regression model. The problem is that all chains seem to explore only a very small fraction of the parameter space (perhaps they are stuck)! I set the “number of steps” and “step size” parameters to different values, but that did not resolve the issue. Increasing the “number of samples” does not make the model converge to the correct values either! Beta_0 should have an approximately Gaussian distribution with its mode around -2315, and Beta_1 should have its mode close to 32, but I get multimodal distributions! Finally, the “number of effective samples” diagnostic is around 10, which is quite small. Since this is a very simple model, I am wondering what I am doing wrong! I would be thankful if you could help me with this issue.
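One possible reason a fixed “step size” leaves chains stuck can be shown with a toy example. The sketch below is plain Python, not Pyro's HMC, and every name in it is hypothetical. It runs the leapfrog integrator on a 1-D Gaussian target with standard deviation `sigma` and reports how much the energy drifts. For a Gaussian target the integrator is only stable when the step size is below roughly `2 * sigma`, so a step size that is too large for the narrowest direction of the posterior makes the energy error explode, and proposals then get rejected.

```python
# Toy illustration only: plain-Python leapfrog on a 1-D Gaussian target.
# This is not Pyro's HMC implementation; all names here are hypothetical.

def leapfrog_energy_error(step_size, num_steps, sigma=1.0, q=1.0, p=1.0):
    """Integrate H(q, p) = q^2 / (2 sigma^2) + p^2 / 2 with leapfrog and
    return the largest absolute change in H seen along the trajectory."""

    def grad_u(x):
        # Gradient of the potential energy U(q) = q^2 / (2 sigma^2).
        return x / sigma**2

    def energy(x, mom):
        return 0.5 * (x / sigma) ** 2 + 0.5 * mom**2

    h0 = energy(q, p)
    worst = 0.0
    for _ in range(num_steps):
        p -= 0.5 * step_size * grad_u(q)  # half step for the momentum
        q += step_size * p                # full step for the position
        p -= 0.5 * step_size * grad_u(q)  # second half step for the momentum
        worst = max(worst, abs(energy(q, p) - h0))
    return worst


# Step size well below 2 * sigma: the energy error stays small.
stable = leapfrog_energy_error(step_size=0.5, num_steps=20)
# Step size above 2 * sigma: the trajectory diverges.
unstable = leapfrog_energy_error(step_size=2.5, num_steps=20)

print(f"max |dH| with step 0.5 * sigma: {stable:.3f}")
print(f"max |dH| with step 2.5 * sigma: {unstable:.3e}")
```

In a regression where the intercept, slope and noise scale have very different posterior widths, a single fixed step size has to suit the narrowest of them. That is one reason rescaling the data or adapting the step size can matter even for a simple model.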
The original units are grams and millimetres for mass and length. I divided both by 1000 to convert them to kilograms and metres. However, that still does not solve the issue! I could try standardising and normalising my data sets (this approach worked before on a different problem with the same issue), but the point is that the model I have here is very simple, and HMC should converge just fine without standardisation and/or normalisation. (Scaling from g to kg and from mm to m should be more than enough.) I should also add that I am trying to reproduce the result of code block 3.13 from the book Bayesian Modelling. In that example the authors use the TensorFlow Probability package with a “HalfStudentT(100, 2000)” distribution for their variance, while I am using HalfNormal because Pyro does not support this distribution. Given the simplicity of the model and the similarity between HalfStudentT(100, 2000) and HalfNormal(2000), I doubt this is causing the issue (but I may be wrong).
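For reference, here is a minimal sketch of the standardisation route mentioned above, using only the Python standard library. The data are made up and lie exactly on the line y = -2315 + 32 x; the helper names are hypothetical, and an ordinary least-squares fit on the z-scores stands in for the posterior means that a sampler would return. The point is only to show how coefficients estimated on standardised data map back to the original units.

```python
from statistics import mean, pstdev


def standardise(values):
    """Return z-scores together with the mean and (population) std used."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values], m, s


def to_original_units(b0_z, b1_z, x_mean, x_std, y_mean, y_std):
    """Map intercept and slope fitted on z-scores back to original units."""
    b1 = b1_z * y_std / x_std
    b0 = y_mean + y_std * b0_z - b1 * x_mean
    return b0, b1


# Made-up data lying exactly on y = -2315 + 32 * x (illustrative only).
lengths_mm = [180.0, 190.0, 200.0, 210.0]
masses_g = [-2315.0 + 32.0 * x for x in lengths_mm]

x_z, x_mean, x_std = standardise(lengths_mm)
y_z, y_mean, y_std = standardise(masses_g)

# Least-squares fit on the z-scores; a stand-in for posterior means.
b1_z = sum(a * b for a, b in zip(x_z, y_z)) / sum(a * a for a in x_z)
b0_z = mean(y_z) - b1_z * mean(x_z)

b0, b1 = to_original_units(b0_z, b1_z, x_mean, x_std, y_mean, y_std)
print(round(b0, 6), round(b1, 6))
```

On standardised data both coefficients are of order one, so priors such as Normal(0, 1) and a single step size are on a comparable scale for every parameter.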
I am aware that the second argument of the Normal distribution is the standard deviation, and I admit that I should have been clearer about this. Earlier, when I said “variance”, I meant standard deviation (std). So when I write:
I mean a normal distribution with mean zero and std 4000. The reasoning behind choosing such a large std, according to the book, is that we do not have any prior knowledge about the plausible range of the parameter of interest, so we go with a large std/variance. (Note that the book uses grams and millimetres.) As for the second point, changing to 64-bit did not resolve the issue! And regarding your last point, I’d have to say no, I have not tried NUTS. I will try it and share the result with you.
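As a small check on the std-versus-variance point and on the change of units, here is a sketch using `statistics.NormalDist` from the Python standard library, not Pyro. Like Pyro's `dist.Normal(loc, scale)`, it takes the standard deviation as its second argument. The variable names are hypothetical, and whether the priors in this thread were rescaled this way is not stated; the snippet only works through the arithmetic.

```python
from statistics import NormalDist

# The second argument is the standard deviation (sigma), not the variance.
prior_g = NormalDist(0.0, 4000.0)  # e.g. an intercept prior in grams
print(prior_g.stdev, prior_g.variance)

# Converting mass from g to kg divides the std of a mass-valued
# parameter (intercept, noise scale) by 1000 as well.
prior_kg = prior_g / 1000
print(prior_kg)

# The slope has units of mass per length. Dividing both mass and length
# by 1000 leaves it unchanged: a slope in g/mm equals the slope in kg/m.
mass_factor = 1 / 1000
length_factor = 1 / 1000
slope_factor = mass_factor / length_factor
print(slope_factor)
```

Under this conversion an intercept near -2315 g becomes about -2.315 kg, while a slope near 32 g/mm stays near 32 kg/m. A slope prior therefore does not need to shrink by the same factor as the intercept prior.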
I ran NUTS and it found the correct distribution! First, I used the original scale of the mean and std for the Normal and HalfNormal distributions to see whether NUTS could capture the correct distribution. And it did! The only difficulty is that it took almost 80 minutes to run!
I noticed that it took NUTS about 90 minutes to capture the correct distributions. (I did expect it to converge faster, given that the distributions now cover a much smaller range.) I imagine the performance could be improved with some parameter tuning. @martinjankowiak I have one last question regarding your remark on my step size. Given that step size adaptation is active, does it matter what initial step size I give the model? The sampler converges to the optimal step size anyway (in this case 2e-2).