I am curious to see if the training time for my model is what would be expected for this kind of model.
The model is a fairly simple linear mixed-effects model with parameters (and shapes): a (118), c (83 x 118), B (6 x 118), d (809 x 118), E (83 x 118), f (118).
So the total number of parameters is around 116,000.
The model has two levels:
y_zyx ~ N(a_x + c_zx + X^T*B_x + d_yx, E_zx)
E_zx ~ IG(2, f_x)
Priors are also defined for a, c, B, d, and f.
The data consist of 3309 observations, each with 118 features (a 3309 x 118 response matrix).
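To make sure I'm describing the model unambiguously, here is a small NumPy sketch of the generative process as I read it. The group assignments (`z_idx`, `y_idx`), the covariates `X`, and the standard-normal priors are placeholders (my actual priors differ), and I'm treating E as a variance:

```python
import numpy as np

rng = np.random.default_rng(0)

# Dimensions from the model description
Z, Y_GRP, P, F = 83, 809, 6, 118   # z-groups, y-groups, covariates, features
N = 3309                           # observations

# Placeholder standard-normal priors (stand-ins for the real ones)
a = rng.normal(size=F)
c = rng.normal(size=(Z, F))
B = rng.normal(size=(P, F))
d = rng.normal(size=(Y_GRP, F))
f = np.abs(rng.normal(size=F)) + 0.1          # IG scale must be positive

# Second level: E_zx ~ InverseGamma(2, f_x), via IG(a, b) = b / Gamma(a, scale=1)
E = f / rng.gamma(shape=2.0, scale=1.0, size=(Z, F))

# Hypothetical group assignments and covariates per observation
z_idx = rng.integers(0, Z, size=N)
y_idx = rng.integers(0, Y_GRP, size=N)
X = rng.normal(size=(N, P))

# First level: y_zyx ~ N(a_x + c_zx + X^T B_x + d_yx, E_zx), E as a variance
mean = a + c[z_idx] + X @ B + d[y_idx]        # (N, F)
y_obs = rng.normal(loc=mean, scale=np.sqrt(E[z_idx]))

# Sampled parameters: 118 * (1 + 83 + 6 + 809 + 83 + 1) = 115,994, i.e. ~116,000
```

The parameter count at the bottom is how I arrive at the ~116,000 figure.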
The training time is approximately 14 hours using 4 GPUs in parallel (4 chains, 10,000 samples, 1,000 warmup).
A similar model (roughly the same number of parameters, but single-level) took around 8 hours under the same settings.
While some of the parameters have fairly stable chain behavior:
Others look more unstable:
Is this expected from NUTS, or is it an indicator that I need to take more samples?
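To quantify "unstable" beyond eyeballing the traces, I could compute split-R̂; below is a minimal NumPy sketch of the plain version (ArviZ's `az.rhat` reports a rank-normalized variant), applied to synthetic chains just to show the contrast:

```python
import numpy as np

def split_rhat(chains):
    """Plain split-R-hat for one scalar parameter.

    chains: array of shape (n_chains, n_draws). Each chain is split in
    half, so 2 * n_chains sequences are compared.
    """
    half = chains.shape[1] // 2
    seqs = np.concatenate([chains[:, :half], chains[:, half:2 * half]], axis=0)
    m, n = seqs.shape
    B = n * seqs.mean(axis=1).var(ddof=1)      # between-sequence variance
    W = seqs.var(axis=1, ddof=1).mean()        # within-sequence variance
    var_hat = (n - 1) / n * W + B / n
    return np.sqrt(var_hat / W)

rng = np.random.default_rng(1)
good = rng.normal(size=(4, 5000))                      # well-mixed chains
bad = good + np.array([0.0, 0.0, 3.0, 3.0])[:, None]   # chains stuck at different modes
print(split_rhat(good))  # close to 1.0
print(split_rhat(bad))   # clearly above 1.0
```

My understanding is that values above roughly 1.01 on the unstable-looking parameters would point toward more samples or a reparameterization rather than just unfamiliar-but-fine NUTS behavior.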
In summary, I am wondering a) whether a 14-hour training time (4 GPUs, parallel) is expected for a 116,000-parameter model (4 chains, 10,000 samples, 1,000 warmup), and b) whether this kind of sampling behavior is familiar.