Need some advice on modeling a long-tailed event duration


Glad to join this community as a beginning practitioner of Bayesian modeling. I have recently been trying to build a GLM to model the duration of a long-tailed event using numpyro and jax, but I am not sure my choices are sound and would like some advice / suggestions on the following:

(1) I chose a Gamma distribution to model the outcome (event duration), because from my understanding of the data-generating process it is reasonable to believe that the density could peak at zero, as with an Exponential distribution, or above zero. Does this choice make sense? Are there other distributions you would use to describe it?

(2) For a Gamma likelihood I need to define priors for the parameters α and β. When building a GLM, would you define the linear model for α, for β, or for both?

Any advice is much appreciated. Thanks~

Regarding (2), a simple place to start might be to model the mean as mean = exp(linear function) and to model the variance either as a scalar latent variable or as variance = exp(other linear function). You can then solve the equations alpha / beta = mean and alpha / beta^2 = variance to get the parameterization in terms of alpha and beta, where alpha = concentration and beta = rate. Solving gives alpha = mean^2 / variance and beta = mean / variance.

Thank you very much for the great advice. I guess this will also make interpretation easier, since the log link is applied directly to the mean and variance of the Gamma distribution. I will give it a try.