Specifying gamma distribution GLM

I'm trying to build a GLM. I have two independent variables, x_1 and x_2, and I'm modeling a response whose residuals are gamma distributed. I compute \mu as a simple linear combination of x_1 and x_2 (an identity link) rather than using the inverse link, because I want a model that is easily interpretable.

I know the canonical way of specifying a gamma GLM is to compute \mu and then derive the rate from it as rate = shape/\mu.
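For reference, under this shape/rate parameterization the mean is pinned to \mu while the shape only controls the spread:

$$
E[y] = \frac{\text{shape}}{\text{rate}} = \mu,
\qquad
\mathrm{Var}[y] = \frac{\text{shape}}{\text{rate}^2} = \frac{\mu^2}{\text{shape}}
$$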

However, I found that sampling the rate and deriving the shape as shape = \mu * rate gives me a better ELBO and better predictions.

My question is: is this some sort of sin or an incorrect specification of the model?

Hi @bshabash, would you be able to provide code (ideally Pyro code) for the two versions of the model you’re considering? That would help me understand your question and help other readers learn something from your trick.

@fritzo

Sure

version 1

import torch
import pyro
import pyro.distributions as dist


def model(X, y=None):
    # Priors on the regression coefficients
    means = torch.zeros(X.size(1))
    standard_deviations = torch.full((X.size(1),), 10.0)
    betas = pyro.sample("beta", dist.Normal(means, standard_deviations).to_event(1))

    # Identity link: mu is a linear combination of the predictors
    mu = X.matmul(betas)

    # Sample the rate, and infer the shape from mu = shape / rate.
    # The rate must be strictly positive.
    rate = pyro.sample("rate", dist.HalfCauchy(10.0)).clamp(min=torch.finfo(torch.float32).eps)
    shape = rate * mu

    # Shift observations by eps so zero-valued responses keep a finite log-probability
    if y is not None:
        y_obs = y + torch.finfo(torch.float32).eps
    else:
        y_obs = y

    with pyro.plate("data", mu.size(0)):
        pyro.sample("obs", dist.Gamma(concentration=shape, rate=rate), obs=y_obs)

    return mu

version 2

import torch
import pyro
import pyro.distributions as dist


def model(X, y=None):
    # Priors on the regression coefficients
    means = torch.zeros(X.size(1))
    standard_deviations = torch.full((X.size(1),), 10.0)
    betas = pyro.sample("beta", dist.Normal(means, standard_deviations).to_event(1))

    # Identity link: mu is a linear combination of the predictors
    mu = X.matmul(betas)

    # Sample the shape, and infer the rate from mu = shape / rate.
    # Both the shape and the rate must be strictly positive.
    shape = pyro.sample("shape", dist.HalfCauchy(10.0)).clamp(min=torch.finfo(torch.float32).eps)
    rate = (shape / mu).clamp(min=torch.finfo(torch.float32).eps)

    # Shift observations by eps so zero-valued responses keep a finite log-probability
    if y is not None:
        y_obs = y + torch.finfo(torch.float32).eps
    else:
        y_obs = y

    with pyro.plate("data", mu.size(0)):
        pyro.sample("obs", dist.Gamma(concentration=shape, rate=rate), obs=y_obs)

    return mu
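For context, a minimal sketch of the SVI setup used to compare the ELBOs of the two versions; the AutoNormal guide, learning rate, and step count below are placeholder choices:

from pyro.infer import SVI, Trace_ELBO
from pyro.infer.autoguide import AutoNormal
from pyro.optim import Adam

# Placeholder training loop: `model` is either of the versions above,
# and X, y are the observed design matrix and response tensors.
pyro.clear_param_store()
guide = AutoNormal(model)
svi = SVI(model, guide, Adam({"lr": 0.01}), loss=Trace_ELBO())

for step in range(2000):
    loss = svi.step(X, y)  # one gradient step on the negative ELBO
    if step % 500 == 0:
        print(f"step {step}: loss = {loss:.2f}")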

I see, thanks for providing code. Yes, I think what you're doing is fine; it amounts to changing from a half-Cauchy prior on the rate to an inverse-half-Cauchy prior on the rate. You might also try a LogNormal prior, which would be the same in either version (log-normal is symmetric in log space, so its reciprocal is also log-normal). I guess another difference is that you're making the parameters of the distribution depend on mu. I suppose an even more flexible model could interpolate between your two models:

# Interpolate between the two parameterizations:
# power = 1 recovers version 1 (rate independent of mu),
# power = 0 recovers version 2 (shape independent of mu).
shape_mu = pyro.sample("shape_mu", dist.LogNormal(0, 1))
power = pyro.sample("power", dist.Uniform(0, 1))
shape = shape_mu * mu ** power
rate = shape / mu
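
For concreteness, a rough sketch of what the LogNormal variant mentioned above could look like (only the prior on the shape changes relative to version 2; the function name and the prior scale of 1.0 are just placeholders):

def model_lognormal(X, y=None):
    means = torch.zeros(X.size(1))
    standard_deviations = torch.full((X.size(1),), 10.0)
    betas = pyro.sample("beta", dist.Normal(means, standard_deviations).to_event(1))
    mu = X.matmul(betas)

    # LogNormal prior on the shape; the reciprocal of a log-normal is also log-normal,
    # so the same prior family applies regardless of which side it is placed on.
    shape = pyro.sample("shape", dist.LogNormal(0.0, 1.0))
    rate = (shape / mu).clamp(min=torch.finfo(torch.float32).eps)

    y_obs = y + torch.finfo(torch.float32).eps if y is not None else y

    with pyro.plate("data", mu.size(0)):
        pyro.sample("obs", dist.Gamma(concentration=shape, rate=rate), obs=y_obs)

    return mu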