 # Modeling a Gamma distributed censored event

I have some variable `X ~ Gamma(shape, rate)`

but the data I have, has it censored, so there is some value of Y (say 1.0) where if its true value is higher, the recorded observation is 1.0. I want to model this variable as a function based on a linear combination of some feature, X

The model I try to employ looks like this:

``````from scipy.stats import gamma as scipygamma

def model_gamma_cen(X, y, censored_label):

min_value = torch.finfo(X.dtype).eps
max_p_value = 1.0 - torch.finfo(X.dtype).eps
max_value = torch.finfo(X.dtype).max/100.0

# Prior on the intercept

intercept_prior = dist.Normal(0.0, 1.0)
linear_combination = pyro.sample(f"beta_intercept", intercept_prior)

# Also define coefficient priors
coefficients_prior = dist.Normal(0.0, 1.0).expand([X.shape])
betas = pyro.sample(f"beta_coefficients", coefficients_prior)

# Finally, calculate the linear combination of parameters and X values
linear_combination = linear_combination + torch.matmul(X, betas)

# But now our mean will be e^{linear combination}, I'm using a link
# function log(mu)  = linear combo of X
mean = torch.exp(linear_combination).clamp(min=min_value, max=max_value)

# We will also define a rate parameter
rate = pyro.sample("rate", dist.HalfCauchy(scale=10.0).clamp(min=min_value)

# Since mean = shape/rate, then the shape = mean * rate
shape = (mean * rate)

# The data is now divided into two outcomes: censored, and non-censored
rate_np = rate.detach().item()

with pyro.plate("data", y.shape):
# If the data is not censored, we just observe the y value and it's
# dictated by the Gamma distribution
observation = pyro.sample("obs", dist.Gamma(shape, rate), obs=y)

# If the data is censored, we need to calculate the 1-CDF of the Gamma function
# we will use the CDF implementation in scipy.stats.gamma

scipy_gamma_dist = scipygamma(a=shape.detach().numpy(), scale=1/rate.detach().numpy())

# find truncation probability
truncation_probability_np = 1.0 - scipy_gamma_dist.cdf(y.detach().numpy())
truncation_prob = torch.tensor(truncation_probability_np).clamp(min=min_value, max=max_p_value)

censored_observation = pyro.sample(f"censorship", dist.Bernoulli(truncation_prob), obs=torch.tensor(1.0)
``````

Is there anything wrong with my censoring logic, or in my attempt to combine `scipy.stats.gamma` with Pyro?

i can’t follow what you’re doing without more details but in general using non-pyro distributions in a model/guide won’t work because (for example) gradients can’t flow to the parameters of the distribution (e.g. if you’re doing variational inference)

I’m using MCMC and NUTS, so I don’t think those depend on the gradients, do they?

EDIT: What more details should I provide?

HMC/NUTS uses gradients of the log density of the model to do inference so the specification of the model needs to be entirely in PyTorch

EDIT: What more details should I provide?

logic/overview of your model and what you’re trying to accomplish (more context than code)

Oh I see

What I’m trying to do is build a Generalized Linear Model on a Gamma distribution, which is easy enough. However, as I mentioned, some of my observations are censored. The data is the result of a survey on price so some prices were just listed as “\$1,000+”, which means I don’t know if they were \$1,000, or \$2,000 or \$5,000.

I’d like to make sure I’m getting proper coefficients for my linear model by exploring the possibility of censoring, so I’m trying to calculate the CDF for my Gamma distributed variable, given my observations. I took this approach from the old Pyro Tutorial about modelling censored events

as far as i can tell the general approach seems reasonable. although (i repeat for clarity) the `detach` statements will cause problems with HMC. the `clamp` statements might as well. also it might make sense to split your observations explicitly into censored/uncensored so that you don’t need the `mask`s.