Gamma distribution modeling concentration and rate - Predictive error

Hi @yoshy, I believe you can always guard against the bad value in the branch that is not selected: swap in a safe placeholder before the op whose gradient would blow up. For example,

import torch

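value = torch.tensor(0., requires_grad=True)
# Replace non-positive entries with a harmless placeholder so log() never sees 0.
safe_value = torch.where(value > 0., value, torch.tensor(1.))
# The outer where still picks log1p(value) wherever value <= 0.
y = torch.where(value > 0., torch.log(safe_value), torch.log1p(value))
y.backward()
value.grad  # tensor(1.); y is the same as before, but AD is happy now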
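
To see why the guard matters, here is a small side-by-side sketch. The helper names `naive_log_or_log1p` and `guarded_log_or_log1p` are made up for illustration and are not part of torch or pyro. The unguarded version still evaluates `torch.log` at 0, so the backward pass computes 0 / 0 in the branch that was not selected, and the gradient comes out as nan.

```python
import torch


def naive_log_or_log1p(value):
    # torch.where evaluates both branches, so log(value) is computed at 0.
    # Its backward divides the (zeroed) incoming gradient by 0, giving nan.
    return torch.where(value > 0., torch.log(value), torch.log1p(value))


def guarded_log_or_log1p(value):
    # Feed log() a placeholder of 1 wherever value <= 0; the outer where
    # discards that result, so the forward output is unchanged.
    safe_value = torch.where(value > 0., value, torch.ones_like(value))
    return torch.where(value > 0., torch.log(safe_value), torch.log1p(value))


naive_in = torch.tensor(0., requires_grad=True)
naive_log_or_log1p(naive_in).backward()
naive_grad = naive_in.grad      # nan

guarded_in = torch.tensor(0., requires_grad=True)
guarded_log_or_log1p(guarded_in).backward()
guarded_grad = guarded_in.grad  # finite, from the log1p branch only

print(naive_grad, guarded_grad)
```

The forward values of the two helpers agree; only the gradient differs.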

See this note from the TFP team.