Here’s another twist. I added 0.01 to all of the y values so that the new minimum is 0.01. I then rewrote log_prob to be calculated as
torch.where(
    value > 0.01,
    torch.log1p(-self.theta)
    + self.concentration * torch.log(self.rate)
    + (self.concentration - 1) * torch.log(value)
    - self.rate * value
    - torch.lgamma(self.concentration),
    torch.log(self.theta),
)
This results in a log probability that is exactly the same as before, since I used value > 0.01 as the condition. However, the model now trains successfully and the parameter estimates are close to the expected values. But why would those changes allow the model to train successfully? Where else are the y observations being used in training?
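For context, here is roughly how that log_prob sits inside the custom distribution. The class name, constructor, and constraints below are just an illustrative sketch rather than my exact code; only the log_prob body matches the expression above:

```python
import torch
from torch.distributions import Distribution, constraints
from torch.distributions.utils import broadcast_all


class HurdleGamma(Distribution):
    """Sketch of a Gamma hurdle distribution (structure illustrative)."""

    arg_constraints = {
        "concentration": constraints.positive,
        "rate": constraints.positive,
        "theta": constraints.unit_interval,  # probability of the "zero" (hurdle) component
    }
    support = constraints.nonnegative

    def __init__(self, concentration, rate, theta, validate_args=None):
        self.concentration, self.rate, self.theta = broadcast_all(concentration, rate, theta)
        super().__init__(batch_shape=self.concentration.shape, validate_args=validate_args)

    def log_prob(self, value):
        # Positive branch: log(1 - theta) + Gamma log-density at value.
        # "Zero" branch (observations shifted up to 0.01): log(theta).
        return torch.where(
            value > 0.01,
            torch.log1p(-self.theta)
            + self.concentration * torch.log(self.rate)
            + (self.concentration - 1) * torch.log(value)
            - self.rate * value
            - torch.lgamma(self.concentration),
            torch.log(self.theta),
        )
```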