Hi @yoshy, I believe you can always guard the bad value in the non-executed branch. For example:
```python
import torch

value = torch.tensor(0., requires_grad=True)

# Replace the problematic input with a dummy (here 1.) so that the
# branch we do not take still has a finite gradient.
safe_value = torch.where(value > 0., value, torch.tensor(1.))
y = torch.where(value > 0., torch.log(safe_value), torch.log1p(value))

y.backward()
value.grad  # y is the same as before, but AD is happy now
```
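For contrast, here is a minimal sketch of why the guard is needed: `torch.where` differentiates *both* branches, so even though the forward pass never uses `torch.log(value)` at `value = 0`, its infinite local gradient gets multiplied by the branch mask's zero, and `0 * inf = nan` poisons the result.

```python
import torch

# Naive version without the guard: the untaken log(value) branch
# still participates in backward, producing 0 * inf = nan.
value = torch.tensor(0., requires_grad=True)
y = torch.where(value > 0., torch.log(value), torch.log1p(value))
y.backward()
value.grad  # contains nan, even though y itself was computed from log1p
```

With the `safe_value` trick above, the untaken branch sees `log(1.)` instead, whose gradient is finite, so the zero from the mask cleanly zeroes it out.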