Hi all,
I’m trying to choose a prior distribution for a continuous latent variable that needs to be greater than 1. Since there don’t seem to be many good distributions with this characteristic, I was wondering how incorrect it would be if I just picked a distribution whose support starts at 0 (e.g., Exponential, Gamma, HalfNormal) and then just add +1 after the distribution within the pyro.sample statement? I’m afraid this might mess up the log_prob calculations when doing optimization though.
Is it valid to do this, or do I need to figure out another method (maybe a transformed distribution)? I know that adding a constant value doesn’t affect derivatives/gradients but I’m afraid of a log_prob getting messed up somehow in the Pyro internals when it’s doing traces/putting together the ELBO/etc.
Btw, for context, I’m trying to put a prior on the alpha in the Beta(1, alpha) of a Dirichlet process/GEM. But if I sample a very small alpha value (e.g., ~0.10) from its prior (e.g., Exponential(1)), then the Beta will often output a value of 1.0… so in the stick-breaking process, it eats up all my stick and breaks the algorithm.
Hi @student_12,
I think it’s perfectly fine to add constants to parameters. So what you would do is
import pyro.distributions as dist
import torch
import pyro
alpha_raw = pyro.sample("alpha_raw", dist.Exponential(torch.tensor(1.0)))
# now 'transform' alpha_raw, such that the support is (1, inf)
alpha = alpha_raw + 1
# or if you want to record alpha
alpha = pyro.deterministic("alpha", alpha_raw + 1)
# now use alpha in other sampling statements
p = pyro.sample("p", dist.Beta(torch.tensor(1.0), alpha))
If you’re familiar with Stan, you might be thinking of situations like
parameters {
real<lower=0> alpha_raw;
}
model {
real alpha = log(alpha_raw); // non-linear transformation
alpha ~ normal(0,1);
// WARNING!!! defined prior for transformed parameter! add log-jacobian to target
target += -log(alpha_raw);
// do something with alpha
}
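However, for a constant shift such as yours, the Jacobian of the transformation is 1, so its log is 0 and no correction is required. (For a general linear transformation the log-Jacobian is a constant, log|scale|, which does not depend on the parameter.)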
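To spell out why the shift is harmless, here is the standard change-of-variables formula applied to both cases. This is a sketch; the symbols $p_{\text{raw}}$ and $g$ are notation introduced here, not something from Pyro or Stan.

```latex
% Change of variables for an invertible map \alpha = g(\alpha_{\text{raw}}):
\log p_{\alpha}(\alpha)
  = \log p_{\text{raw}}\!\left(g^{-1}(\alpha)\right)
  + \log\left|\frac{d\, g^{-1}(\alpha)}{d\alpha}\right|

% Shift, g(\alpha_{\text{raw}}) = \alpha_{\text{raw}} + 1: the derivative is 1.
\log p_{\alpha}(\alpha)
  = \log p_{\text{raw}}(\alpha - 1) + \log 1
  = \log p_{\text{raw}}(\alpha - 1)

% Stan example, \alpha = \log(\alpha_{\text{raw}}), prior placed on \alpha
% while sampling \alpha_{\text{raw}}:
\log p(\alpha_{\text{raw}})
  = \log \mathcal{N}\!\left(\log \alpha_{\text{raw}} \mid 0, 1\right)
  + \log\left|\frac{d \log \alpha_{\text{raw}}}{d \alpha_{\text{raw}}}\right|
  = \log \mathcal{N}\!\left(\log \alpha_{\text{raw}} \mid 0, 1\right)
  - \log \alpha_{\text{raw}}
```

The last term is exactly the `-log(alpha_raw)` added to `target` in the Stan snippet, and it has no counterpart in the shifted case.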
In addition, you are defining a prior directly on the non-transformed parameter (alpha_raw), so this Stan example would not even apply. I don’t know if it is even possible to define priors on transformed parameters in Pyro (please correct me if I’m wrong). Hope this helps!
Correcting myself a bit: you can sample parameters from transformed distributions. Have a look at TransformedDistribution in the torch documentation. This computes the log determinant of the Jacobian automatically. In your case I guess you could use an AffineTransform with loc=1 and scale=1.
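For reference, a minimal sketch of that suggestion using plain torch.distributions. The variable names (`base`, `shifted`, `lp_manual`) and the test points are made up for illustration. It builds an Exponential(1) shifted by 1 and checks that its log_prob matches the base log_prob evaluated at x - 1, i.e. that the shift contributes no Jacobian term.

```python
import torch
from torch.distributions import Exponential, TransformedDistribution
from torch.distributions.transforms import AffineTransform

# Base prior with support [0, inf).
base = Exponential(torch.tensor(1.0))

# Shift by +1 (scale=1 leaves the spread unchanged), giving support [1, inf).
shifted = TransformedDistribution(base, [AffineTransform(loc=1.0, scale=1.0)])

# Arbitrary test points inside the shifted support.
x = torch.tensor([1.5, 2.0, 4.0])

# log_prob under the transformed distribution ...
lp_shifted = shifted.log_prob(x)
# ... versus the "manual" version: base density evaluated at x - 1.
lp_manual = base.log_prob(x - 1.0)

# Draws from the shifted prior should never fall below 1.
samples = shifted.sample((1000,))

print(torch.allclose(lp_shifted, lp_manual))
print(bool((samples >= 1.0).all()))
```

Inside a Pyro model, the equivalent would presumably be built from `pyro.distributions.TransformedDistribution` and `pyro.distributions.transforms.AffineTransform` and passed to `pyro.sample`; I have not run that variant here.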
Yes @student_12, that is fine. You can freely mix deterministic PyTorch operations and random variables coming out of sample statements. What you’re doing is not different in principle from taking a vector of beta coefficients governed by some multivariate normal distribution and multiplying that by covariates (X @ beta) to feed into some downstream likelihood.
Great, thanks everyone!
I think I’ll go ahead and do the simple +1 addition for this situation. And then if I need to do a more complicated transformation in the future, I’ll use TransformedDistribution so I don’t have to worry about the log_det_jacobian piece.