Adding + 1 to Distribution Output in pyro.sample()

student_12 · October 7, 2022, 5:08am

Hi all,

I’m trying to choose a prior distribution for a continuous latent variable that needs to be greater than 1. Since there don’t seem to be many good distributions with this support, I was wondering how incorrect it would be to pick a distribution whose support starts at 0 (e.g., Exponential, Gamma, HalfNormal) and then add +1 to the value returned by the pyro.sample statement. I’m worried this might mess up the log_prob calculations during optimization, though.

Is it valid to do this, or do I need to figure out another method (maybe a transformed distribution)? I know that adding a constant doesn’t affect derivatives/gradients, but I’m worried that a log_prob could get messed up somewhere in the Pyro internals when it records traces, puts together the ELBO, etc.

Btw, for context, I’m trying to put a prior on the alpha in the Beta(1, alpha) of a Dirichlet process/GEM. But if I sample a very small alpha value (e.g., ~0.10) from its prior (e.g., Exponential(1)), then the Beta will often output a value of 1.0… so in the stick-breaking process, it eats up all my stick and breaks the algorithm.
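To make the stick-breaking issue concrete, here is a minimal PyTorch sketch. The helper `stick_breaking` is a hypothetical name, not a Pyro API; it turns K Beta(1, alpha) fractions into K + 1 weights that sum to 1. Since the mean of Beta(1, alpha) is 1 / (1 + alpha), alpha ≈ 0.1 means each break takes about 91% of the remaining stick on average, while alpha >= 1 caps that average at 50%.

```python
import torch
import torch.nn.functional as F
from torch.distributions import Beta


def stick_breaking(betas: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: K stick-breaking fractions -> K + 1 weights.

    Weight k is betas[k] times the stick left after the first k breaks;
    the final weight is whatever stick remains after all K breaks.
    """
    # Remaining stick before each break: 1, (1 - b0), (1 - b0)(1 - b1), ...
    remaining = F.pad(torch.cumprod(1 - betas, dim=-1), (1, 0), value=1.0)
    # Append a fraction of 1 so the last weight takes the leftover stick.
    fractions = F.pad(betas, (0, 1), value=1.0)
    return fractions * remaining


# Mean of Beta(1, alpha) is 1 / (1 + alpha): the expected share of the
# remaining stick taken at each break.
for a in [0.1, 1.0, 2.0]:
    print(a, Beta(torch.tensor(1.0), torch.tensor(a)).mean.item())

# With alpha = 0.1, the first break typically takes most of the stick.
torch.manual_seed(0)
betas = Beta(torch.tensor(1.0), torch.tensor(0.1)).sample((5,))
print(stick_breaking(betas))
```

Shifting the prior so that alpha >= 1 keeps these fractions away from 1 on average, which is why the +1 trick addresses the symptom described above.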

chvandorp · October 7, 2022, 1:52pm

Hi @student_12,

I think it’s perfectly fine to add constants to parameters. What you would do is something like:

import torch
import pyro
import pyro.distributions as dist

alpha_raw = pyro.sample("alpha_raw", dist.Exponential(torch.tensor(1.0)))
# now 'transform' alpha_raw such that the support is (1, inf)
alpha = alpha_raw + 1
# or, if you want alpha recorded as a site in the trace
alpha = pyro.deterministic("alpha", alpha_raw + 1)
# now use alpha in other sampling statements
p = pyro.sample("p", dist.Beta(torch.tensor(1.0), alpha))

If you’re familiar with Stan, you might be thinking of situations like

parameters {
    real<lower=0> alpha_raw;
}
model {
    real alpha = log(alpha_raw); // non-linear transformation
    alpha ~ normal(0,1);
    // WARNING!!! prior defined on a transformed parameter: add the log-Jacobian to target
    target += -log(alpha_raw);
    // do something with alpha
}

However, for linear (affine) transformations such as yours, this correction is not required: the Jacobian is a constant (exactly 1 for a pure shift), so it does not change the gradients.
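A short change-of-variables derivation makes this concrete, assuming alpha_raw ~ Exponential(1) as in the code above:

```latex
\alpha = \alpha_{\text{raw}} + 1
\;\Rightarrow\;
p_\alpha(\alpha) = p_{\text{raw}}(\alpha - 1)\,\left|\frac{d\alpha_{\text{raw}}}{d\alpha}\right|
= p_{\text{raw}}(\alpha - 1),
\qquad\text{e.g. } p_\alpha(\alpha) = e^{-(\alpha - 1)},\ \alpha > 1.

\alpha = a\,\alpha_{\text{raw}} + b
\;\Rightarrow\;
\log p_\alpha(\alpha) = \log p_{\text{raw}}\!\left(\frac{\alpha - b}{a}\right) - \log|a|.
```

The extra term -log|a| is a constant, and for the pure shift (a = 1) it is zero.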
In addition, you are defining the prior directly on the non-transformed parameter (alpha_raw), so this Stan example would not even apply. I don’t think it is even possible to define priors on transformed parameters in Pyro (please correct me if I’m wrong). Hope this helps!

chvandorp · October 7, 2022, 2:02pm

Correcting myself a bit: you can sample parameters from transformed distributions. Have a look at TransformedDistribution in the torch documentation.
It accounts for the log absolute determinant of the Jacobian in log_prob automatically. In your case I guess you could use an AffineTransform with loc=1 and scale=1.

martinjankowiak · October 7, 2022, 4:47pm

Yes @student_12, that is fine. You can freely mix deterministic PyTorch operations and random variables coming out of sample statements. What you’re doing is not different in principle from taking a vector of beta coefficients governed by some multivariate normal distribution and multiplying it by covariates (X @ beta) to feed into some downstream likelihood.

student_12 · October 7, 2022, 4:55pm

Great, thanks everyone!

I think I’ll go ahead and do the simple +1 addition for this situation. And then if I need to do a more complicated transformation in the future, I’ll use TransformedDistribution so I don’t have to worry about the log_det_jacobian piece.