Using Log probability when sampling from a distribution

I was reading some Pyro code samples from a set of blog posts that convert the R code from McElreath’s Statistical Rethinking book into Pyro code. One thing that confused me was the use of the log probability when drawing samples from a distribution. I was trying to understand why they use the log here, as I was not sure whether it was an artifact of using Pyro or whether there was some technical reason. I think it is the former.

Here is the original R code, followed by the corresponding Pyro code.

p_grid <- seq( from=0 , to=1 , length.out=1000 )
prob_p <- rep( 1 , 1000 )
prob_data <- dbinom( 6 , size=9 , prob=p_grid )
posterior <- prob_data * prob_p
posterior <- posterior / sum(posterior)

samples <- sample( p_grid , prob=posterior , size=1e4 , replace=TRUE )

Note that the posterior variable holds the posterior probability of each value in p_grid.

Here is the corresponding Pyro code.

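import torch
import pyro
import pyro.distributions as dist

p_grid = torch.linspace(start=0, end=1, steps=1000)
prior = torch.tensor(1.).repeat(1000)
# Binomial likelihood of 6 successes in 9 trials at each grid value (mirrors dbinom above)
likelihood = dist.Binomial(total_count=9, probs=p_grid).log_prob(torch.tensor(6.)).exp()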
posterior = likelihood * prior
posterior = posterior / sum(posterior)

samples = pyro.distributions.Empirical(p_grid, posterior.log()).sample(torch.Size([int(1e4)]))

Again, I am just trying to understand why posterior.log() is used in that last line of Pyro code. Is that an implementation detail of Pyro, given that the R code does not do it?

Note that I understand the pyro.distributions.Empirical() function takes either weights or log_weights, so that makes sense. BUT, is there a computational advantage to providing the logged weights versus the unlogged weights?

I guess not. Under the hood, log_weights is used to create a Categorical distribution, which can accept either probs (weights) or logits (log_weights).
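
To make the equivalence concrete, here is a minimal plain-Python sketch. It is not Pyro's actual implementation; the function names and the 4-point grid of weights are made up for illustration. It normalizes the same weights once directly and once through their logs, and checks that both routes give the same categorical probabilities.

```python
import math

def probs_from_weights(weights):
    """Normalize non-negative weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]

def probs_from_log_weights(log_weights):
    """Normalize log-weights with the log-sum-exp trick, then exponentiate."""
    m = max(log_weights)  # subtract the max before exponentiating, for stability
    log_total = m + math.log(sum(math.exp(lw - m) for lw in log_weights))
    return [math.exp(lw - log_total) for lw in log_weights]

# Hypothetical posterior weights on a 4-point grid (illustration only).
weights = [0.1, 0.4, 0.3, 0.2]
log_weights = [math.log(w) for w in weights]

p_direct = probs_from_weights(weights)
p_via_logs = probs_from_log_weights(log_weights)

# Both routes should describe the same categorical distribution.
same = all(math.isclose(a, b, rel_tol=1e-9) for a, b in zip(p_direct, p_via_logs))
print(same)
```

As far as I can tell, working in logs would only start to matter if the weights were small enough to underflow as ordinary floats, which does not seem to be the case for this grid.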

@fehiepsi Thanks so much. Yeah, that makes sense. In the OP it was just interesting that the Pyro code suddenly used logged weights instead of unlogged weights, so I was not sure if there was some numerical reason to do that, such as faster computation. There are some cases where log probabilities are used in place of probabilities, but I could not remember the context. It seems like none of that matters here. Thanks again.