I was reading some Pyro code samples from a set of blog posts that translate the R code from McElreath's Statistical Rethinking into Pyro. One thing that confused me was the use of the log probability when drawing samples from a distribution. I was trying to understand why the log is used there: is it an artifact of using Pyro, or is there some deeper technical reason? I suspect it is the former.
Here is the original R code; the corresponding Pyro code follows.
p_grid <- seq( from=0 , to=1 , length.out=1000 )
prob_p <- rep( 1 , 1000 )
prob_data <- dbinom( 6 , size=9 , prob=p_grid )
posterior <- prob_data * prob_p
posterior <- posterior / sum(posterior)
# NOTE THAT THE POSTERIOR IS USED UNLOGGED
samples <- sample( p_grid , prob=posterior , size=1e4 , replace=TRUE )
Note that the posterior variable holds the (unlogged) probability of each grid value under the posterior distribution.
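For my own sanity, I re-did the grid approximation in plain Python (no torch or Pyro) to confirm what the R code computes; the grid size and the data (6 successes in 9 trials) are taken straight from the snippet above.

```python
from math import comb

# Grid of 1000 candidate values for p, flat prior, as in the R code.
n_grid = 1000
p_grid = [i / (n_grid - 1) for i in range(n_grid)]
prior = [1.0] * n_grid

# Binomial likelihood: P(6 successes | 9 trials, p) for each grid point.
likelihood = [comb(9, 6) * p**6 * (1 - p)**3 for p in p_grid]

posterior = [lik * pr for lik, pr in zip(likelihood, prior)]
total = sum(posterior)
posterior = [w / total for w in posterior]

# The normalized posterior is an ordinary (unlogged) probability vector.
print(abs(sum(posterior) - 1.0) < 1e-9)  # True
```

The posterior mode lands on the grid point nearest 6/9, as expected for a flat prior.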
Here is the corresponding Pyro code.
import torch
import pyro
import pyro.distributions as dist

p_grid = torch.linspace(start=0, end=1, steps=1000)
prior = torch.tensor(1.).repeat(1000)
likelihood = dist.Binomial(total_count=9, probs=p_grid).log_prob(torch.tensor(6.)).exp()
posterior = likelihood * prior
posterior = posterior / posterior.sum()
# NOTE THAT THE `POSTERIOR` VARIABLE IS NOW LOGGED????
samples = pyro.distributions.Empirical(p_grid, posterior.log()).sample(torch.Size([int(1e4)]))
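For comparison, the R sample() call can be mimicked in plain Python with random.choices(), which, like R, takes unlogged weights; this is exactly what made me wonder why Pyro's Empirical wants posterior.log() instead. (The posterior is recomputed here with the stdlib so the snippet runs without torch.)

```python
import random
from math import comb

# Recompute the normalized grid posterior (flat prior, 6 of 9 successes).
n_grid = 1000
p_grid = [i / (n_grid - 1) for i in range(n_grid)]
posterior = [comb(9, 6) * p**6 * (1 - p)**3 for p in p_grid]
total = sum(posterior)
posterior = [w / total for w in posterior]

# Resample the grid in proportion to the UNLOGGED posterior weights,
# just like R's sample(p_grid, prob=posterior, size=1e4, replace=TRUE).
random.seed(0)
samples = random.choices(p_grid, weights=posterior, k=10_000)

# The sample mean should be close to the posterior mean of Beta(7, 4) ≈ 0.636.
print(sum(samples) / len(samples))
```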
Again, I am just trying to understand why posterior.log() is used in that last line of Pyro code. Is it an implementation detail of Pyro, given that the R code does nothing of the sort?
Note that I understand that pyro.distributions.Empirical() takes either weights or log_weights, so that part makes sense. BUT, is there a computational advantage to providing logged weights rather than unlogged weights?
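My own guess at the computational advantage is numerical stability. It does not bite in this toy example, where each grid point carries a single weight, but in general multiplying many small probabilities underflows to 0.0 in float64, while summing their logs stays perfectly representable:

```python
import math

# Toy illustration: five probabilities of 1e-80 each.
probs = [1e-80] * 5

direct = 1.0
for p in probs:
    direct *= p  # 1e-400 underflows past the smallest float64

log_sum = sum(math.log(p) for p in probs)

print(direct)   # 0.0  (underflow)
print(log_sum)  # ≈ -921.0, no problem at all
```

If that is the rationale, asking Empirical for log_weights lets callers pass in log-likelihoods directly without ever materializing the (possibly underflowing) unlogged values.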