Understanding importance sampling for marginal computation

I’m new to probabilistic programming. This question pertains to the “Inference in Pyro: From Stochastic Functions to Marginal Distributions” tutorial file.

I’m using Pyro version 0.1.2.

The tutorial says:

When called with an input guess, marginal first uses posterior to generate a sequence of weighted execution traces given guess, then builds a histogram over return values from the traces, and finally returns a sample drawn from the histogram. Calling marginal with the same arguments more than once will sample from the same histogram.

I’m trying to understand what this means mathematically.

My thinking is along the following lines:

Equation (1) represents the marginal we’re trying to compute.
Equation (3) sort of represents what I think is going on.


  1. q(w) is the proposal distribution (Q1. Where is q(w) specified in the tutorial?)
  2. Sample num_sample times to get num_sample samples of W (importance weight)
  3. Take the most frequently occurring sample (from the histogram)
  4. Sample from f(meas | w), where w is taken from the histogram, and output the result as a sample from the marginal

Am I thinking correctly here?

The prior (presumably f(w | guess) in this case) is used as the proposal distribution. It looks like the Importance class takes a guide argument, and when that is None, the code uses the input model with all the observed values blocked (using poutine.block).
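To make the role of the proposal concrete, here is a short derivation in the question's notation. It assumes the marginal being asked about is f(meas | guess) with w integrated out; the referenced equations are not visible in this thread, so treat this as a sketch rather than a restatement of them.

```latex
\[
f(\text{meas} \mid \text{guess})
  = \int f(\text{meas} \mid w)\, f(w \mid \text{guess})\, dw
  = \mathbb{E}_{w \sim q}\!\left[ f(\text{meas} \mid w)\,
      \frac{f(w \mid \text{guess})}{q(w)} \right]
\]
% Choosing the prior as the proposal, q(w) = f(w | guess),
% makes the importance ratio identically 1:
\[
q(w) = f(w \mid \text{guess})
  \;\Longrightarrow\;
  \frac{f(w \mid \text{guess})}{q(w)} = 1
  \quad\text{for every } w .
\]
```

With that choice, drawing w from q and then meas from f(meas | w) is just running the model forward, and every run gets the same weight.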

If I understand correctly, we then build an empirical distribution for f(meas | guess) from a histogram of 100 sampled values. Since we are using the prior as the guide, in this case the importance weights will always be 1 (that is, the log weight is always 0).
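A minimal plain-Python sketch of why the weights come out as 1. It does not use Pyro. The function names (`run_model`, `importance_traces`) and the noise scales 1.0 and 0.75 are assumptions standing in for the tutorial's scale model, not the library's actual code.

```python
import math
import random


def normal_logpdf(x, mu, sigma):
    """Log density of Normal(mu, sigma) at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def joint_logpdf(weight, measurement, guess):
    """Log joint of the toy model: weight ~ N(guess, 1), measurement ~ N(weight, 0.75)."""
    return normal_logpdf(weight, guess, 1.0) + normal_logpdf(measurement, weight, 0.75)


def run_model(guess, rng):
    """Run the toy model forward and return its sampled sites (hypothetical stand-in)."""
    weight = rng.gauss(guess, 1.0)
    measurement = rng.gauss(weight, 0.75)
    return weight, measurement


def importance_traces(guess, num_samples, seed=0):
    """Collect (return value, log weight) pairs using the model itself as the proposal."""
    rng = random.Random(seed)
    traces = []
    for _ in range(num_samples):
        weight, measurement = run_model(guess, rng)
        log_p = joint_logpdf(weight, measurement, guess)  # model density of the trace
        log_q = joint_logpdf(weight, measurement, guess)  # proposal density: same model
        traces.append((measurement, log_p - log_q))
    return traces


traces = importance_traces(guess=8.5, num_samples=100)
log_weights = [lw for _, lw in traces]
print(set(log_weights))
```

Because nothing is observed and the proposal is the model, the two log densities cancel for every trace. If the model were conditioned on data, `log_p` would gain an observation term that `log_q` lacks, and the weights would no longer be equal.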

During sampling (the call to marginal()), it just creates a Categorical distribution over all the samples. In this case the categorical distribution will be uniform because all the weights are the same (1.0).
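The sampling step can be sketched the same way, again in plain Python rather than Pyro. The helpers `normalize` and `sample_marginal` are hypothetical names, and the stored return values are placeholder draws, not output from the tutorial.

```python
import math
import random


def normalize(log_weights):
    """Turn log weights into categorical probabilities (subtract the max for stability)."""
    m = max(log_weights)
    unnorm = [math.exp(lw - m) for lw in log_weights]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def sample_marginal(values, log_weights, rng):
    """Draw one stored return value with probability proportional to its weight."""
    probs = normalize(log_weights)
    return rng.choices(values, weights=probs, k=1)[0]


rng = random.Random(0)

# Placeholder return values standing in for the 100 stored traces.
values = [rng.gauss(8.5, 1.25) for _ in range(100)]

# Prior used as proposal, nothing observed: every log weight is 0.
log_weights = [0.0] * len(values)

probs = normalize(log_weights)
draw = sample_marginal(values, log_weights, rng)
print(probs[0], draw in values)
```

With equal weights every stored value has probability 1/100, so each call returns one of the stored values chosen uniformly at random, not the most frequent one.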