Example using Categorical distribution and TraceEnum_ELBO?

Is there an example out there of how to use TraceEnum_ELBO to learn a categorical latent site?

Specifically: what’s the difference between parallel and sequential enumeration? Do I need to specify this in both the model and the guide, or just one or the other? Do I need to use iarange or otherwise specify independence? What is max_iarange_nesting? Is num_particles a batch size?

You could take a look at the Gaussian mixture model example, which might help clear up some of the questions you have. Another useful tutorial is the one on tensor shapes, which goes into some of the details of iaranges.

What’s the difference between parallel and sequential enumeration? Do I need to specify this in both the model and the guide, or just one or the other?

Sequential enumeration runs the model/guide multiple times to generate execution traces sequentially, whereas parallel enumeration uses vectorization to build a single trace in which discrete sites are enumerated in parallel. You can specify this on a per-sample-site basis with the keyword argument infer={"enumerate": "parallel"} (or "sequential"), or you can annotate your guide/model with @config_enumerate(default="parallel") as in the tutorial, which does the same thing behind the scenes for all the discrete sample sites in your model and guide.
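For instance, here is a minimal sketch (names and values are purely illustrative, and import paths may differ slightly across Pyro versions) of the two ways to mark a discrete site for enumeration:

import torch
import pyro
import pyro.distributions as dist
from pyro.infer import config_enumerate
from torch.distributions import constraints

# Option 1: annotate a single sample site via the infer keyword argument.
def guide_site_level():
    p = pyro.param("p", torch.ones(3) / 3, constraint=constraints.simplex)
    pyro.sample("z", dist.Categorical(probs=p),
                infer={"enumerate": "parallel"})

# Option 2: decorate the whole guide (or model); all discrete sample sites
# then pick up the given default enumeration strategy.
@config_enumerate(default="parallel")
def guide_decorated():
    p = pyro.param("p", torch.ones(3) / 3, constraint=constraints.simplex)
    pyro.sample("z", dist.Categorical(probs=p))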

Do I need to use iarange or otherwise specify independence? What is max_iarange_nesting?

You should use an iarange to specify independence. Basically, you need to ensure that your batch dimensions are either inside an iarange or are designated as event dims (and hence summed out) via .independent(). Refer to the shapes tutorial for more details. You need to specify max_iarange_nesting when using parallel enumeration to tell the Pyro backend what batch dimensions it can use for enumerating over the discrete sites in parallel.
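As a rough, self-contained sketch (a toy model, not your GP; names and values are illustrative): with a single iarange that the discrete site lives inside, you would pass max_iarange_nesting=1 to TraceEnum_ELBO:

import torch
import pyro
import pyro.distributions as dist
from pyro.infer import SVI, TraceEnum_ELBO
from pyro.optim import Adam
from torch.distributions import constraints

def model(data):
    probs = torch.ones(3) / 3
    with pyro.iarange("data", len(data)):
        # the batch dim of size len(data) is declared independent by the iarange
        z = pyro.sample("z", dist.Categorical(probs=probs.expand(len(data), -1)))
        pyro.sample("obs", dist.Normal(z.float(), 1.), obs=data)

def guide(data):
    p = pyro.param("p", torch.ones(3) / 3, constraint=constraints.simplex)
    with pyro.iarange("data", len(data)):
        pyro.sample("z", dist.Categorical(probs=p.expand(len(data), -1)),
                    infer={"enumerate": "parallel"})

# one iarange in the model/guide -> max_iarange_nesting=1
svi = SVI(model, guide, Adam({"lr": 0.01}),
          loss=TraceEnum_ELBO(max_iarange_nesting=1))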

Is num_particles a batch size?

No. num_particles is the number of samples used for the ELBO estimator. Leaving it as 1 will work reasonably well unless you have non-reparameterized continuous sample sites.
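For reference, num_particles is just an argument to the ELBO constructor (illustrative values; model and guide defined as usual, e.g. like the toy sketch above):

from pyro.infer import SVI, TraceEnum_ELBO
from pyro.optim import Adam

# average 10 Monte Carlo samples per step when estimating the ELBO gradient
elbo = TraceEnum_ELBO(max_iarange_nesting=1, num_particles=10)
svi = SVI(model, guide, Adam({"lr": 0.01}), loss=elbo)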

You need to specify max_iarange_nesting when using parallel enumeration to tell the Pyro backend what batch dimensions it can use for enumerating over the discrete sites in parallel.

So if the batch dimension is 0, then I tell it to make max_iarange_nesting=0?

No. num_particles is the number of samples used for the ELBO estimator. Leaving it as 1 will work reasonably well unless you have non-reparameterized continuous sample sites.

What’s a non-reparameterized continuous sample site? I find that leaving it at 1 makes it do a lot worse, so I assume I have some of these.

So if the batch dimension is 0, then I tell it to make max_iarange_nesting=0?

A safe value would be the number of iaranges in your model/guide (take the max over the model and guide); strictly, it only needs to be at least the maximum nesting depth of your iaranges. If you don’t have any iarange, you can set it to 0.
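In other words (a purely illustrative sketch):

import pyro

def model_one_iarange(data):
    # a single, un-nested iarange -> max_iarange_nesting=1 suffices
    with pyro.iarange("data", len(data)):
        ...

def model_nested_iaranges(data):
    # two nested iaranges -> max_iarange_nesting needs to be at least 2
    with pyro.iarange("rows", 10):
        with pyro.iarange("cols", 20):
            ...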

What’s a non-reparameterized continuous sample site? I find that leaving it at 1 makes it do a lot worse, so I assume I have some of these.

It’s fine to use a value larger than 1, and it will only help! I just meant sample sites that draw from a reparameterizable distribution, i.e. where we can express the sampling operation as a deterministic function of the distribution parameters plus a source of randomness that doesn’t depend on those parameters, e.g. writing z ~ Normal(mu, sigma) as z = mu + epsilon * sigma, where epsilon ~ Normal(0, 1). When we need the gradient of the expectation of a function of z with respect to, say, the parameters of the distribution from which z is sampled, this lets us construct a low-variance estimator of that gradient. This is what all torch distributions do in their rsample method, and it is the default way to sample from distributions in Pyro.

When rsample is not available, we fall back to a noisier estimator called the score function estimator, which requires many more samples (i.e. a higher num_particles) to estimate the gradient. If your PyTorch distributions have an rsample method (distributions in Pyro are just light wrappers over PyTorch distributions), you are using reparameterized samples underneath. Using a higher num_particles just takes more computation, but it will only help with convergence (much more so when you are using distributions that don’t have rsample).
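If it helps, here is a small PyTorch-only sketch of the distinction (not part of your model, just illustrative):

import torch
from torch.distributions import Normal, Bernoulli

mu = torch.tensor(0.5, requires_grad=True)
sigma = torch.tensor(1.0, requires_grad=True)

# Reparameterized: rsample() internally draws epsilon ~ Normal(0, 1) and
# returns mu + epsilon * sigma, so gradients flow back to mu and sigma.
z = Normal(mu, sigma).rsample()
z.backward()
print(mu.grad, sigma.grad)             # well-defined pathwise gradients

# Non-reparameterized: Bernoulli has no rsample(), so inference falls back
# to the (higher-variance) score function estimator.
print(Normal(mu, sigma).has_rsample)   # True
print(Bernoulli(0.3).has_rsample)      # False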

Sorry, I still can’t seem to figure this out. Here’s what I have for my model and guide:

import torch
import pyro
import pyro.distributions as dist
from torch.distributions import constraints

# Matern52 is a Matern 5/2 kernel function defined elsewhere (not shown)

def hsma_gp_model(X_obs, y_obs, X_acc, y_hyp):
    alpha = pyro.sample('alpha', dist.Gamma(10, 10))
    sigma = pyro.sample('sigma', dist.Gamma(8, 16))
    rho = pyro.sample('rho', dist.Uniform(torch.Tensor([1e-6]), torch.Tensor([4.0])))
    n_acc, _ = X_acc.size()
    probs = torch.ones(n_acc) / n_acc
    X = X_obs
    with pyro.iarange('hyp', len(y_hyp)):
        ind = pyro.sample('ind', dist.Categorical(probs=probs))
        X = torch.cat([X, X_acc[ind:ind+1]])
    y = torch.cat([y_obs, y_hyp])
    # Covariance
    n = len(X)
    cov = Matern52(X, X, alpha, rho) + torch.eye(n) * (sigma + 1e-5)
    L = torch.potrf(cov, upper=False)
    print(L.size())

    # Likelihood
    return pyro.sample('f', dist.MultivariateNormal(torch.zeros(n), scale_tril=L), obs=y)

def hsma_gp_guide(X_obs, y_obs, X_acc, y_hyp):
    aa = pyro.param('aa', torch.Tensor([10]), constraint=constraints.positive)
    ab = pyro.param('ab', torch.Tensor([10]), constraint=constraints.positive)
    ba = pyro.param('ba', torch.Tensor([8]), constraint=constraints.positive)
    bb = pyro.param('bb', torch.Tensor([16]), constraint=constraints.positive)
    ra = pyro.param('ra', torch.Tensor([10]), constraint=constraints.positive)
    rb = pyro.param('rb', torch.Tensor([10]), constraint=constraints.positive)
    n_acc, _ = X_acc.size()
    alpha = pyro.sample('alpha', dist.Gamma(aa, ab))
    sigma = pyro.sample('sigma', dist.Beta(ba, bb))
    rho = pyro.sample('rho', dist.Gamma(ra, rb))
    with pyro.iarange('hyp', len(y_hyp)):
        ps = pyro.param('ps', torch.ones(n_acc) / n_acc, constraint=constraints.simplex)
        ind = pyro.sample('ind', dist.Categorical(probs=ps), infer={'enumerate': 'parallel'})
    return alpha, sigma, rho, ind

It’s supposed to choose rows of X_acc to fill in X during each sample.

How does iarange know how many times it needs to loop?

How does iarange know how many times it needs to loop?

iarange does not loop. It is a simple context manager that tells SVI that the variables inside the batch are conditionally independent. In your case, the onus is on you to make sure that the leftmost (batch) dim inside your hyp iarange has size len(y_hyp).

with pyro.iarange("data", 1000)":
    # expect 1000 independent samples in the batch.
    # i.e., the leftmost "batch" dimension of the sample generated 
    # from  `pyro.sample` inside the iarange should be 1000.
    # How this is achieved is up to the user 
    # e.g. using dist.Normal(0., 1.).expand([1000])
    # to sample a batch of 1000.

Optionally, you can sample a batch of indices and use it to index into your data tensor inside the iarange (this might be what you want instead):

with pyro.iarange('data', 100, subsample_size=10) as ind:
    # ind is a tensor of 10 subsampled indices; expect a batch of size 10
    obs = pyro.sample('obs', dist.Normal(0., 1.), obs=data[ind])

I think the iarange docs do a good job of explaining some of these use cases, but let me know if you still find it confusing. As always, I would suggest getting familiar with these concepts by playing around with toy examples or a few lines in IPython before using them in a bigger model.

If you would like help debugging your particular model, I would suggest adding more details (preferably in a separate post) on what it is that you are looking to do, what you have tried and the exact issues / errors you are getting. My understanding of GPs is limited, but someone else here could help you debug.