Implementation of intractable conditional likelihood

I need to implement a model where the conditional likelihood is itself an intractable integral (this is essential to my problem), that is,
p(x|z) = ∫ p(x|r) p(r|z) dr, and in my problem this integral has no closed form.

I was discussing this with @fehiepsi in another post the other day. The most straightforward idea, I guess, is to use a Monte Carlo (MC) estimator for the likelihood itself. Say we use 10 samples of r ~ p(r|z) to estimate the likelihood; I have been implementing it like this:
first define an unscaled model, where r ~ p(r|z) is sampled using torch.distributions rather than pyro.sample, so it is not treated as a latent variable. When writing the pyro.sample statement for the observable, I expand the data along a new axis of size 10, score the data against all 10 values of r in bulk, and then use poutine.scale to scale the result by 0.1.
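
For concreteness, a minimal sketch of that setup, with Normal distributions standing in as hypothetical placeholders for p(z), p(r|z) and p(x|r) (all names here are made up for illustration):

```python
import torch
import pyro
import pyro.distributions as dist
from pyro import poutine

NUM_MC = 10  # number of samples of r used to estimate the likelihood

def model(x):  # x: observed data of shape (N,)
    z = pyro.sample("z", dist.Normal(0.0, 1.0))
    # sample r with torch.distributions so Pyro does NOT treat it as a latent
    r = torch.distributions.Normal(z, 1.0).sample((NUM_MC, 1))
    # score the data against all NUM_MC values of r in bulk,
    # then rescale by 1/NUM_MC via poutine.scale
    with poutine.scale(scale=1.0 / NUM_MC):
        pyro.sample("x", dist.Normal(r, 1.0), obs=x.expand(NUM_MC, *x.shape))
```

Note that scaling the summed log-probabilities by 1/NUM_MC computes an average of log p(x|r), not the log of the averaged likelihood, which is exactly the problem realized below.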

Today, however, I realized this is wrong. The mistake I made is that when q(z) is reparameterized, we need to calculate d log p(x|z)/dz, and the gradient of the expectation is, of course, not the expectation of the gradient. In fact, with this MC estimator for the likelihood, the expectation of the gradient, E[d log p(x|r)/dz], is zero (r is drawn with .sample(), so no gradient flows from z through r), which is certainly not what I want.

In addition, an MC estimator is actually not a good idea for estimating this likelihood in my problem; I would rather use, say, numerical integration. However, I am not sure how to write the pyro.sample statement for the observable in this case. Essentially, I think this requires defining a type of distribution whose .log_prob is allowed to be computed through numerical approximation.

Any suggestions on how to implement these things?

Thank you!


how high dimensional is r?

the problem is that the elbo is constructed with log probabilities, so if you use a naive MC estimator of the prob and take its log you'll end up with a biased estimator (the prob estimator will be unbiased, but the log prob estimator will be biased). if it's tractable, it's probably preferable to integrate out r numerically, as you say (e.g. using quadrature).
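
in symbols: with the estimator c = (1/S) sum_s p(x|r_s), r_s ~ p(r|z), we have E[c] = p(x|z), but by Jensen's inequality E[log c] <= log E[c] = log p(x|z), so log c underestimates log p(x|z) on average.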

yes, the dimension is 2 and I would prefer to do numerical integration. But I am not really sure how to do that, because I am not sure how to write the pyro.sample statement for observables in this case. As I said, I feel the way to do it is to have a new type of distribution where log_prob is calculated by numerical integration (or any custom method). Could you give me some advice on that? Thanks.

create a custom distribution, e.g. see here

https://github.com/uber/pyro/blob/dev/pyro/distributions/von_mises.py

since you’ll only use it in an observe statement, you only need to implement log_prob (no sample).
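
a minimal sketch of what such a distribution could look like, following the TorchDistribution pattern in the file linked above. everything concrete here is an assumption for illustration: the Normal stand-ins for p(x|r) and p(r|z), the ±6 standard-deviation grid, and the class name. the grid is also 1-D for clarity, whereas r in the original question is 2-D:

```python
import math

import torch
from torch.distributions import constraints
from torch.distributions.utils import broadcast_all

from pyro.distributions import TorchDistribution


class NumericallyIntegratedLikelihood(TorchDistribution):
    """Hypothetical p(x|z) = int p(x|r) p(r|z) dr, with the integral computed
    numerically on a fixed grid over r. Only log_prob is implemented, since
    the distribution is used solely in an observe statement."""

    arg_constraints = {"loc": constraints.real, "scale": constraints.positive}
    support = constraints.real

    def __init__(self, loc, scale, grid_size=200, validate_args=None):
        self.loc, self.scale = broadcast_all(loc, scale)  # parameters of p(r|z)
        self.grid_size = grid_size
        super().__init__(self.loc.shape, validate_args=validate_args)

    def log_prob(self, value):
        # uniform grid covering +-6 standard deviations of p(r|z),
        # reshaped to broadcast against value
        t = torch.linspace(-6.0, 6.0, self.grid_size)
        t = t.reshape(-1, *([1] * value.dim()))
        r = self.loc + self.scale * t
        # log p(x|r) + log p(r|z) on the whole grid; Normal(r, 1)
        # is a placeholder for the actual p(x|r)
        log_integrand = (torch.distributions.Normal(r, 1.0).log_prob(value)
                         + torch.distributions.Normal(self.loc, self.scale).log_prob(r))
        # log int ≈ logsumexp over grid points + log of the grid spacing
        log_step = self.scale.log() + math.log(12.0 / (self.grid_size - 1))
        return torch.logsumexp(log_integrand, dim=0) + log_step
```

the observe statement in the model would then read something like `pyro.sample("x", NumericallyIntegratedLikelihood(r_loc, r_scale), obs=x)`, where r_loc and r_scale are the (hypothetical) parameters of p(r|z) computed from z.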


@ciaobladoo It would be nice to have a custom distribution to do this job. It will be simpler to assume p(r|z) is Gaussian, so we can use Gauss-Hermite quadrature. I expect this custom distribution will contain a base_distribution (p(x|r)), the arguments of p(r|z) (loc and scale in the Gaussian case), and a response_function (to link the output of the Gaussian to the support of base_distribution, which is the identity if both are real). If you come up with an implementation, would you mind sharing it? I'm waiting to use it (e.g. in Gaussian Processes) and happy to review your implementation. Thanks!
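
For the Gaussian case, the quadrature itself could look something like the sketch below. It relies on the standard identity int f(g) N(g; loc, scale^2) dg ≈ (1/sqrt(pi)) sum_k w_k f(loc + sqrt(2) * scale * t_k) for Gauss-Hermite nodes t_k and weights w_k; the helper name, the 32-point default, and the Poisson/exp usage example are assumptions, and loc and scale are assumed to be tensors:

```python
import math

import torch
from numpy.polynomial.hermite import hermgauss


def gauss_hermite_log_prob(value, loc, scale, base_dist_fn,
                           response_fn=lambda g: g, num_points=32):
    """Hypothetical helper: log int p(x|r) N(g; loc, scale^2) dg with
    r = response_fn(g), computed by Gauss-Hermite quadrature. base_dist_fn
    maps r to the distribution p(x|r); response_fn links the Gaussian output
    to the support of base_dist_fn."""
    t, w = hermgauss(num_points)  # nodes/weights for int exp(-t^2) f(t) dt
    t = torch.as_tensor(t, dtype=loc.dtype).reshape(-1, *([1] * value.dim()))
    log_w = torch.as_tensor(w, dtype=loc.dtype).log().reshape(t.shape)
    # change of variables g = loc + sqrt(2)*scale*t absorbs the Gaussian density
    r = response_fn(loc + math.sqrt(2.0) * scale * t)
    log_p = base_dist_fn(r).log_prob(value)  # log p(x|r) at each node
    return torch.logsumexp(log_w + log_p, dim=0) - 0.5 * math.log(math.pi)


# e.g. a Poisson observation with a log link, as might appear in a GP model:
# gauss_hermite_log_prob(x, loc, scale,
#                        base_dist_fn=lambda r: torch.distributions.Poisson(r),
#                        response_fn=torch.exp)
```

Wrapping this in a TorchDistribution with the base_distribution / response_function arguments described above would then give the custom observation distribution.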

Cool. Thanks for the suggestions. I will probably start the implementation next week, and I would like to make it as general as possible too. It would be nice to contribute it to Pyro. Cheers.
