Variational ecological inference


A collaborator and I are working on coding a model/guide for ecological inference. I’m experienced with Python, but have some initial questions about how to do this in Pyro:

My first question has to do with the fact that the natural parameter space for the guide is not the same as that of the model. In fact, because the likelihood is an indicator function (do the latents agree with the observations?), I’m building a guide conditional on the observations, and not using observations at all on the model side. Thus, the natural parameter space of the guide is lower-dimensional than that of the model. I know that this is not the normal way to do SVI, but I am confident that it works mathematically; my question is about how to program it in practice. I can write the modelspace-to-guidespace mapping function in either direction, and I can calculate its pseudo-Jacobian (the absolute value of the product of its nonzero singular values). Which side should I apply it on? And how do I ensure that Pyro uses the correct pseudo-Jacobian, rather than the actual Jacobian determinant, which is 0?

My second question: when I have a bunch of latent parameters of the same type, is it always the right move to index them over a plate, or is there sometimes a case for sampling a single tensor containing all of them? If the latter is better, how would I do it?

I realize these are both somewhat beginner questions — that if I were more familiar with this package, I’d probably know what to do more obviously. So, I’m especially grateful for any helpful responses.



On the first question: you might want to implement your model and guide components as torch.distributions.TransformedDistribution objects with custom Transforms. Have a look at the source code of some of the built-in Transforms in torch.distributions (e.g. AbsTransform) for examples.
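For instance, a minimal custom Transform might look like the toy sketch below (pure torch; the `AffineMap` name and the specific map are hypothetical). The `log_abs_det_jacobian` hook is the place where a nonstandard (pseudo-)Jacobian correction would go:

```python
import torch
from torch.distributions import (Normal, Transform, TransformedDistribution,
                                 constraints)

class AffineMap(Transform):
    """Toy invertible map y = a*x + b, for illustration only.
    log_abs_det_jacobian is where you'd return a custom correction,
    e.g. the log of a pseudo-determinant, instead of the usual one."""
    bijective = True
    domain = constraints.real
    codomain = constraints.real

    def __init__(self, a, b):
        super().__init__()
        self.a, self.b = a, b

    def _call(self, x):
        return self.a * x + self.b

    def _inverse(self, y):
        return (y - self.b) / self.a

    def log_abs_det_jacobian(self, x, y):
        # For an affine map this is just log|a|; swap in your own term here.
        return torch.log(torch.abs(self.a)).expand(x.shape)

# N(0, 1) pushed through y = 2x + 1 is N(1, 2).
transformed = TransformedDistribution(
    Normal(0., 1.), [AffineMap(torch.tensor(2.0), torch.tensor(1.0))]
)
```

`TransformedDistribution.log_prob` inverts the transform and subtracts `log_abs_det_jacobian`, so whatever that method returns is what inference will use.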

On the second question: plates provide extra information about conditional independence to inference algorithms. Without access to this information, inference may sometimes be significantly slower computationally or statistically, so there’s no reason to withhold it if it’s correct.

Note that the more narrow and detailed you can make your questions, ideally including runnable code or math illustrating the issue, the more helpful we can be.


Let me return to the first question.

The situation is ecological inference for voting problems. That is to say, the observations consist of the row and column sums for P different RxC matrices, and the quantities of interest are the non-negative cell values in those matrices, representing the number of votes from racial group r for candidate c in precinct p. We are allowing non-integer vote totals.

As usual, the model is programmed without reference to the observations. That is to say, although the number of voters of each racial group in each precinct is taken to be known a priori, their voting behavior is not constrained in the model. Thus, there are PRC different quantities of interest — a PRC-dimensional space. Furthermore, these are sampled using a separate distribution over candidates for each racial group in each precinct; PR different continuous-multinomial distributions over C variables each.

But we are using the observations of the column/candidate totals to build the guide. This restricts the valid values to a polytope in a P(R-1)(C-1)-dimensional subspace of the original PRC-dimensional space.

For each precinct, we can create a transform between an unconstrained (R-1)(C-1)-dimensional space and the (R-1)(C-1)-dimensional subspace of valid values conditional on the observations, where the latter is embedded in the full RC-dimensional space. But this transform is not, immediately, suitable for use in a TransformedDistribution, for a few reasons.

  1. In the model, these values actually correspond to R separate multinomial distributions.
  2. Once you set (R-1)(C-1) of the values, the remaining R+C-1 values are fully determined, not sampled.
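To make point 2 concrete, here is a toy completion in pure NumPy (the `complete_table` helper is hypothetical): once the (R-1)×(C-1) free block is set, the last row and column are forced by the observed margins.

```python
import numpy as np

def complete_table(free, row_sums, col_sums):
    """Fill in the last row and column of an R x C table so that all
    row/column sums match the observed margins. Illustrates that the
    remaining R+C-1 cells are determined, not sampled."""
    R, C = len(row_sums), len(col_sums)
    table = np.zeros((R, C))
    table[:R - 1, :C - 1] = free
    table[:R - 1, C - 1] = row_sums[:R - 1] - free.sum(axis=1)   # last column
    table[R - 1, :] = col_sums - table[:R - 1, :].sum(axis=0)    # last row
    return table

row_sums = np.array([10., 20., 30.])   # R = 3 racial groups
col_sums = np.array([25., 35.])        # C = 2 candidates (margins must agree)
free = np.array([[4.], [12.]])         # (R-1)(C-1) = 2 free cells
table = complete_table(free, row_sums, col_sums)
```

(Whether the completed cells stay non-negative is exactly the polytope constraint.)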

If it’s the only way to do this, I can get around problem 1 by creating a BunchOfMultinomials distribution and then a TransformedBunchOfMultinomials distribution, though this seems like a hack to me. And I can get around problem 2 by… well, I don’t actually know how to make a Pyro guide that “samples” deterministically, but obviously there is some way to do so, given that MAP guides are an option, so I could copy whatever is done there.

Still, all this seems unnecessary. When stochastically estimating the ELBO, you need 3 things:

  1. a sample from the fitted guide
  2. the density of the fitted guide at that sample
  3. the density of the model at that sample, conditional on the observations

If you could do 1 and 2 simultaneously, you could sample from one distribution, then apply arbitrary transforms and adjust with the Jacobian of those transforms. You wouldn’t have to worry about ever inverting those transforms, so you wouldn’t need fully-functional TransformedDistributions.
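In other words, a single-sample ELBO estimate only ever needs the forward direction. A toy conjugate-normal example in pure torch (names hypothetical):

```python
import torch
from torch.distributions import Normal

def elbo_estimate(x, q_loc, q_scale):
    """One-sample ELBO estimate for z ~ N(0,1), x | z ~ N(z,1),
    with guide q(z) = N(q_loc, q_scale). Items 1 and 2 reuse the
    same sample z, and no transform ever needs inverting."""
    q = Normal(q_loc, q_scale)
    z = q.rsample()                       # 1. a sample from the guide
    log_q = q.log_prob(z)                 # 2. guide density at that sample
    log_p = Normal(0., 1.).log_prob(z) + Normal(z, 1.).log_prob(x)  # 3. model
    return log_p - log_q

estimate = elbo_estimate(torch.tensor(0.5), torch.tensor(0.25), torch.tensor(0.8))
```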

But from glancing at the Trace_ELBO code, it appears that Pyro does 1 and 2 separately, using similar logic for 2 as for 3. This means that you need to be able to invert transforms. Furthermore, the way it’s coded seems to mean that a given transform can only apply to one distribution/call-to-sample at a time, even though this is an arbitrary design constraint, not one that’s required logically by the process of SVI.

So my new question is: do I, in fact, have to code a BunchOfMultinomials distribution in order to get this to work, or can anyone suggest a less tedious way?


(following up after offline discussion with @Jameson)

I believe the crux of variational ecological inference is to create a guide sample that is strictly lower dimensional than the model distribution. You can accomplish this using a low-dimensional auxiliary site in the guide and then injecting this into a full-dimensional Delta site in the guide (taking care to include the injection’s log(abs(det(Jacobian))) in the Delta distribution’s log_density parameter). Pyro’s AutoContinuous autoguide uses this aux+delta pattern.

Here’s an example colab notebook of variational ecological inference, roughly following this paper.