Let me return to the first question.
The situation is ecological inference for voting problems. That is to say, the observations consist of the row and column sums for P different RxC matrices, and the quantities of interest are the non-negative cell values in those matrices, representing the number of votes from racial group r for candidate c in precinct p. We are allowing non-integer vote totals.
As usual, the model is programmed without reference to the observations. That is to say, although the number of voters of each racial group in each precinct is taken to be known a priori, their voting behavior is not constrained in the model. Thus, there are PRC different quantities of interest, forming a PRC-dimensional space. Furthermore, these are sampled using a separate distribution over candidates for each racial group in each precinct: PR different continuous-multinomial distributions over C variables each.
But we are using the observations of the column/candidate totals to build the model. This restricts the valid values to a polytope in a P(R-1)(C-1)-dimensional subspace of the original PRC-dimensional space.
For each precinct, we can create a transform between an unconstrained (R-1)(C-1)-dimensional space and the (R-1)(C-1)-dimensional subspace of valid values conditional on the observations, where the latter is embedded in the full RC-dimensional space. But this transform is not immediately suitable for use in a TransformedDistribution, for two reasons:
1. In the model, these values actually correspond to R separate multinomial distributions.
2. Once you set (R-1)(C-1) values, the other R+C-1 of the values are fully determined, not sampled.
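To make the counting in the second point concrete, here is a minimal sketch in plain Python for a single precinct. The function name `complete_table` and the choice of the top-left block as the free cells are my own assumptions for illustration; any (R-1)(C-1) cells that leave one cell open per row and per column would do.

```python
def complete_table(free, row_sums, col_sums):
    """Fill an R x C table from its top-left (R-1) x (C-1) block and its margins.

    `free` holds the (R-1)(C-1) freely chosen cells; the remaining R+C-1 cells
    are computed, not sampled. Assumes sum(row_sums) == sum(col_sums).
    """
    R, C = len(row_sums), len(col_sums)
    table = [[0.0] * C for _ in range(R)]
    for r in range(R - 1):
        for c in range(C - 1):
            table[r][c] = free[r][c]
        # The last cell of each free row is fixed by that row's total.
        table[r][C - 1] = row_sums[r] - sum(free[r])
    for c in range(C):
        # Every cell of the last row is fixed by its column's total.
        table[R - 1][c] = col_sums[c] - sum(table[r][c] for r in range(R - 1))
    return table


# R = 2 groups, C = 3 candidates: 2 free cells, 4 determined cells.
table = complete_table([[10.0, 35.0]], [60.0, 40.0], [30.0, 50.0, 20.0])
print(table)
```

Nothing here enforces non-negativity of the determined cells; that requirement is what cuts the subspace down to a polytope.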
If it’s the only way to do this, I can get around problem 1 by creating a BunchOfMultinomials distribution and then a TransformedBunchOfMultinomials distribution, though this seems like a hack to me. And I can get around problem 2 by … well, I don’t actually know how to make a Pyro guide that “samples” deterministically, but obviously there is some way to do so, given that you have MAP guides as an option, so I could copy whatever is done there.
Still, all this seems unnecessary. When stochastically estimating the ELBO, you need 3 things:
1. a sample from the fitted guide
2. the density of the fitted guide at that sample
3. the density of the model at that sample, conditional on the observations
If you could do 1 and 2 simultaneously, you could sample from one distribution, then apply arbitrary transforms and adjust the density with the Jacobian of those transforms. You wouldn’t have to worry about ever inverting those transforms, so you wouldn’t need fully-functional TransformedDistributions.
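Here is a toy sketch of what I mean by doing the sample and its density in one forward pass, using only the standard library. The base distribution (a standard normal) and the transform (`exp`) are stand-ins chosen for illustration, not the transform from the voting problem, and the function names are hypothetical.

```python
import math
import random


def sample_with_log_density(rng):
    """Draw y = exp(x) with x ~ Normal(0, 1); return y and log q(y)."""
    x = rng.gauss(0.0, 1.0)
    log_q_x = -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)
    y = math.exp(x)
    # log |dy/dx| = log(exp(x)) = x, available without ever inverting exp.
    log_abs_det_jacobian = x
    return y, log_q_x - log_abs_det_jacobian


def lognormal_log_pdf(y):
    """Closed-form standard log-normal log-density, used only as a check."""
    log_y = math.log(y)
    return -log_y - 0.5 * log_y ** 2 - 0.5 * math.log(2.0 * math.pi)


rng = random.Random(0)
y, log_q = sample_with_log_density(rng)
# The forward-only density agrees with the closed form up to rounding.
assert abs(log_q - lognormal_log_pdf(y)) < 1e-9
```

Only the check calls the inverse (`math.log`); the sampler itself needs just the forward map and its log-Jacobian, which is all a single-sample ELBO term would use from the guide.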
But from glancing at the Trace_ELBO code, it appears that Pyro is doing 1 and 2 separately, using similar logic for 2 as for 3. This means that you need to be able to invert transforms. Furthermore, the way it’s coded seems to mean that a given transform can only apply to one distribution/call-to-sample at once, even though this is an arbitrary design constraint, not one that’s required logically by the process of SVI.
So my new question is: do I, in fact, have to code a BunchOfMultinomials distribution in order to get this to work, or can anyone suggest a less tedious way?