Discrete Inference with TraceGraph_ELBO

Thanks, @fehiepsi and @martinjankowiak.

This seems to work in the low-dimensional setting (p_dim = 10), but begins to fail in more realistic settings for my use case (p_dim = 1000 with n_dim = 400). I had tried increasing the number of particles earlier, but not to this extent, and my worry is that the number of samples required may grow exponentially with p_dim, given the combinatorial nature of the problem.

This code selects which of the beta mean/std parameters to use for the posterior, based on gamma, for each l_dim. I reparameterized the guide in terms of parameters over the data variables (p_dim) and then select one for each l_dim using the sampled gamma vector. Indexing would be a bit cleaner, but I ran into JIT errors, so I used this product/sum formulation, which does the same thing.

Using slightly more formal notation, let $\text{beta\_loc}_{pl} := \mathbb{E}[b_l \mid \gamma_{lp} = 1, X, y]$ be the conditional posterior expectation of $b_l$ given that it is coupled with the $p$-th column of $X$. Since $\sum_p \gamma_{lp} = 1$ for fixed $l$, the code `jnp.sum(gamma.T * beta_loc, axis=0)` selects the posterior mean for each $b_l$ by picking out the corresponding $\text{beta\_loc}_{pl}$ value.
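For concreteness, here is a tiny standalone sketch of the selection step. The shapes and names are my assumptions (gamma is `(l_dim, p_dim)` with one-hot rows, beta_loc is `(p_dim, l_dim)`, toy sizes): it checks that the product/sum form agrees with a direct gather, and shows a `jnp.take_along_axis` variant that may sidestep the JIT issues I hit with plain indexing.

```python
import jax
import jax.numpy as jnp

l_dim, p_dim = 3, 5  # toy sizes (assumption)

# One-hot gamma: row l couples beta_l with column idx[l] of X.
idx_true = jnp.array([1, 4, 2])
gamma = jnp.eye(p_dim)[idx_true]                      # (l_dim, p_dim)
beta_loc = jnp.arange(p_dim * l_dim, dtype=jnp.float32).reshape(p_dim, l_dim)

# Product/sum selection, as in the guide: for each l, summing
# gamma[l, p] * beta_loc[p, l] over p picks out beta_loc[idx[l], l].
selected_sum = jnp.sum(gamma.T * beta_loc, axis=0)    # (l_dim,)

# Equivalent gather via take_along_axis, which is also JIT-compatible.
@jax.jit
def select(gamma, beta_loc):
    idx = jnp.argmax(gamma, axis=1)                   # (l_dim,)
    return jnp.take_along_axis(beta_loc, idx[None, :], axis=0)[0]

selected_idx = select(gamma, beta_loc)
print(bool(jnp.allclose(selected_sum, selected_idx)))  # True
```

Both paths reduce to `beta_loc[idx[l], l]` for each `l`, so either should be usable inside the guide.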

I had an earlier implementation that skipped this step and directly parameterized the normal distribution over the l_dim space, but its poor performance led me to try this option. In hindsight this makes sense: in the CAVI solution, the posterior parameters for beta_l depend on gamma, which 'selects' which variable supplies the posterior mean/std, so trying to back out conditional posteriors from a single set of beta_l parameters shouldn't work; hence the formulation above.

My understanding of the plate implementation above is that variables are conditionally independent along l_dim, but not necessarily along the other dimensions (here, each of the l beta_l values). Is that correct?

Thanks again for your input.