Thanks, @fehiepsi and @martinjankowiak.
This seems to work in the low dimensional setting (`p_dim = 10`), but begins to fail in more realistic settings (for my use case) of `p_dim = 1000` when `n_dim = 400`. I had tried increasing the number of particles earlier, but not to this extent, and my worry is that `num_particles` may need to grow exponentially with `p_dim` to handle the combinatorial nature of the problem.
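For reference, this is the knob I've been turning; the model here is a throwaway stand-in just to make the snippet runnable:

```python
import jax.numpy as jnp
from jax import random
import numpyro
import numpyro.distributions as dist
from numpyro.infer import SVI, Trace_ELBO
from numpyro.infer.autoguide import AutoNormal

# Toy stand-in model; in my actual code this is the sparse-regression
# model from the thread.
def model(X, y):
    beta = numpyro.sample("beta", dist.Normal(jnp.zeros(X.shape[1]), 1.0).to_event(1))
    numpyro.sample("y", dist.Normal(X @ beta, 1.0), obs=y)

X = random.normal(random.PRNGKey(0), (400, 10))
y = X @ jnp.ones(10) + 0.1 * random.normal(random.PRNGKey(1), (400,))

guide = AutoNormal(model)
svi = SVI(model, guide, numpyro.optim.Adam(1e-2),
          Trace_ELBO(num_particles=256))  # default is 1 particle
svi_result = svi.run(random.PRNGKey(2), 2000, X, y)
```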
This code selects which of the beta mean/std parameters to use for the posterior, based on `gamma`, for each of the `l_dim` effects. I reparameterized the guide in terms of parameters wrt each data variable (`p_dim`) and then select the ones for each `l_dim` using the sampled `gamma` vector. Indexing would be a bit cleaner, but I ran into JIT errors and just used this product/sum solution, which does the same thing; schematically, it looks like the sketch below.
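(Parameter names and the `OneHotCategorical` choice here are illustrative, and in the real code the discrete `gamma` of course needs an ELBO that handles non-reparameterizable sites, e.g. `TraceGraph_ELBO`.)

```python
import jax.numpy as jnp
import numpyro
import numpyro.distributions as dist

def guide(X, y, l_dim, p_dim):
    # One loc/scale per (data variable p, effect l) pair.
    beta_loc = numpyro.param("beta_loc", jnp.zeros((p_dim, l_dim)))
    beta_scale = numpyro.param("beta_scale", jnp.ones((p_dim, l_dim)),
                               constraint=dist.constraints.positive)
    probs = numpyro.param("gamma_probs", jnp.ones((l_dim, p_dim)) / p_dim,
                          constraint=dist.constraints.simplex)

    with numpyro.plate("l_dim", l_dim):
        # One-hot selector over the p_dim columns for each effect l.
        gamma = numpyro.sample("gamma", dist.OneHotCategorical(probs))
        # Product/sum instead of fancy indexing to stay JIT-friendly;
        # gamma.T has shape (p_dim, l_dim), matching beta_loc/beta_scale.
        loc = jnp.sum(gamma.T * beta_loc, axis=0)      # shape (l_dim,)
        scale = jnp.sum(gamma.T * beta_scale, axis=0)  # shape (l_dim,)
        numpyro.sample("beta", dist.Normal(loc, scale))
```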
Using a bit more formal notation, let $\text{beta\_loc}_{pl} := \mathbb{E}[b_l \mid \gamma_{lp} = 1, X, y]$ be the conditional posterior expectation of $b_l$ given that it is coupled with the $p$-th column of $X$. Since $\sum_p \gamma_{lp} = 1$ for fixed $l$, the code `jnp.sum(gamma.T * beta_loc, axis=0)` selects the posterior mean for each $b_l$ by setting it to the corresponding $\text{beta\_loc}_{pl}$ value.
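As a quick sanity check that the product/sum matches direct indexing when `gamma` is one-hot (toy shapes):

```python
import jax.numpy as jnp

l_dim, p_dim = 3, 5
beta_loc = jnp.arange(p_dim * l_dim, dtype=jnp.float32).reshape(p_dim, l_dim)

# One-hot gamma: effect l is coupled with column cols[l] (arbitrary choice).
cols = jnp.array([0, 2, 4])
gamma = jnp.eye(p_dim)[cols]  # shape (l_dim, p_dim)

selected = jnp.sum(gamma.T * beta_loc, axis=0)  # product/sum version
indexed = beta_loc[cols, jnp.arange(l_dim)]     # direct indexing version
assert jnp.allclose(selected, indexed)
```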
I had an earlier implementation that skipped this step and directly parameterized the `l_dim` normal space, but its poor performance led me to try this option. In hindsight, in the CAVI solution the posterior parameters for `beta_l` depend on `gamma` by 'selecting' which variable to use for the posterior mean/std, so trying to back out conditional posteriors from a single set of `beta_l` parameters shouldn't work; hence the formulation above.
My understanding of the plate implementation above is that the variables (here, each of the `l_dim` values `beta_l`) are conditionally independent along `l_dim`, but not necessarily along the other dimensions. Is that correct? A stripped-down version of what I mean is below.
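(This fragment is just to pin down the question; the model body is a placeholder.)

```python
import numpyro
import numpyro.distributions as dist

def model_fragment(l_dim):
    # My reading: the plate declares the l_dim draws of beta conditionally
    # independent across the plate dimension, but makes no independence
    # claim about any other batch/event dimensions of the sample sites.
    with numpyro.plate("l_dim", l_dim):
        beta = numpyro.sample("beta", dist.Normal(0.0, 1.0))
    return beta
```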
Thanks again for your input.