I would like to do model selection based on the principle of parsimony. The model is a linear differential equation. Upfront I set a `max_order` that I want to explore, and I have a discrete variable that samples the current order of the differential equation to simulate. However, for parsimony to take effect in the ELBO, I should mask the parameters that are not used.

I have three `pyro.sample` sites, `a`, `b`, and `c`, all vectors with shape `1 x max_dim`. When I sample an order `N < max_dim`, I need to mask the remaining `max_dim - N` entries.

The issue is that selecting the parameters used in the simulation for a particular order `N` is not that simple. The straightforward approach would be to take the first `N` parameters, in which case setting the mask is easy: first sample `N`, then apply `poutine.mask` when sampling `a`, `b`, and `c` to keep the first `N` entries (and mask all the rest).

However, the parameter selection is based on a relation among `a`, `b`, and `c`, so I need to draw the samples first and only then decide which `N` values to use. Since the rest are never masked, the `ELBO` always accounts for all `max_dim` values of `a`, `b`, and `c`, and the parsimony principle has no effect.
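Concretely, the score accumulated in the trace is the sum of log-probabilities over all `max_dim` entries; a plain `torch.distributions` sketch (with `max_dim = 4` and `N = 2`, both illustrative) of the difference masking would make:

```python
import torch
from torch.distributions import Normal

prior = Normal(0.0, 1.0)
a = prior.sample((4,))  # max_dim = 4, purely illustrative

# without masking: every entry contributes its log-probability
full_score = prior.log_prob(a).sum()

# with masking (suppose N = 2): unused entries are zeroed out, which is
# effectively what poutine.mask does to the site's log_prob
mask = torch.tensor([True, True, False, False])
masked_score = (prior.log_prob(a) * mask).sum()
```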

I am thinking of overloading `Trace` so that I can alter `scale_and_mask`, but I was wondering if there is a more elegant solution.

In a nutshell a simple (pseudo)code would be:

```
def model(self):
    # N_par is a learnable simplex over candidate orders; note that
    # pyro.param takes an initial tensor, not a distribution
    N_par = pyro.param('N_par', torch.ones(self.max_dim).to(self.device) / self.max_dim,
                       constraint=constraints.simplex)
    pole = pyro.sample('pole', dist.Categorical(N_par))
    with pyro.plate('dims', self.max_dim):
        a = pyro.sample('a', ...)
        b = pyro.sample('b', ...)
        c = pyro.sample('c', ...)
    sel_ind = select_which_parameters(a, b, c, pole)
    sim_mu = simulate(a, b, c, sel_ind)
    # Problematic part: this should have the same effect as poutine.mask,
    # but the mask is only known after sampling
    mask_somehow(a, b, c, mask=sel_ind > 0)
```