If I modify the model to use the distribution we discussed earlier in this thread (Mixture model with discrete data in Numpyro - #8 by jim), which assumes independent Bernoullis, then I don’t get the label switching behaviour and all the model parameters are estimated pretty well. Changing to this custom distribution does take a bit of modification to the model: in the independent case the parameters are simple probabilities, so I used Beta priors for them, but in the case here the parameters are constrained to a simplex, so I need to use something like a Dirichlet prior instead.
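To make the difference between the two parameterisations concrete, here is a rough standard-library Python sketch (not the NumPyro model itself). The function names, the flat Beta(1, 1) and Dirichlet(1, ..., 1) choices, and the sizes 3 and 4 are all assumptions for illustration only. In NumPyro the corresponding priors would be `dist.Beta` and `dist.Dirichlet`.

```python
import random


def sample_independent_probs(n_features, a=1.0, b=1.0, rng=random):
    """One Beta(a, b) draw per feature.

    Each value lies in [0, 1]; there is no constraint on their sum.
    """
    return [rng.betavariate(a, b) for _ in range(n_features)]


def sample_simplex_probs(n_outcomes, concentration=1.0, rng=random):
    """One symmetric Dirichlet draw, built by normalising Gamma variates.

    The entries are non-negative and sum to 1 (a point on the simplex).
    """
    gammas = [rng.gammavariate(concentration, 1.0) for _ in range(n_outcomes)]
    total = sum(gammas)
    return [g / total for g in gammas]


# Hypothetical sizes: 3 independent features vs. 4 mutually exclusive outcomes.
rng = random.Random(0)
independent = sample_independent_probs(3, rng=rng)
simplex = sample_simplex_probs(4, rng=rng)

print("independent probabilities:", independent, "sum =", sum(independent))
print("simplex probabilities:    ", simplex, "sum =", sum(simplex))
```

The independent probabilities can sum to anything between 0 and the number of features, whereas the simplex vector always sums to 1, which is why the Beta priors cannot simply be reused.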
Even without the label switching here, the model looks like it would still dump all observations into the same cluster, which is not what I want.
Thanks.