Mixture Distributions in pyro.distributions vs. Discrete Latent Variables

I saw that in addition to discrete latent variables, Pyro also allows you to specify that a variable is distributed as a mixture of distributions, for certain distributions (e.g. MixtureOfDiagNormals). However, I get a NotImplementedError whenever I try to give a latent variable this distribution.

My guess is that, since there is a correspondence between (i) latent variables that have a mixture distribution and (ii) compositions of a discrete and a continuous latent variable, and since I am using SVI, this is Pyro telling me to use (ii) with TraceEnum_ELBO. I get the same error with NUTS + MCMC.

If the above logic is correct, my question is: is there a use case for MixtureOfDiagNormals (and the mixture distributions in pyro.distributions) that is not covered by discrete latent variables?

MixtureOfDiagNormals is (or at least should be, in the absence of bugs) a full-fledged distribution, apart from the fact that it has limited support for batching. in particular it has an rsample method, so it can be used as a variational distribution in a guide. this is nice insofar as the resulting ELBO estimator should have lower variance than would be the case if you explicitly introduced discrete latent variables.

on the other hand this is one particular distribution and can’t help you if you want a mixture of some other component distribution (e.g. a mixture of Weibull distributions). in that case you generally need to introduce explicit discrete latent variables. you then have several options. one is to enumerate them (i.e. sum them out). that’s where TraceEnum_ELBO comes in. alternatively, you can choose not to sum them out, in which case you need to introduce variational distributions in your guide for the discrete latent variables. you can then use e.g. Trace_ELBO, but you will likely end up with high-variance ELBO estimators due to the use of so-called score function gradients (as explained in SVI Part III). depending on the dimension and other details of the problem this is either a small annoyance (slows down convergence) or catastrophically bad.

finally, NUTS only works with continuous latent variables, so the only option there is to sum out any discrete latent variables. this may or may not be tractable depending on the model.