This paper describes an application of Automatic Relevance Determination to Gaussian mixture models using variational inference (CAVI). I’d like to implement a Pyro solution using SVI, but I’m troubled by an apparent inconsistency with the GMM tutorial. In the tutorial, all of the optimized parameters appear to be hyperparameters of the guide, while all Pyro ‘sites’ in the model are of the sample() variety.
In the paper, by contrast, the parameters being optimized include the mixture weights, which of course appear in the likelihood function itself, not the guide. From equation (12) of the paper, the guide is a variational posterior only over the component means, variances and class membership vectors, not the weights.
My question is whether Pyro’s SVI with an ELBO loss will work here: is it required that all optimized parameters be in the guide, or can some of them be in the model? (The reason I’m concerned is that the ELBO API documentation indicates its implementation follows this source paper, which assumes the gradient of the likelihood with respect to the parameters vanishes identically.)