In Part I of the SVI examples, in the model learning section, the way the documentation is worded makes it sound like, among the model parameters, the observations, and the latent variables, we are aiming to find the maximum-likelihood value of the model parameters and the distribution of the latent variables. Isn't it the case that in variational inference we learn the distribution of both the model parameters and the latent variables? And at the end of the section, the documentation states
Variational inference offers a scheme for finding $\theta_{\max}$ and computing an approximation to the posterior $p_{\theta_{\max}}(z|x)$.
but shouldn't we be optimising the variational parameters $\phi$, which are introduced in the guide? Am I missing something? This reads more like the EM algorithm than VI.
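For reference, my current understanding (my own reading, not a quote from the docs) is that SVI maximises the ELBO jointly over $\theta$ and $\phi$:

$$\log p_\theta(x) \;\ge\; \mathrm{ELBO}(\theta, \phi) \;=\; \mathbb{E}_{q_\phi(z)}\big[\log p_\theta(x, z) - \log q_\phi(z)\big],$$

so that $\theta_{\max}$ comes from ascending the ELBO in $\theta$ while $\phi$ is optimised simultaneously to make $q_\phi(z)$ approximate $p_{\theta_{\max}}(z|x)$, which is exactly what makes it feel EM-like to me.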
Also, in the guide section the documentation states
The basic idea is that we introduce a parameterized distribution $q_\phi(z)$, where $\phi$ are known as the variational parameters.
This makes it sound like the variational parameters are only used to parametrise the distribution of the latent variables, not the model parameters. But shouldn't it be both? For example, in the intro scale example we have no latent variables, and the variational parameters are used to parametrise the posterior distribution of the mean and the deviation, which are model parameters.
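To make my question concrete, here is a tiny self-contained sketch (plain Python, a toy conjugate model of my own, not code from the Pyro tutorial) of the picture the guide section paints: $\phi$ parametrises only the approximate posterior over a latent variable $z$, and there are no separate model parameters $\theta$ being learned at all.

```python
import math

# Toy conjugate model: z ~ N(0, 1), x | z ~ N(z, 1); observe x = 2.
# The exact posterior is N(1.0, 0.5), so the VI answer can be checked.
x = 2.0

# Variational parameters phi = (m, s) for the guide q(z) = N(m, s^2).
m, s = 0.0, 1.0

# For this Gaussian-Gaussian pair the ELBO gradients are closed-form:
#   dELBO/dm = x - 2m,   dELBO/ds = 1/s - 2s
lr = 0.05
for _ in range(2000):
    m += lr * (x - 2 * m)
    s += lr * (1.0 / s - 2 * s)

print(m, s * s)  # converges to the exact posterior mean 1.0 and variance 0.5
```

Here only $\phi = (m, s)$ is ever optimised, which matches the guide section's wording; my confusion is that in the scale example the analogous $\phi$ seems to be parametrising a posterior over quantities the earlier section called model parameters.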