# Understanding the connection between model parameters and logp

Hello,

If I understand correctly, Pyro implements SVI optimization with the ELBO loss in the following way:

1. Decreasing the estimate of the expected log(q(z)) with respect to the variational parameters (phi), which are declared via `pyro.param` statements in the guide function - this term of the loss can be obtained with `guide_tr.log_prob_sum()`
2. Increasing the estimate of the expected log(p(x|z)) plus log(p(z)), i.e. log(p(x, z)), with respect to the model parameters (theta) - which is `model_tr.log_prob_sum()` (a minimal sketch of what I mean follows below)
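
Concretely, I mean something like this (a rough sketch with placeholder `model`, `guide`, and `data`; I assume this mirrors what `Trace_ELBO` does internally):

```python
import pyro.poutine as poutine

# run the guide, then replay the model against the guide's latent samples
guide_tr = poutine.trace(guide).get_trace(data)
model_tr = poutine.trace(poutine.replay(model, trace=guide_tr)).get_trace(data)

# single-sample estimate of the ELBO: log p(x, z) - log q(z)
elbo_estimate = model_tr.log_prob_sum() - guide_tr.log_prob_sum()
```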

My question is about the theta parameters. As expected, there are none of them in `pyro.get_param_store()` when nothing is explicitly declared in the model function with `pyro.param`. But even in this case `model_tr.log_prob_sum()` increases during training - which ‘degrees of freedom’ are used for that? What is actually being optimized?

Thanks a lot!

@yozhikoff I think that whether `theta` is available or not, `phi` will be optimized to minimize the ELBO loss, which is `log q(z) - log p(x, z)`. Minimizing the ELBO loss does not guarantee that `log p(x, z)` increases, but the two are related (if `a = b + c`, increasing `a` will likely increase `c`). By optimizing `phi`, the samples `z` move to better areas, which in turn (likely) increases the joint probability `p(x, z)`. However, `p(x, z)` will not necessarily always increase, for two reasons:

• The ELBO loss is stochastic (at least, the term `p(x|z)` is computed using a sample `z` drawn from `q(z)`)
• Increasing `a` might not increase `c`. For example, forgetting about `x` for simplicity and taking `q(z) = N(1, phi)` and `p(z) = N(0, 1)`, during optimization `phi` will move to the minimum of KL(q, p). At this minimal point `phi_0`, each SVI step will generate a random sample `z` from `N(1, phi_0)`. There is no guarantee that this random sample `z` will move closer to `0` at later steps (maximizing `p(z)` amounts to moving `z` toward `0`). A tiny numerical check of this example follows below.
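
If it helps, here is a tiny numerical check of that example (treating `phi` as the scale of `q`; the particular values are just illustrative):

```python
import torch
import torch.distributions as dist

# q(z) = N(1, phi), p(z) = N(0, 1); only the scale phi is being optimized
for phi in (0.5, 1.0, 2.0):
    q = dist.Normal(1.0, phi)
    p = dist.Normal(0.0, 1.0)
    print(phi, dist.kl_divergence(q, p).item())  # minimized at phi = 1, where KL = 0.5

# even at the optimum phi_0 = 1, log p(z) for a fresh sample z ~ q is random
z = dist.Normal(1.0, 1.0).sample()
print(dist.Normal(0.0, 1.0).log_prob(z).item())
```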

@fehiepsi Thanks!
Does it mean that when we are not sure about the model priors, it is always reasonable to parameterize them using `pyro.param`?
I mean something like

```python
# keep the Exponential rate positive during optimization
a = pyro.param('a', torch.tensor(5.), constraint=pyro.distributions.constraints.positive)
sample = pyro.sample('sample', pyro.distributions.Exponential(a))
```

in order to let SVI optimize `model_logp`?

To be honest, I rarely use `param` in a Bayesian `model` (unless I am working with `nn.Module`). For a hyperparameter like `a`, I would set a prior (hyperprior) on it and define a guide for `a` (e.g. we can use the simplest Delta guide for `a`, which is equivalent to doing maximum likelihood). A minimal sketch is below.
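
For example, something like this (just a rough sketch; the Gamma hyperprior and all the names here are arbitrary choices for illustration, not prescribed by Pyro):

```python
import torch
import pyro
import pyro.distributions as dist

def model(data):
    # hyperprior on the rate instead of a pyro.param in the model
    a = pyro.sample('a', dist.Gamma(2.0, 2.0))
    with pyro.plate('data', len(data)):
        pyro.sample('obs', dist.Exponential(a), obs=data)

def guide(data):
    # Delta guide: a point estimate of the hyperparameter a
    a_point = pyro.param('a_point', torch.tensor(1.0),
                         constraint=dist.constraints.positive)
    pyro.sample('a', dist.Delta(a_point))
```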

Thanks a lot, now it’s clearer to me.
What confused me was that, in the absence of model parameters, `model_loss` cannot be decreased other than by sampling better values of the latent variables from the guide, but it seems that this, in combination with `phi` optimization, is usually enough for good convergence.