I am wondering how I can obtain a tempered posterior using Pyro/NumPyro.
For example, the original model is p(w, z) = p(w | z) p(z), and I would like to replace the likelihood term with p(w | z)^T.
This tempering operation is similar to the KL annealing trick used in VAEs and the cold posterior trick used in BNN training.
What if I want to treat the temperature as a local, data-dependent latent variable (e.g. [1411.1810] Variational Tempering)? Can poutine.scale still help, or do I need to use other methods?
I'm not sure; it depends. Certainly scale can be local, but whether or not it can be treated as, e.g., a latent variable will depend on the details. Can you be (much) more specific about what you want to do? For example, can you point to a specific objective function and the corresponding inference algorithm?
This is what scale does. The documentation entry "scale (float)" is probably unclear and makes it look as if the argument must be a scalar. We should change it to "scale (float or ndarray)" and mention that its shape should be broadcastable to the batch shape (i.e. the log_prob shape) of each site under its context. If this does what you want, could you open a PR to improve the docs?
Hi @ChernovAndrey, the first scale only scales the likelihood while the second one scales all sample sites in the model. If you use both, the likelihood will be scaled by T^2. You are right about the second question.