How to implement a tempered model in (Num)Pyro

Hi

I am wondering how I can obtain a tempered posterior using Pyro/NumPyro.
For example: the original model is p(w, z) = p(w | z) p(z), and I would like to change the likelihood term to p(w | z)^T.

This tempering operation is similar to the KL-annealing trick used in VAEs and the cold-posterior trick used in BNN training.

Thanks

if the temperature is a fixed parameter that you set then you can simply use poutine.scale:

with pyro.poutine.scale(scale=T):
    pyro.sample("w", ...)

this multiplies the enclosed log_probs by T
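
for example, a minimal end-to-end sketch of a tempered-likelihood model might look like this (the specific prior, likelihood, and names below are just for illustration):

import pyro
import pyro.distributions as dist

T = 0.5  # fixed temperature

def model(data):
    # prior p(z) is left untempered
    z = pyro.sample("z", dist.Normal(0., 1.))
    # likelihood p(w | z)^T: scale multiplies this site's log_prob by T
    with pyro.poutine.scale(scale=T):
        with pyro.plate("data", len(data)):
            pyro.sample("w", dist.Normal(z, 1.), obs=data)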

What if I want to treat the temperature as a local, data-dependent latent variable (e.g. [1411.1810] Variational Tempering)? Can poutine.scale still help, or do I need to use other methods?

Thanks

i’m not sure, it depends. certainly scale can be local, but whether or not it can be treated as e.g. a latent will depend. can you be (much) more specific as to what you want to do? e.g. point to a specific objective function and corresponding inference algorithm?

For example, a model like this, in which T_i stands for the temperature

And now I would like to perform MCMC on this model.
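
For reference, one way to sketch such a model in NumPyro (with hypothetical priors and names, and ignoring the temperature-dependent normalizing constant of the tempered likelihood) is:

import jax
import jax.numpy as jnp
import numpyro
import numpyro.distributions as dist
from numpyro.infer import MCMC, NUTS

def model(data):
    mu = numpyro.sample("mu", dist.Normal(0., 10.))  # global latent
    with numpyro.plate("data", len(data)):
        # local latent temperature T_i in (0, 1), hypothetical Beta prior
        t = numpyro.sample("t", dist.Beta(2., 2.))
        # scale accepts an array, so each likelihood term is scaled by its own t_i
        with numpyro.handlers.scale(scale=t):
            numpyro.sample("obs", dist.Normal(mu, 1.), obs=data)

data = jnp.array([0.3, -1.2, 0.8])
mcmc = MCMC(NUTS(model), num_warmup=500, num_samples=1000)
mcmc.run(jax.random.PRNGKey(0), data)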

Problem solved: I implemented a wrapper class around a distribution, which returns a scaled version of the log-likelihood:

import numpyro.distributions as dist

class tempered_XXX(dist.XXX):  # XXX stands for any distribution class
    def __init__(self, T, *args, **kwargs):
        self.T = T
        super().__init__(*args, **kwargs)

    def log_prob(self, value):
        # log p(value)^(1/T) = (1/T) * log p(value)
        return 1 / self.T * super().log_prob(value)
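
As a concrete instance of this pattern (a hypothetical sketch with dist.Normal standing in for XXX):

import numpyro
import numpyro.distributions as dist

class TemperedNormal(dist.Normal):
    def __init__(self, T, *args, **kwargs):
        self.T = T
        super().__init__(*args, **kwargs)

    def log_prob(self, value):
        return 1 / self.T * super().log_prob(value)

def model(data, T=2.0):
    mu = numpyro.sample("mu", dist.Normal(0., 10.))
    with numpyro.plate("data", len(data)):
        # tempered likelihood; the prior on mu stays untempered
        numpyro.sample("obs", TemperedNormal(T, mu, 1.), obs=data)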

> return 1 / self.T * super().log_prob(value)

This is what scale does. Probably the documentation entry scale (float) is not clear and makes you think we need a scalar there. We should change it to scale (float or ndarray) and mention that its shape should be broadcastable to the batch shape (i.e. the log_prob shape) of each site under its context. If this does what you wanted, could you open a PR to enhance the docs? :slight_smile:
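
For example, an array scale broadcastable to the plate's batch shape might look like this (the per-observation weights here are illustrative):

import jax.numpy as jnp
import numpyro
import numpyro.distributions as dist
from numpyro.handlers import scale

def model(data):
    mu = numpyro.sample("mu", dist.Normal(0., 1.))
    # one scale factor per observation, broadcastable to the batch shape
    temps = jnp.linspace(0.1, 1.0, data.shape[0])
    with numpyro.plate("data", data.shape[0]), scale(scale=temps):
        numpyro.sample("obs", dist.Normal(mu, 1.), obs=data)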

Sure, I can do that.

Hi

I am sorry to restart this conversation. I would appreciate it if you could clarify two points for me.

  1. Is there any difference between the two realizations:

with numpyro.plate('data', len(data)), scale(scale=T):
    numpyro.sample('obs', dist.Normal(locs, sigma), obs=data)

and

nuts_kernel = NUTS(scale(model, scale=T))

And if I use both, will the log-likelihood be multiplied by T^2 or only by T?

  2. The scale wrapper multiplies the log-likelihood by scale (not by 1/scale), am I right?

Thank you!

Hi @ChernovAndrey, the first scale only scales the likelihood, while the second one scales all sample sites in the model. If you use both, the likelihood will be scaled by T^2. You are right about the second question.
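
A quick way to check this (a hypothetical sketch using numpyro.infer.util.log_density):

import jax.numpy as jnp
import numpyro
import numpyro.distributions as dist
from numpyro.handlers import scale
from numpyro.infer.util import log_density

T = 0.5
data = jnp.array([0.3, -1.2, 0.8])

def model(data):
    mu = numpyro.sample("mu", dist.Normal(0., 1.))
    with numpyro.plate("data", len(data)), scale(scale=T):
        numpyro.sample("obs", dist.Normal(mu, 1.), obs=data)

# the inner handler scales only the likelihood by T; wrapping the whole
# model scales every site again, so the likelihood picks up T * T = T^2
# (and the prior a single factor of T)
ld_inner, _ = log_density(model, (data,), {}, {"mu": jnp.array(0.)})
ld_both, _ = log_density(scale(model, scale=T), (data,), {}, {"mu": jnp.array(0.)})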

So the second one scales the prior distributions too, am I right?

Yes, you are right.