[Newbie] How to get the loss value for every sample of an MCMC

Hello, I am totally new to Pyro and I am not sure I understand how to get the loss value for every sample of an MCMC run.

Also, I’m not quite sure why the model function is called multiple times (often many more times than the number of samples).

Thanks for your attention.

get the loss value for every sample of an MCMC

Hi, can you clarify what you mean by loss value? Do you mean the unnormalized probability of each sample?

why the model function is called multiple times

In Metropolis-Hastings-based MCMC algorithms, proposed samples are sometimes rejected, meaning that the unnormalized density might need to be evaluated multiple times to produce a single sample. See Chapter 3 of Bayesian Methods for Hackers for an intuitive introduction to MCMC.
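
To see why, here is a minimal sketch of a single random-walk Metropolis step in plain PyTorch (not Pyro's actual implementation; mh_step and log_prob are illustrative names, with log_prob standing for the unnormalized log density your model defines):

import torch

def mh_step(theta, log_prob, step_size=0.1):
    # propose a move and evaluate the unnormalized density there;
    # this evaluation happens for every proposal, accepted or not
    proposal = theta + step_size * torch.randn_like(theta)
    log_accept = log_prob(proposal) - log_prob(theta)
    if torch.rand(()).log() < log_accept:
        return proposal  # accepted: the chain moves
    return theta         # rejected: the chain stays put, but the density was still evaluated

NUTS goes further: each returned sample comes from simulating an entire trajectory, with one density (and gradient) evaluation per leapfrog step, so the model is typically called far more often than num_samples.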

If you want to get the potential energy of the model given a collection of samples, then I think the fastest way is to use the predictive utility. For example, consider a logistic regression model:

import torch
import pyro
import pyro.distributions as dist
from pyro.infer.mcmc.api import MCMC
from pyro.infer.mcmc.nuts import NUTS

dim = 3
# simulate data from a logistic regression with known coefficients
data = torch.randn(2000, dim)
true_coefs = torch.arange(1., dim + 1.)
labels = dist.Bernoulli(logits=(true_coefs * data).sum(-1)).sample()

def model(data):
    # prior over the regression coefficients
    with pyro.plate('plate_beta', dim):
        coefs = pyro.sample('beta', dist.Normal(0., 1.))
    # likelihood; matmul keeps the batch dimension of coefs free,
    # which the vectorized log-prob computation below relies on
    with pyro.plate('plate_y', data.shape[0]):
        y = pyro.sample('y', dist.Bernoulli(logits=coefs.matmul(data.t())), obs=labels)
    return y

nuts_kernel = NUTS(model, jit_compile=True, ignore_jit_warnings=True)
mcmc = MCMC(nuts_kernel, num_samples=500, warmup_steps=100)
mcmc.run(data)
samples = mcmc.get_samples()

Then you can get the log probabilities as follows:

from pyro.infer.mcmc.util import predictive
from pyro.distributions.util import sum_rightmost

# replay the model against all 500 posterior samples at once
trace = predictive(model, samples, data, return_trace=True)
trace.compute_log_prob()

# accumulate the negative log joint density per sample
pe = 0.
for site in trace.nodes.values():
    if site['type'] == 'sample':
        # sum out all but the leftmost (sample) dimension
        pe = pe - sum_rightmost(site['log_prob'], -1)
print(pe)  # shape (500,): one potential energy per posterior sample

However, for this vectorized approach to work you have to write correct plate statements and use coefs.matmul(data.t()) instead of (coefs * data).sum(-1).
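
The reason is broadcasting: predictive stacks the posterior samples into a batch, so inside the replayed model coefs arrives with shape (500, dim) rather than (dim,). A shape-only sketch, assuming the model above (coefs_batch and logits are just illustrative names):

coefs_batch = samples['beta']           # (500, dim): all posterior samples at once
logits = coefs_batch.matmul(data.t())   # (500, 2000): one row of logits per sample
# (coefs_batch * data).sum(-1) would raise a shape error:
# (500, 3) does not broadcast against (2000, 3)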

Otherwise, you can use the slower way: compute the potential energy for each sample separately:

pe = []
for i in range(500):  # num_samples
    # pick out the i-th posterior sample for every latent site
    sample = {k: v[i] for k, v in samples.items()}
    # transform the sample to unconstrained space, where potential_fn operates
    for k, transform in mcmc.transforms.items():
        sample[k] = transform(sample[k])
    pe.append(mcmc.kernel.potential_fn(sample))
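
Either way you end up with one potential energy per posterior sample, so you can plot it directly; a minimal matplotlib sketch:

import matplotlib.pyplot as plt

pe = torch.stack(pe)  # list of scalar tensors -> tensor of shape (500,)
plt.plot(pe.numpy())
plt.xlabel('sample index')
plt.ylabel('potential energy (negative log joint)')
plt.show()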

Alternatively, you can use NumPyro if you want to record more information such as potential_energy, num_steps, accept_prob, step_size, inverse_mass_matrix, …
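
For completeness, here is a minimal NumPyro sketch, assuming the model has been rewritten with numpyro.distributions; the extra diagnostics are requested through the extra_fields argument of mcmc.run:

from jax import random
from numpyro.infer import MCMC, NUTS

mcmc = MCMC(NUTS(model), num_warmup=100, num_samples=500)
mcmc.run(random.PRNGKey(0), data,
         extra_fields=('potential_energy', 'num_steps', 'accept_prob'))
extras = mcmc.get_extra_fields()
print(extras['potential_energy'].shape)  # (500,): one value per sample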

Thanks for your answers, they are much appreciated!

Hi, can you clarify what you mean by loss value? Do you mean the unnormalized probability of each sample?

I may be wrong in my understanding of MCMC, but I will try to explain my thinking. Let’s say my model is a neural network.

For each sample of the MCMC:

  1. The neural network (NN) does a forward pass to compute the loss on the examples;
  2. The current NN parameters and the loss are passed to the MCMC;
  3. The MCMC computes new parameters for the NN.

So what I want to plot is the computed loss for each sample. Does it make any sense?

In Metropolis-Hastings-based MCMC algorithms, proposed samples are sometimes rejected, meaning that the unnormalized density might need to be evaluated multiple times to produce a single sample. See Chapter 3 of Bayesian Methods for Hackers for an intuitive introduction to MCMC.

I thought the rejected samples were counted as samples, since they convey the information that the chain might be in a high-probability area.

If you want to get the potential energy of the model given a collection of samples, then I think the fastest way is to use the predictive utility.

I think your answer brings me closer to what I want.