[Newbie] How to get the loss value for every sample of an MCMC

Hello, I am totally new to Pyro and I am not sure I understand how to get the loss value for every sample of an MCMC run.

Also, I’m not quite sure why the model function is called multiple times (often many more times than the number of samples).

Thanks for your attention.

get the loss value for every sample of an MCMC

Hi, can you clarify what you mean by loss value? Do you mean the unnormalized probability of each sample?

why the model function is called multiple times

In Metropolis-Hastings-based MCMC algorithms, proposed samples are sometimes rejected, meaning that the unnormalized density might need to be evaluated multiple times to produce a single sample. See Chapter 3 of Bayesian Methods for Hackers for an intuitive introduction to MCMC.
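
To see why, here is a minimal sketch of a single random-walk Metropolis step in plain PyTorch (not Pyro's actual implementation; mh_step and log_prob are illustrative names, with log_prob standing for the unnormalized log density your model defines):

import torch

def mh_step(theta, log_prob, step_size=0.1):
    # propose a move and evaluate the unnormalized density there;
    # this evaluation happens for every proposal, accepted or not
    proposal = theta + step_size * torch.randn_like(theta)
    log_accept = log_prob(proposal) - log_prob(theta)
    if torch.rand(()).log() < log_accept:
        return proposal  # accepted: the chain moves
    return theta         # rejected: the chain stays put, but the density was still evaluated

NUTS goes further: each returned sample comes from simulating an entire trajectory, with one density (and gradient) evaluation per leapfrog step, so the model is typically called far more often than num_samples.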

If you want to get the potential energy of the model given a collection of samples, then I think the fastest way is to use the predictive utility. For example, consider a logistic regression model:

import torch
import pyro
import pyro.distributions as dist
from pyro.infer.mcmc.api import MCMC
from pyro.infer.mcmc.nuts import NUTS

dim = 3
# simulate data from a logistic regression with known coefficients
data = torch.randn(2000, dim)
true_coefs = torch.arange(1., dim + 1.)
labels = dist.Bernoulli(logits=(true_coefs * data).sum(-1)).sample()

def model(data):
    # prior over the regression coefficients
    with pyro.plate('plate_beta', dim):
        coefs = pyro.sample('beta', dist.Normal(0., 1.))
    # likelihood; matmul keeps the batch dimension of coefs free,
    # which the vectorized log-prob computation below relies on
    with pyro.plate('plate_y', data.shape[0]):
        y = pyro.sample('y', dist.Bernoulli(logits=coefs.matmul(data.t())), obs=labels)
    return y

nuts_kernel = NUTS(model, jit_compile=True, ignore_jit_warnings=True)
mcmc = MCMC(nuts_kernel, num_samples=500, warmup_steps=100)
mcmc.run(data)
samples = mcmc.get_samples()

Then you can get the log probabilities as follows:

from pyro.infer.mcmc.util import predictive
from pyro.distributions.util import sum_rightmost

# replay the model against all 500 posterior samples at once
trace = predictive(model, samples, data, return_trace=True)
trace.compute_log_prob()

# accumulate the negative log joint density per sample
pe = 0.
for site in trace.nodes.values():
    if site['type'] == 'sample':
        # sum out all but the leftmost (sample) dimension
        pe = pe - sum_rightmost(site['log_prob'], -1)
print(pe)  # shape (500,): one potential energy per posterior sample

However, for this vectorized approach to work you have to write correct plate statements and use coefs.matmul(data.t()) instead of (coefs * data).sum(-1).
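
The reason is broadcasting: predictive stacks the posterior samples into a batch, so inside the replayed model coefs arrives with shape (500, dim) rather than (dim,). A shape-only sketch, assuming the model above (coefs_batch and logits are just illustrative names):

coefs_batch = samples['beta']           # (500, dim): all posterior samples at once
logits = coefs_batch.matmul(data.t())   # (500, 2000): one row of logits per sample
# (coefs_batch * data).sum(-1) would raise a shape error:
# (500, 3) does not broadcast against (2000, 3)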

Otherwise, you can use the slower way: compute the potential energy for each sample separately:

pe = []
for i in range(500):  # num_samples
    # pick out the i-th posterior sample for every latent site
    sample = {k: v[i] for k, v in samples.items()}
    # transform the sample to unconstrained space, where potential_fn operates
    for k, transform in mcmc.transforms.items():
        sample[k] = transform(sample[k])
    pe.append(mcmc.kernel.potential_fn(sample))
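
Either way you end up with one potential energy per posterior sample, so you can plot it directly; a minimal matplotlib sketch:

import matplotlib.pyplot as plt

pe = torch.stack(pe)  # list of scalar tensors -> tensor of shape (500,)
plt.plot(pe.numpy())
plt.xlabel('sample index')
plt.ylabel('potential energy (negative log joint)')
plt.show()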

Alternatively, you can use NumPyro if you want to record more information such as potential_energy, num_steps, accept_prob, step_size, inverse_mass_matrix, …
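
For completeness, here is a minimal NumPyro sketch, assuming the model has been rewritten with numpyro.distributions; the extra diagnostics are requested through the extra_fields argument of mcmc.run:

from jax import random
from numpyro.infer import MCMC, NUTS

mcmc = MCMC(NUTS(model), num_warmup=100, num_samples=500)
mcmc.run(random.PRNGKey(0), data,
         extra_fields=('potential_energy', 'num_steps', 'accept_prob'))
extras = mcmc.get_extra_fields()
print(extras['potential_energy'].shape)  # (500,): one value per sample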

Thanks for your answers, they are much appreciated!

Hi, can you clarify what you mean by loss value? Do you mean the unnormalized probability of each sample?

I may be wrong in my understanding of MCMC, but I will try to explain my thinking. Let’s say my model is a neural network.

For each sample of the MCMC:

  1. The neural network (NN) does a forward pass to compute the loss on the examples;
  2. The current NN parameters and the loss are passed to the MCMC;
  3. The MCMC computes new parameters for the NN.

So what I want to plot is the computed loss for each sample. Does it make any sense?

In Metropolis-Hastings-based MCMC algorithms, proposed samples are sometimes rejected, meaning that the unnormalized density might need to be evaluated multiple times to produce a single sample. See Chapter 3 of Bayesian Methods for Hackers for an intuitive introduction to MCMC.

I thought the rejected samples were counted as samples, since they convey the information that the chain might be in a high-probability area.

If you want to get the potential energy of the model given a collection of samples, then I think the fastest way is to use the predictive utility.

I think your answer brings me closer to what I want.