Samples of a Bernoulli random variable have more than one element during MCMC

I’m new to Pyro and looking for a package that can help me simulate (and possibly do other things like inference with) stochastic processes that have deterministic or discrete elements. Pyro seems to be a good solution and I’m just trying to get a simple ‘MVP’ demonstration to work.

I initially posted a question on Stack Overflow but got no responses.

Should I be posting these kinds of beginner questions here or on Stack Overflow?

I notice that the tag ‘Pyro’ is already being used on Stack Overflow for some other package.

Please advise. If this forum is the right place, I will re-post the question here; otherwise, please go to the link above and help answer my question!

Thanks.

Hi Bill, to my knowledge, none of the Pyro devs are active on Stack Overflow. We use this forum to discuss ideas and questions related to Pyro. If you have questions, please ask here. Welcome! :slight_smile:


Thanks @fehiepsi. I’ll post my question here then.

I’m trying to get my first stochastic process model working. I adapted the code from here to suit my example problem, which is simply two Gaussians with a discrete probability of the sample coming from one or the other. Right now, I’m just trying to do something simple like estimating the probability distribution of the output of this process.

Here’s my code:

import torch
import pyro
import pyro.distributions as dist
from pyro.infer.mcmc import HMC, MCMC

# Actual data sample
observations = torch.tensor(
    [0.00528813, -0.00589001, -1.20608593, 0.00190794,
     0.89052784,  0.66690464,  0.57295968, 0.02605967]
)

# Define the process
def model(observations):
    
    a_prior = dist.Beta(2, 2)
    a = pyro.sample("a", a_prior)
    c = pyro.sample('c', dist.Bernoulli(a))
    if c.item() == 1.0:
        my_dist = dist.Normal(0.785, 1.0)
    else:
        my_dist = dist.Normal(0.0, 0.01)
    
    for i, observation in enumerate(observations):
        measurement = pyro.sample(f'obs_{i}', my_dist, obs=observation)

# Clear parameters
pyro.clear_param_store()

# Define the MCMC kernel function
my_kernel = HMC(model)

# Define the MCMC algorithm
my_mcmc = MCMC(my_kernel,
               num_samples=5000,
               warmup_steps=50)

# Run the algorithm, passing the observations 
my_mcmc.run(observations)

The exception raised is:

<ipython-input-2-a668622a0fb9> in model(observations)
     11     a = pyro.sample("a", a_prior)
     12     c = pyro.sample('c', dist.Bernoulli(a))
---> 13     if c.item() == 1.0:
     14         my_dist = dist.Normal(0.785, 1.0)
     15     else:

ValueError: only one element tensors can be converted to Python scalars
Trace Shapes:    
 Param Sites:    
Sample Sites:    
       a dist   |
        value   |
       c dist   |
        value 2 |

I had a look at c using the debugger and for some reason it has two elements the second time model() is called:

tensor([0., 1.])

What is causing this? I wanted it to be a simple scalar taking the value 0 or 1.

As a further test, the conditional statement works fine when taking samples in the normal way:

# Conditional switch test
a_prior = dist.Beta(2, 2)
a = pyro.sample("a", a_prior)
for i in range(5):
    c = pyro.sample('c', dist.Bernoulli(a))
    if c.item() == 1.0:
        print(1, end=' ')
    else:
        print(0, end=' ')

# 0 0 1 0 0

Hi @billtubbs, HMC only works for continuous variables, so we have to integrate out the discrete latent variable. You can write your model as follows

    c = c.long()
    my_dist = dist.Normal(torch.tensor([0.785, 0.0])[c], torch.tensor([1.0, 0.01])[c])
    
    with pyro.plate("N", observations.shape[0]):
        measurement = pyro.sample('obs', my_dist, obs=observations)
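
For reference, a complete version of the model with this change might look something like the following (just a sketch; it assumes the rest of your original script stays the same):

def model(observations):
    a = pyro.sample("a", dist.Beta(2, 2))
    c = pyro.sample("c", dist.Bernoulli(a))
    c = c.long()
    # Index the locations and scales by c instead of branching on c.item(),
    # so the model also works when c is a batch of enumerated values.
    my_dist = dist.Normal(torch.tensor([0.785, 0.0])[c],
                          torch.tensor([1.0, 0.01])[c])
    with pyro.plate("N", observations.shape[0]):
        pyro.sample("obs", my_dist, obs=observations)

HMC handles the discrete site by enumerating its two possible values and summing them out, which is also why your debugger showed c as a tensor with two elements.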

Thanks. I will have to read that documentation link in full so I can understand how it works, but it does work now!

Sorry for the delay, but I am back working on this now. Could someone please give me some advice on the next steps?

Following the advice of @fehiepsi above, I think I have implemented the model function correctly now and conditioned it on a set of observations (I modified it slightly so that initial guesses for two of the parameters can be provided):

import torch
import pyro
import pyro.distributions as dist
from torch.distributions import constraints

def random_shocks(scale_guess, b_guess):
    scale = pyro.sample("scale", dist.Gamma(2, scale_guess))  # always positive
    b = pyro.sample("b", dist.Gamma(2, b_guess))  # always positive
    epsilon = pyro.param("epsilon", torch.tensor(0.01), constraint=constraints.interval(0, 1))
    alpha = pyro.sample("alpha", dist.Bernoulli(epsilon))
    alpha = alpha.long()
    # torch.stack keeps scale and b in the computation graph (torch.tensor would detach them)
    wp_dist = dist.Normal(torch.tensor([0.0, 0.0])[alpha], torch.stack([scale, scale * b])[alpha])
    measurement = pyro.sample('obs', wp_dist)
    return measurement

def random_shocks_conditioned(scale_guess, b_guess, observations):
    scale = pyro.sample("scale", dist.Gamma(2, scale_guess))  # always positive
    b = pyro.sample("b", dist.Gamma(2, b_guess))  # always positive
    epsilon = pyro.param("epsilon", torch.tensor(0.01), constraint=constraints.interval(0, 1))
    alpha = pyro.sample("alpha", dist.Bernoulli(epsilon))
    alpha = alpha.long()
    # torch.stack keeps scale and b in the computation graph (torch.tensor would detach them)
    wp_dist = dist.Normal(torch.tensor([0.0, 0.0])[alpha], torch.stack([scale, scale * b])[alpha])
    with pyro.plate("N", observations.shape[0]):
        measurement = pyro.sample('obs', wp_dist, obs=observations)

My goal is to estimate the ‘scale’, ‘b’, and ‘epsilon’ parameters, and if possible, the sequence of binary random numbers alpha, given a sequence of observations. Is this possible with variational inference?

If I understand correctly, the next step is to create a parameterized guide function. In this case, I am thinking a Gaussian mixture distribution (with two components) would do the trick.

Which inference algorithm is appropriate here and is there an appropriate example script I could follow?

For reference, here is the result of sampling from the model. It seems to produce the expected distribution:

import matplotlib.pyplot as plt

# Draw samples from the unconditioned model
wp_samples = [random_shocks(0.01, 100).item() for _ in range(1000)]

plt.hist(wp_samples, bins=61)
plt.xlabel('Randomly occurring shocks')
plt.ylabel('Frequency')
plt.grid()
plt.savefig('plots/wp_samples.png')
plt.show()


Okay, I have made some progress on my own. Based on the example here, I constructed a two-component mixture model. The difference here is that this model allows a different scale parameter for each mixture component.

from pyro.infer import config_enumerate

K = 2  # Number of mixture components (fixed)

@config_enumerate
def model(data):
    # Global variables.
    weights = pyro.sample('weights', dist.Dirichlet(0.5 * torch.ones(K)))
    with pyro.plate('components', K):
        scales = pyro.sample('scales', dist.LogNormal(0., 2.))
        locs = pyro.sample('locs', dist.Normal(0., 10.))

    with pyro.plate('data', len(data)):
        # Local variables.
        assignment = pyro.sample('assignment', dist.Categorical(weights))
        pyro.sample('obs', dist.Normal(locs[assignment], scales[assignment]), obs=data)

So I think I can solve this using the GMM SVI example in the documentation. Please let me know if I’m not on the right track here, but I will try this and see if it works for this problem.

Success! It correctly estimates the parameters of the original data-generating distribution:

from collections import defaultdict
import numpy as np
from pyro.infer import SVI, TraceEnum_ELBO
from pyro.infer.autoguide import AutoDelta

smoke_test = False  # set True for a quick 2-step test run

# Parameters
true_scale = 0.01
true_b = 100
true_epsilon = 0.01

# Generate data set
wp_measurements = [random_shocks(true_scale, true_b).item() for _ in range(1000)]
data = torch.tensor(wp_measurements)  # the initialization and training code below expects a tensor called 'data'

# Optimizer
optim = pyro.optim.Adam({'lr': 0.1, 'betas': [0.8, 0.99]})
elbo = TraceEnum_ELBO(max_plate_nesting=1)

# Initialization
def init_loc_fn(site):
    if site["name"] == "weights":
        # Initialize weights to uniform.
        return torch.ones(K) / K
    if site["name"] == "scales":
        # Initialise scales based on the std. dev. of the data
        return (data.var() * torch.ones(K) / 2).sqrt()
    if site["name"] == "locs":
        # Initialise means to two randomly chosen data points
        return data[torch.multinomial(torch.ones(len(data)) / len(data), K)]
    raise ValueError(site["name"])

def initialize(seed):
    global global_guide, svi
    pyro.set_rng_seed(seed)
    pyro.clear_param_store()
    global_guide = AutoDelta(pyro.poutine.block(model, expose=['weights', 'locs', 'scales']),
                             init_loc_fn=init_loc_fn)
    svi = SVI(model, global_guide, optim, loss=elbo)
    return svi.loss(model, global_guide, data)

# Choose the best among 100 random initializations.
loss, seed = min((initialize(seed), seed) for seed in range(100))
initialize(seed)
print('seed = {}, initial_loss = {}'.format(seed, loss))

# Training

# Register hooks to monitor gradient norms.
gradient_norms = defaultdict(list)
for name, value in pyro.get_param_store().named_parameters():
    value.register_hook(lambda g, name=name: gradient_norms[name].append(g.norm().item()))

losses = []
for i in range(200 if not smoke_test else 2):
    loss = svi.step(data)
    losses.append(loss)
    print('.' if i % 100 else '\n', end='')

map_estimates = global_guide(data)
weights = map_estimates['weights']
locs = map_estimates['locs']
scales = map_estimates['scales']

# Print parameter estimates
print(f"{'Estimates':29s} {'True Parameters':29s} ")
print("-"*80)
print(f"weights = {weights.data.numpy().round(3)}       {true_epsilon = }")
print(f"locs = {locs.data.numpy().round(3)}        true_mean = 0")
true_scales = np.array([true_scale, true_b*true_scale])
print(f"scales = {scales.data.numpy().round(3)}       {true_scales = }")

Output

seed = 1, initial_loss = -534.2420654296875

...................................................................................................
...................................................................................................

Estimates                     True Parameters               
--------------------------------------------------------------------------------
weights = [0.01 0.99]       true_epsilon = 0.01
locs = [-0.06  -0.009]        true_mean = 0
scales = [1.111 0.017]       true_scales = array([0.01, 1.  ])

The only strange thing is that the loss is negative. Is this normal?

The ELBO can generally have either sign: it is built from log probability densities, and a continuous density can exceed 1, so the ELBO can be positive and the reported loss (the negative ELBO) can be negative.
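
For instance, a narrow Normal has a positive log-density near its mean, which is one way the log-probability terms (and hence the loss) can end up on either side of zero:

import torch
import pyro.distributions as dist

# A continuous density can exceed 1, so its log-probability can be positive.
d = dist.Normal(0.0, 0.01)
print(d.log_prob(torch.tensor(0.0)))  # approx. 3.69 = -log(0.01) - 0.5*log(2*pi)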

Ah, okay thanks. So nothing to worry about.
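
A note for anyone following along: the remaining piece of my original goal was recovering the per-observation assignments (the alpha sequence). The GMM tutorial I followed does this with infer_discrete after training; below is a sketch of that step adapted to the names used above (I have not verified it against this exact model):

from pyro import poutine
from pyro.infer import infer_discrete

# Replay the trained global latents into the model, then infer the discrete
# 'assignment' sites given the data (temperature=0 gives MAP assignments).
guide_trace = poutine.trace(global_guide).get_trace(data)
trained_model = poutine.replay(model, trace=guide_trace)
inferred_model = infer_discrete(trained_model, temperature=0,
                                first_available_dim=-2)  # keep clear of the data plate dim
trace = poutine.trace(inferred_model).get_trace(data)
assignments = trace.nodes["assignment"]["value"]
print(assignments[:20])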