I’m attempting to run the Bayesian regression tutorial below. I’m running this in a Jupyter notebook hosted on an AWS server. I’ve been trying to get MCMC to work when num_chains is greater than 1, on a CPU. I’ve hit the following roadblocks:
- When mp_context is left unset or set to “fork”, I get:
  RuntimeError: Unable to handle autograd's threading in combination with fork-based multiprocessing. See https://github.com/pytorch/pytorch/wiki/Autograd-and-Fork
  This led me to try the other mp_context options.
- When mp_context is “forkserver”, four progress bars show up but none of them ever starts. This isn’t just a progress bar issue: setting num_samples and warmup_steps to 1 doesn’t make the chains finish either.
- When mp_context is “spawn”, I get:
  ValueError: bad value(s) in fds_to_keep
My sense is that mp_context should be “spawn” in this environment, but I don’t know how to address the fds_to_keep error. Some googling suggests that it sometimes occurs in PyTorch, but I’m not sure how to solve it for the Pyro use case.
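For context, the three mp_context values above appear to correspond to the start methods of Python's standard multiprocessing module. A minimal sketch, using only the standard library, of how to check which of them the host platform supports; the helper names list_start_methods and context_for are made up for illustration and are not part of Pyro or PyTorch:

```python
import multiprocessing as mp


def list_start_methods():
    """Return the start methods this platform supports."""
    return mp.get_all_start_methods()


def context_for(method):
    """Return a multiprocessing context bound to a single start method."""
    return mp.get_context(method)


if __name__ == "__main__":
    # Print each supported start method and the context class behind it.
    for method in list_start_methods():
        ctx = context_for(method)
        print(method, "->", type(ctx).__name__)
```

This only reports what the platform offers. It does not reproduce or fix either error.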
import numpy as np
import pandas as pd
import torch
import pyro
import pyro.distributions as dist
from pyro.infer.mcmc import MCMC, NUTS
pyro.enable_validation(True)
pyro.set_rng_seed(1)
DATA_URL = "https://d2hg8soec8ck9v.cloudfront.net/datasets/rugged_data.csv"
rugged_data = pd.read_csv(DATA_URL, encoding="ISO-8859-1")
def model(is_cont_africa, ruggedness, log_gdp):
    a = pyro.sample("a", dist.Normal(8., 1000.))
    b_a = pyro.sample("bA", dist.Normal(0., 1.))
    b_r = pyro.sample("bR", dist.Normal(0., 1.))
    b_ar = pyro.sample("bAR", dist.Normal(0., 1.))
    sigma = pyro.sample("sigma", dist.Uniform(0., 10.))
    mean = a + b_a * is_cont_africa + b_r * ruggedness + b_ar * is_cont_africa * ruggedness
    with pyro.iarange("data", len(ruggedness)):
        pyro.sample("obs", dist.Normal(mean, sigma), obs=log_gdp)
df = rugged_data[["cont_africa", "rugged", "rgdppc_2000"]]
df = df[np.isfinite(df.rgdppc_2000)]
df["rgdppc_2000"] = np.log(df["rgdppc_2000"])
train = torch.tensor(df.values, dtype=torch.float)
is_cont_africa, ruggedness, log_gdp = train[:, 0], train[:, 1], train[:, 2]
nuts_kernel = NUTS(model, adapt_step_size=True)
hmc_posterior = MCMC(
    nuts_kernel,
    num_samples=4000,
    warmup_steps=1000,
    num_chains=4,
    mp_context="spawn",
).run(is_cont_africa, ruggedness, log_gdp)