Need help with (very) simple model

Hello. I’m new to Pyro and PyTorch. After working through the first few tutorials, I tried to write a very simple program that recovers the mean and standard deviation of a normally distributed toy data set, but it doesn’t compute the correct results. I was hoping that I could get some help.

My data set is just samples from a normal distribution with a mean of 10 and a standard deviation of 3:

import torch

def create_data(mu, sd):
	# Draw 500 i.i.d. samples from Normal(mu, sd).
	# (This could also be vectorized: torch.distributions.Normal(mu, sd).sample((500,)).)
	data = torch.zeros(500)
	for i in range(500):
		data[i] = torch.distributions.Normal(mu, sd).sample()
	return data

data = create_data(10, 3)

I’m familiar with Stan, so I wrote a Stan program which correctly recovers the mean and sd as values close to 10 and 3:

data {
	int<lower=0> num_data;
	vector[num_data] x;
}

parameters {
	real mu0;
	real<lower=0> sd0;
}

model {
	mu0 ~ normal(0.0, 1.0);
	sd0 ~ gamma(1.0, 1.0);
	x ~ normal(mu0, sd0);
}

This is the Pyro program I wrote to mimic what the Stan program does.

import pyro
import pyro.distributions as dist
from torch.distributions import constraints

def model(data):
	mu0 = pyro.sample("latent_mu0", dist.Normal(0.0, 1.0))
	sd0 = pyro.sample("latent_sd0", dist.Gamma(1.0, 1.0))

	with pyro.plate("observed_data"):
		pyro.sample("obs", dist.Normal(mu0, sd0), obs=data)

def guide(data):
	mu0_mu_q = pyro.param("mu0_mu_q", torch.tensor(0.0), constraint=constraints.real)
	mu0_sd_q = pyro.param("mu0_sd_q", torch.tensor(1.0), constraint=constraints.positive)
	sd0_alpha_q = pyro.param("sd0_alpha_q", torch.tensor(1.0), constraint=constraints.positive)
	sd0_beta_q = pyro.param("sd0_beta_q", torch.tensor(1.0), constraint=constraints.positive)

	pyro.sample("latent_mu0", dist.Normal(mu0_mu_q, mu0_sd_q))
	pyro.sample("latent_sd0", dist.Gamma(sd0_alpha_q, sd0_beta_q))

from pyro.infer import SVI, Trace_ELBO
from pyro.optim import Adam

def train(n_steps, data):
	pyro.clear_param_store()

	adam_params = {"lr": 0.0005, "betas": (0.90, 0.999)}
	optimizer = Adam(adam_params)
	svi = SVI(model, guide, optimizer, loss=Trace_ELBO())

	for _ in range(n_steps):
		svi.step(data)

No matter how much data I give the model, it always returns parameters that are basically the same as their initialization values in the guide. What am I doing wrong? What is the best way to write the guide function in this situation?

A quick note - I’m not a data scientist or statistician (I’m a game designer) and have a tenuous understanding of this stuff. I apologize if this is overly basic. Thanks for your help!

A couple of thoughts:

The prior you are putting on mu (and sd) makes the true mean very unlikely. If I run your code using MCMC (NUTS) in Pyro, I am able to recover close to 10 for the mean:

from pyro.infer import EmpiricalMarginal
from pyro.infer.mcmc import MCMC, NUTS

torch.manual_seed(99999)
nuts_kernel = NUTS(model, adapt_step_size=True)
mcmc_run = MCMC(nuts_kernel, num_samples=1000, warmup_steps=300, num_chains=1).run(data)
posterior = EmpiricalMarginal(mcmc_run, 'latent_mu0')
posterior.mean

My suspicion is that SVI is having trouble exploring the space. You can get a sense of the trouble by plotting the loss, which jumps around a lot and has spikes.

In a Jupyter notebook:

%matplotlib inline
import matplotlib.pyplot as plt

Add the following to your train function:

losses = []
for _ in range(n_steps):
	losses.append(svi.step(data))
plt.plot(losses)
plt.title("ELBO")
plt.xlabel("step")
plt.ylabel("loss")

I think you want to find something that looks more convergent (see the image in the Pyro docs). I am a beginner as well, so others may have better advice.


As I think about this more, I realized that perhaps the learning rate in the train function was too small to move as far away from the prior as it needed to. If I make lr=0.1, for example, things look much better.
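Concretely, the only change is the learning rate passed to the optimizer (a sketch of that change; 0.1 is just a value that happened to work for me, not a recommendation):

# same train function as before, only with a larger learning rate
adam_params = {"lr": 0.1, "betas": (0.90, 0.999)}
optimizer = Adam(adam_params)
svi = SVI(model, guide, optimizer, loss=Trace_ELBO())

After training, the learned mean parameter ends up close to the true value: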

 pyro.param('mu0_mu_q')
 tensor(9.7503)

Yes! The learning rate was my problem. With lr=0.2 and 6000 steps I was able to get the correct values. Thank you! Is there documentation (or a rule of thumb) for choosing the learning rate and number of steps?

I tried the MCMC approach you mentioned, but ran into a crash right as warmup started. Here’s a snippet:

  File "C:\Users\wmiller\AppData\Local\Continuum\miniconda3\envs\mlenv\lib\multiprocessing\spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
AttributeError: Can't get attribute 'model' on <module '__main__' (built-in)>

I don’t know of any rules of thumb; perhaps others can chime in. I’ve typically seen 0.001 used, but I imagine it depends on the problem and the optimizer. In this case, my advice would be to use a more realistic prior. For example, if the true mean of the dataset is 10, a Normal prior with mean 0 and standard deviation 1 assigns almost zero probability to a mean of 10. Stan / MCMC seemed able to overcome this with enough data, but SVI takes a different approach (it treats inference as an optimization problem), and it seems you needed to tell it to take larger steps to get near the true value from where it started.
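As an illustration (a sketch only; these particular numbers are my own choice, not something from the original post), widening the priors so the true values are not far out in the tails might look like this:

def model(data):
	# Broader priors that put appreciable mass near the true values
	# (mean around 10, sd around 3); the exact numbers are only illustrative.
	mu0 = pyro.sample("latent_mu0", dist.Normal(0.0, 10.0))
	sd0 = pyro.sample("latent_sd0", dist.Gamma(2.0, 0.5))  # prior mean = alpha / beta = 4

	with pyro.plate("observed_data"):
		pyro.sample("obs", dist.Normal(mu0, sd0), obs=data)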

I am not sure about the stack trace. Are you using Pyro 0.3?

A good learning rate depends on the magnitude of your parameter space. For a neural network, the weights are small, so lr=0.001 makes sense. I usually watch the loss on a plot, as in @jeffmax's comment, and then tune lr and num_steps based on what I see.
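Something like this is how I compare a few candidate settings (a rough sketch that reuses the model, guide, data, and matplotlib import from earlier in this thread; the learning rates listed are arbitrary):

# run a short SVI loop for each candidate learning rate and compare the loss curves
for lr in [0.001, 0.01, 0.1]:
	pyro.clear_param_store()
	svi = SVI(model, guide, Adam({"lr": lr}), loss=Trace_ELBO())
	losses = [svi.step(data) for _ in range(2000)]
	plt.plot(losses, label="lr={}".format(lr))
plt.xlabel("step")
plt.ylabel("ELBO loss")
plt.legend()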

Awesome! Thank you.

Did you find the error? I am getting the same issue: no matter what data I use, I get back the same parameters that I initialise in my guide.