Custom guide for learning a 1D Gaussian distribution

Hey all,

I am new to Pyro and confused about why this code does not work, i.e. what I am doing wrong in the guide function. The ELBO is very volatile, and the resulting estimates for the distribution of sigma are off.

import torch
import matplotlib.pyplot as plt

import pyro
from torch.distributions import constraints
import pyro.distributions as dist
import pyro.optim as optim
from pyro.infer import SVI, Trace_ELBO


true_mu = torch.Tensor([2])
true_sigma = torch.Tensor([1])

# generate data
data = pyro.distributions.Normal(true_mu, true_sigma).sample(torch.Size([20])).squeeze()

# specification of the data generating process
def model(data):
    mu = pyro.sample("mu", dist.Normal(torch.zeros(1), torch.ones(1)))
    sigma = pyro.sample("sigma", dist.LogNormal(torch.zeros(1), torch.ones(1)))

    with pyro.plate("data", len(data)):
        pyro.sample("obs", dist.Normal(mu, sigma), obs=data)

def guide(data):
    mu_loc = pyro.param('mu_loc',torch.tensor(0.))
    mu_scale = pyro.param('mu_scale', torch.tensor(1.), constraint=constraints.positive)
    sigma_loc = pyro.param('sigma_loc', torch.tensor(0.))
    sigma_scale = pyro.param('sigma_scale', torch.tensor(1.), constraint=constraints.positive)

    mu = pyro.sample("mu", dist.Normal(mu_loc, mu_scale))
    sigma = pyro.sample("sigma", dist.LogNormal(sigma_loc, sigma_scale))
    return {'mu': mu, 'sigma': sigma}

svi = SVI(model,
          guide,
          optim.Adam({"lr": .001}),
          loss=Trace_ELBO())

num_iters = 1000
# store the elbo in list and plot later
elbo_list = []
for i in range(num_iters):
    # svi.step returns the loss, i.e. an estimate of the negative ELBO
    elbo = svi.step(data)
    elbo_list.append(elbo)


print(pyro.param("mu_loc"), pyro.param("sigma_loc"))

the variances of variational distributions should generally be initialized to be narrow; e.g. here you might initialize mu_scale and sigma_scale to torch.tensor(0.01) instead of torch.tensor(1.)

Thanks a lot @martinjankowiak! Now it is running smoothly :slight_smile:

Why do the variances for variational distributions have to be so narrow? Where can I learn more about how to choose variational distributions?

i don’t know where you’d find a generic discussion but you can read our SVI tutorials, e.g. part iv

basically: you’re computing the ELBO with a single stochastic sample. imagine the variance is very very large. the ELBO variance will necessarily be very high too. as will the gradient variance. learning with high variance gradients doesn’t work very well.
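To make this concrete, here is a rough sketch in plain Python (standard library only, not Pyro's actual implementation) that compares the spread of single-sample ELBO estimates for a wide and a narrow guide. It assumes a simplified version of the model above in which sigma is fixed at 1 and only mu is inferred; the helper names `log_normal_pdf`, `single_sample_estimates` and `estimate_variances`, the seeds, and the number of draws are all made up for illustration.

```python
import math
import random
import statistics


def log_normal_pdf(x, loc, scale):
    """Log density of Normal(loc, scale) evaluated at x."""
    return (-0.5 * math.log(2 * math.pi) - math.log(scale)
            - 0.5 * ((x - loc) / scale) ** 2)


def single_sample_estimates(data, q_loc, q_scale, rng):
    """One-sample ELBO estimate and its reparameterized gradient w.r.t. q_loc."""
    eps = rng.gauss(0.0, 1.0)
    mu = q_loc + q_scale * eps  # reparameterized draw from the guide
    # Toy model (assumption): mu ~ Normal(0, 1), x_i ~ Normal(mu, 1)
    log_joint = log_normal_pdf(mu, 0.0, 1.0) + sum(
        log_normal_pdf(x, mu, 1.0) for x in data)
    elbo = log_joint - log_normal_pdf(mu, q_loc, q_scale)
    # Pathwise gradient: d(log_joint)/d(mu) times d(mu)/d(q_loc) = 1.
    # log q at the reparameterized draw depends only on eps, so it drops out.
    grad_loc = -mu + sum(x - mu for x in data)
    return elbo, grad_loc


def estimate_variances(data, q_scale, num_draws=2000, seed=0):
    """Sample variance of the ELBO and gradient estimates over many draws."""
    rng = random.Random(seed)
    draws = [single_sample_estimates(data, 0.0, q_scale, rng)
             for _ in range(num_draws)]
    elbos, grads = zip(*draws)
    return statistics.variance(elbos), statistics.variance(grads)


data_rng = random.Random(1)
toy_data = [data_rng.gauss(2.0, 1.0) for _ in range(20)]

wide = estimate_variances(toy_data, q_scale=1.0)
narrow = estimate_variances(toy_data, q_scale=0.01)
print("scale=1.0  -> ELBO var %.3g, grad var %.3g" % wide)
print("scale=0.01 -> ELBO var %.3g, grad var %.3g" % narrow)
```

In this toy setting the gradient estimate is linear in the guide's scale, so its variance grows with the square of the scale. That is the effect described above: a wide initial guide gives the optimizer much noisier gradients than a narrow one.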
