Custom guide for learning a 1D Gaussian distribution

psls · November 29, 2023, 12:41pm

Hey all,

I am new to Pyro, and I am somewhat confused about why this code does not work, i.e. what I am doing wrong in the guide function. The ELBO looks very volatile, and the resulting estimates for the distribution of sigma are off.

import torch
import matplotlib.pyplot as plt

import pyro
from torch.distributions import constraints
import pyro.distributions as dist
import pyro.optim as optim
from pyro.infer import SVI, Trace_ELBO

pyro.set_rng_seed(1)

true_mu = torch.tensor([2.0])
true_sigma = torch.tensor([1.0])

# generate data
data = dist.Normal(true_mu, true_sigma).sample(torch.Size([20])).squeeze()

# specification of the data generating process
def model(data):
    mu = pyro.sample("mu", dist.Normal(torch.zeros(1), torch.ones(1)))
    sigma = pyro.sample("sigma", dist.LogNormal(torch.zeros(1), torch.ones(1)))

    with pyro.plate("data", len(data)):
        pyro.sample("obs", dist.Normal(mu, sigma), obs=data)

def guide(data):
    mu_loc = pyro.param('mu_loc',torch.tensor(0.))
    mu_scale = pyro.param('mu_scale', torch.tensor(1.), constraint=constraints.positive)
    sigma_loc = pyro.param('sigma_loc', torch.tensor(0.))
    sigma_scale = pyro.param('sigma_scale', torch.tensor(1.), constraint=constraints.positive)

    mu = pyro.sample("mu", dist.Normal(mu_loc, mu_scale))
    sigma = pyro.sample("sigma", dist.LogNormal(sigma_loc, sigma_scale))
    return {'mu': mu, 'sigma': sigma}

svi = SVI(model, 
          guide, 
          optim.Adam({"lr": .001}), 
          loss=Trace_ELBO())

pyro.clear_param_store()
num_iters = 1000
# svi.step returns the loss (an estimate of the negative ELBO); store it and plot later
loss_list = []
for i in range(num_iters):
    loss = svi.step(data)
    loss_list.append(loss)

plt.plot(loss_list)
plt.xlabel("step")
plt.ylabel("loss (negative ELBO)")
plt.show()

print(pyro.param("mu_loc"), pyro.param("sigma_loc"))

martinjankowiak · November 29, 2023, 1:55pm

the scales of variational distributions should generally be initialized to small values so that the distributions start out narrow; e.g. here you might initialize mu_scale and sigma_scale to torch.tensor(0.01)

psls · November 29, 2023, 2:59pm

Thanks a lot @martinjankowiak! Now it is running smoothly.

Why do the variances for variational distributions have to be so narrow? Where can I learn more about how to choose variational distributions?

martinjankowiak · November 30, 2023, 3:57am

i don’t know where you’d find a generic discussion but you can read our SVI tutorials, e.g. part iv

basically: you’re computing the ELBO with a single stochastic sample from the guide. imagine the guide’s variance is very, very large. the variance of the ELBO estimate will necessarily be very high too, as will the gradient variance. learning with high-variance gradients doesn’t work very well.
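The effect can be checked numerically on a simplified toy problem. The sketch below is an assumption-laden illustration in plain PyTorch, not the model from the thread: only `mu` is latent, the observation scale is fixed at 1, and the helper `elbo_and_grad` is a hypothetical name. It draws many single-sample ELBO estimates with a reparameterized sample and compares their spread, and the spread of the gradient with respect to the guide's location, for a wide and a narrow guide.

```python
import torch

torch.manual_seed(0)
data = 2.0 + torch.randn(20)  # toy data: 20 draws from Normal(2, 1)

def elbo_and_grad(loc_value, scale_value):
    """One single-sample ELBO estimate and its gradient w.r.t. the guide loc."""
    loc = torch.tensor(loc_value, requires_grad=True)
    scale = torch.tensor(scale_value)
    q = torch.distributions.Normal(loc, scale)      # guide q(mu)
    z = q.rsample()                                 # reparameterized sample
    log_prior = torch.distributions.Normal(0.0, 1.0).log_prob(z)
    log_lik = torch.distributions.Normal(z, 1.0).log_prob(data).sum()
    elbo = log_prior + log_lik - q.log_prob(z)
    (grad,) = torch.autograd.grad(elbo, loc)
    return elbo.item(), grad.item()

def spread(scale_value, num_draws=1000):
    """Standard deviation of the ELBO estimates and of the gradients."""
    draws = [elbo_and_grad(0.0, scale_value) for _ in range(num_draws)]
    elbos = torch.tensor([d[0] for d in draws])
    grads = torch.tensor([d[1] for d in draws])
    return elbos.std().item(), grads.std().item()

elbo_std_wide, grad_std_wide = spread(1.0)       # guide scale initialized at 1.0
elbo_std_narrow, grad_std_narrow = spread(0.01)  # guide scale initialized at 0.01

print("wide guide:   ELBO std", elbo_std_wide, "grad std", grad_std_wide)
print("narrow guide: ELBO std", elbo_std_narrow, "grad std", grad_std_narrow)
```

In this toy setup the gradient with respect to the location is linear in the sampled value, so its standard deviation scales directly with the guide's scale; the wide guide therefore produces much noisier gradients than the narrow one.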