Unexpected `pyro.param` values after SVI

Hi,
I’ve been following the intro tutorial and the first part of the SVI tutorial, with the goal of estimating a model's input parameters from data.

First, I start by defining the known model to sample data from:

def known_model():
    mu = pyro.sample("mu", tdist.Normal(2, 10))
    sigma = pyro.sample("sigma", tdist.HalfNormal(10))
    return tdist.Normal(mu, sigma)

m = known_model()
data = m.sample([10000])

# m
# Normal(loc: -15.014768600463867, scale: 4.559845924377441)

Then I define the model whose parameters I want to estimate (they are replaced by 0 and 1 in the mu sample), together with a guide function:

def model(data):
    mu = pyro.sample("mu", tdist.Normal(
                                                    torch.zeros(1),  # I know it's 2
                                                    torch.ones(1)    # I know it's 10
    ))
    sigma = pyro.sample("sigma", tdist.HalfNormal(10))

    with pyro.plate("observe_data"):
        pyro.sample("obs", tdist.Normal(mu, sigma), obs=data)

def guide(data):
    a = pyro.param("a", torch.rand(1))
    b = pyro.param("b", torch.rand(1), constraint=constraints.positive)
    
    return pyro.sample("mu", tdist.Normal(a, b))

Finally, I run the process:

pyro.clear_param_store()

num_steps = 5000
initial_lr = .75
gamma = 0.25  # final learning rate will be gamma * initial_lr
lrd = gamma ** (1 / num_steps)

svi = pyro.infer.SVI(model=model,
                     guide=guide,
                     optim=pyro.optim.ClippedAdam({'lr': initial_lr, 'lrd': lrd}),
                     loss=pyro.infer.Trace_ELBO())


losses, a, b = [], [], []
for t in range(num_steps):    
    losses.append(svi.step(data))
    a.append(pyro.param("a").item())
    b.append(pyro.param("b").item())
    if t % 100 == 0:
        print('.', end='')
        
print()

print('a = ', pyro.param("a").item())
print('b = ', pyro.param("b").item())

obtaining the results:

a =  -15.18841552734375
b =  0.03032298944890499

Shouldn’t I get something close to 2 and 10, respectively? If not, how can I estimate the input parameters of such a model? In a similar question, the answer suggested using pyro.param to mark them as free model parameters, but I may have misunderstood something.
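
For reference, this is what I understood that suggestion to mean, i.e. exposing the prior hyperparameters as pyro.param inside the model (just my own sketch; the names mu_loc, mu_scale and sigma_scale are made up):

def model_with_learnable_priors(data):
    # hypothetical sketch: the prior hyperparameters become free parameters
    mu_loc = pyro.param("mu_loc", torch.tensor(0.0))
    mu_scale = pyro.param("mu_scale", torch.tensor(1.0), constraint=constraints.positive)
    sigma_scale = pyro.param("sigma_scale", torch.tensor(1.0), constraint=constraints.positive)

    mu = pyro.sample("mu", tdist.Normal(mu_loc, mu_scale))
    sigma = pyro.sample("sigma", tdist.HalfNormal(sigma_scale))

    with pyro.plate("observe_data", len(data)):
        pyro.sample("obs", tdist.Normal(mu, sigma), obs=data)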

Thanks

your model has two latent variables (mu and sigma). these both need to appear in the guide. something like

c = pyro.param("c", torch.rand(1))
d = pyro.param("d", torch.rand(1), constraint=constraints.positive)
pyro.sample("sigma", tdist.LogNormal(c, d))

you were basically keeping the sigma distribution fixed at its prior, which meant that you weren’t learning its posterior

Even after including sigma in the guide, the results barely change and still do not get close to the original values.

can you please include a complete runnable script with imports etc

Here it is:

import pyro
import torch
import torch.distributions.constraints as constraints
from pyro import distributions as tdist
from pyro.infer import MCMC, NUTS


def known_model():
    mu = pyro.sample("mu", tdist.Normal(2, 10))
    sigma = pyro.sample("sigma", tdist.HalfNormal(10))
    return tdist.Normal(mu, sigma)


def model(data):
    mu = pyro.sample("mu", tdist.Normal(torch.zeros(1), torch.ones(1)))
    sigma = pyro.sample("sigma", tdist.HalfNormal(torch.ones(1)))

    with pyro.plate("observe_data"):
        pyro.sample("obs", tdist.Normal(mu, sigma), obs=data)


def guide(data):
    a = pyro.param("a", torch.rand(1))
    b = pyro.param("b", torch.rand(1), constraint=constraints.positive)
    pyro.sample("mu", tdist.Normal(a, b))

    d = pyro.param("d", torch.rand(1), constraint=constraints.positive)
    pyro.sample("sigma", tdist.HalfNormal(d))


m = known_model()
data = m.sample([10000])

pyro.clear_param_store()

num_steps = 5000
initial_lr = .75
gamma = 0.25  # final learning rate will be gamma * initial_lr
lrd = gamma ** (1 / num_steps)

svi = pyro.infer.SVI(model=model,
                     guide=guide,
                     optim=pyro.optim.ClippedAdam({'lr': initial_lr, 'lrd': lrd}),
                     loss=pyro.infer.Trace_ELBO())


losses, a, b, d = [], [], [], []
for t in range(num_steps):    
    losses.append(svi.step(data))
    a.append(pyro.param("a").item())
    b.append(pyro.param("b").item())
    d.append(pyro.param("d").item())
    if t % 100 == 0:
        print('.', end='')
        
print()

print('a = ', pyro.param("a").item())
print('b = ', pyro.param("b").item())
print('d = ', pyro.param("d").item())

here’s a complete working script

import pyro
import torch
import torch.distributions.constraints as constraints
from pyro import distributions as tdist
from pyro.infer import MCMC, NUTS


def known_model(true_mu, true_sigma):
    return tdist.Normal(true_mu, true_sigma)


def model(data):
    mu = pyro.sample("mu", tdist.Normal(0.0, 1.0))
    sigma = pyro.sample("sigma", tdist.HalfNormal(1.0))

    with pyro.plate("data", len(data)):
        pyro.sample("obs", tdist.Normal(mu, sigma), obs=data)


def guide(data):
    a = pyro.param("a", torch.tensor(0.0))
    b = pyro.param("b", torch.tensor(0.01), constraint=constraints.positive)
    pyro.sample("mu", tdist.Normal(a, b))

    c = pyro.param("c", torch.tensor(0.0))
    d = pyro.param("d", torch.tensor(0.01), constraint=constraints.positive)
    pyro.sample("sigma", tdist.LogNormal(c, d))


true_mu = 1.1
true_sigma = 2.2

m = known_model(true_mu=true_mu, true_sigma=true_sigma)
data = m.sample([5000])
print("data", data.shape)

pyro.clear_param_store()

num_steps = 1000
initial_lr = 0.1
gamma = 0.1  # final learning rate will be gamma * initial_lr
lrd = gamma ** (1 / num_steps)

svi = pyro.infer.SVI(model=model,
                     guide=guide,
                     optim=pyro.optim.ClippedAdam({'lr': initial_lr, 'lrd': lrd}),
                     loss=pyro.infer.Trace_ELBO())


losses = []
for t in range(num_steps):
    losses.append(svi.step(data))
    if t % 100 == 0:
        print("[step %d] loss: %.3f" % (t, losses[-1]))

print("done with inference")
print("true mu", true_mu, "true_sigma", true_sigma)
print('mean inferred mu = ', pyro.param("a").item())
print('mean inferred sigma = ', tdist.LogNormal(pyro.param("c"), pyro.param("d")).mean.item())

note that large learning rates are generally a bad idea. for other optimization tricks see svi part iv
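
for example, something gentler in the same style as the script above (the numbers here are only illustrative):

num_steps = 1000
initial_lr = 0.01  # much smaller than 0.75
gamma = 0.1        # final learning rate will be gamma * initial_lr
lrd = gamma ** (1 / num_steps)
optim = pyro.optim.ClippedAdam({'lr': initial_lr, 'lrd': lrd})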

Thanks for the reply @martinjankowiak. I don’t know if I totally agree with your answer. The code does work, and the inference correctly approximates the values of true_mu and true_sigma. However, what I want to know is how to recover the input parameters of the hierarchical model, i.e. the parameters of the Normal and Half-Normal priors, which are (2, 10) and 10 respectively. If I understand correctly, your code infers the latent variables instead.

On the other hand, I don’t know if what I am asking for even makes sense. For example, when I use MCMC I get samples of the latent variables (in this case mu and sigma), but does it make sense to estimate the “optimal” values of the input parameters (2, 10) and 10? Or are they irrelevant?
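
To be concrete, this is the kind of thing I mean by MCMC (a rough sketch using the model above; the number of samples is arbitrary):

nuts_kernel = NUTS(model)
mcmc = MCMC(nuts_kernel, num_samples=500, warmup_steps=500)
mcmc.run(data)
samples = mcmc.get_samples()  # posterior samples for the latent sites "mu" and "sigma"
print(samples["mu"].mean().item(), samples["sigma"].mean().item())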

Thanks again

what you want to do is basically impossible. your dataset corresponds to a single value of mu. in other words you measure mu once. yet you want to infer two parameters from a single number, a number which itself is inferred with uncertainty. to infer the top-level parameters in any meaningful way you’d effectively need to measure multiple datasets and then do inference in an appropriate hierarchical model.
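
roughly, the kind of hierarchical model i mean would look something like this (only a sketch; names, shapes and initializations are illustrative):

def hierarchical_model(datasets):
    # shared top-level parameters -- these are what you would actually learn
    mu_loc = pyro.param("mu_loc", torch.tensor(0.0))
    mu_scale = pyro.param("mu_scale", torch.tensor(1.0), constraint=constraints.positive)
    sigma_scale = pyro.param("sigma_scale", torch.tensor(1.0), constraint=constraints.positive)

    for i, data in enumerate(datasets):
        # each dataset gets its own draw of mu and sigma from the shared priors
        mu = pyro.sample(f"mu_{i}", tdist.Normal(mu_loc, mu_scale))
        sigma = pyro.sample(f"sigma_{i}", tdist.HalfNormal(sigma_scale))
        with pyro.plate(f"data_{i}", len(data)):
            pyro.sample(f"obs_{i}", tdist.Normal(mu, sigma), obs=data)

with many such datasets the shared parameters (mu_loc, mu_scale, sigma_scale) become identifiable, because each dataset contributes its own measurement of mu and sigma.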