Pyro curve fitting parameter std estimate

I was trying out curve fitting using Pyro (instead of using Least Squares methods with χ2 loss functions). I’ve included a minimum working example fitting a sine curve to simulated noise, which works fairly well.

The std estimates on the parameter however are smaller than those of curve_fit or other methods, and I’m not sure why. The mean values are comparable. I might be doing something wrong in my definition of the model and guide.

Additionally I’ve noticed by inspecting the parameter store that the σ parameters are negative, even though I put positive constraints on them. I’m very new to pyro so it’s very possible that I’m missing something here.

Finally I set sigma, the parameter that defines the noise std on top of the data as a pyro parameter. Is there some way to extract what optimal values SVI found for this?

import math

from tqdm import tqdm

import torch
import pyro
import pyro.optim as optim
import pyro.distributions as dist
from torch.distributions import constraints
from pyro.infer import SVI, Trace_ELBO, Predictive

import numpy as np
import pandas as pd
from scipy.signal import savgol_filter
from scipy.optimize import curve_fit

np.random.seed(1)

def summary(samples):
    site_stats = {}
    for site_name, values in samples.items():
        marginal_site = pd.DataFrame(values)
        describe = marginal_site.describe(
            percentiles=[.05, 0.25, 0.5, 0.75, 0.95]
        ).transpose()
        site_stats[site_name] = describe[
            ["mean", "std", "5%", "25%", "50%", "75%", "95%"]
        ]
    return site_stats

def fn(x, ω, ϕ, b):
    return np.sin(ω*x + ϕ) + b

σ_noise = 1/7
ω = 2*np.pi*2.2
ϕ = np.random.rand()*2*np.pi
b = np.random.randn()
x = np.linspace(0,2,101)
y = fn(x, ω, ϕ, b)
y += np.random.randn(y.size) * σ_noise

dat = savgol_filter(y-np.mean(y), window_length = 11, polyorder = 3)
ω_guess = (
    2*math.pi * len(np.where(np.sign(dat)[1:] != np.sign(dat)[:-1])[0]) 
    / (2 * (x.max() - x.min()))
)
b_guess = np.mean(y)


bounds = ([0.8*ω_guess,0,-10], [1.2*ω_guess, 2*math.pi, 10])
popt, pcov = curve_fit(
    fn,x,y,p0 = [ω_guess, 1., b_guess], 
    bounds = bounds, sigma = np.ones(len(x))*σ_noise
)

def model(x,y,guess):
    ωμ = pyro.param("ωμ", torch.tensor(guess[0]), constraint = constraints.positive)
    ωσ = pyro.param("ωσ", torch.tensor(0.2), constraint = constraints.positive) 
    ω = pyro.sample("ω", dist.Normal(ωμ, ωσ))
    
    ϕμ = pyro.param(
        "ϕμ", torch.tensor(guess[1]), 
        constraint = constraints.interval(0, 2*math.pi)
    )
    ϕσ = pyro.param("ϕσ", torch.tensor(1.), constraint = constraints.positive)
    ϕ = pyro.sample("ϕ", dist.Normal(ϕμ,ϕσ))

    bμ = pyro.param(
        "bμ", torch.tensor(guess[2]), 
        constraint = constraints.interval(-10, 10)
    )
    bσ = pyro.param("bσ", torch.tensor(0.2), constraint = constraints.positive)
    b = pyro.sample("b", dist.Normal(bμ, bσ))
    
    values = torch.sin(ω*x+ϕ)+b
    
    sigma = pyro.param(
        "sigma", torch.tensor(1.), 
        constraint = constraints.positive
    )
    
    with pyro.plate("data", len(y)):
        pyro.sample("obs", dist.Normal(0, sigma), obs = values-y)

def guide(x,y,guess):
    ωμ = pyro.param("ωμ", torch.tensor(guess[0]), constraint = constraints.positive)
    ωσ = pyro.param("ωσ", torch.tensor(0.2), constraint = constraints.positive)
    ω = pyro.sample("ω", dist.Normal(ωμ, ωσ))
    
    ϕμ = pyro.param(
        "ϕμ", torch.tensor(guess[1]), 
        constraint = constraints.interval(0, 2*math.pi)
    )
    ϕσ = pyro.param("ϕσ", torch.tensor(1.), constraint = constraints.positive)
    ϕ = pyro.sample("ϕ", dist.Normal(ϕμ,ϕσ))

    bμ = pyro.param(
        "bμ", torch.tensor(guess[2]), 
        constraint = constraints.interval(-10, 10)
    )
    bσ = pyro.param("bσ", torch.tensor(0.2), constraint = constraints.positive)
    b = pyro.sample("b", dist.Normal(bμ,bσ))
    
    sigma = pyro.param(
        "sigma", torch.tensor(1.0), 
        constraint = constraints.positive
    )

svi = SVI(model,
          guide,
          optim.Adam({"lr": .05}),
          loss=Trace_ELBO()
)

pyro.clear_param_store()
num_iters = 5_000

for i in tqdm(range(num_iters)):
    elbo = svi.step(torch.tensor(x), torch.tensor(y), popt)
    if i % 500 == 0:
        print("Elbo loss: {}".format(elbo))

pstore = pyro.get_param_store()
for s,v in pstore.named_parameters():
    print(s,v)
        
from pyro.infer import Predictive

num_samples = 1_000
predictive = Predictive(model, guide=guide, num_samples=num_samples)
svi_samples = {k: v.reshape(num_samples).detach().cpu().numpy()
               for k, v in predictive(
                   torch.tensor(x), torch.tensor(y), popt
               ).items()
               if k != "obs"}

for s, v in summary(svi_samples).items():
    print(s, v)

Hi @ogras ,

That’s most likely because in the model your ωμ, ωσ, ϕμ, ϕσ, bμ, and bσ are pyro.param s instead of being just tensors/fixed values. You would want your priors to be fixed in order to do Bayesian inference.

def model(x,y,guess):
    ωμ = torch.tensor(...)  # set to whatever value you want
    ωσ = torch.tensor(0.2)  # set to whatever value you want 
    ω = pyro.sample("ω", dist.Normal(ωμ, ωσ))
    # similar for other sample sites below

That’s because parameters are associated with both constrained and unconstrained values as explained here. If you just call pyro.param(name) that should give a constrained value. .named_parameters() on the other hand outputs unconstrained values.

Unlike pyro.sample statements where each pyro.sample statement in the model needs to have a matching pyro.sample statement in the guide (except for observed sites), pyro.param statements in general do not match between model and guide. pyro.param statements used in the guide are variational parameters and are used to approximate the posterior distributions and shouldn’t be matched in the model. pyro.param statements in the model are used for point estimates of variables and shouldn’t be matched in the guide. In your case, parameters mentioned in the previous reply are variational parameters in the guide and need to be replaced with fixed values in the model. pyro.param("sigma", ...), on the other hand, is in the model used for a point estimate of it and therefore can be removed from the guide. Hope this helps to explain it!

1 Like

Thanks! That makes sense. I based my code on an example I found, and found it strange since indeed I didn’t see the parameter definitions in the tutorial model functions.

In the new model function I also added sigma as a sample (with corresponding parameters in the guide function such that it would also do Bayesian regression on those.

Then the stds on parameters using SVI are still smaller, but based on my current understanding SVI doesn’t sample the true posterior, just approximates it, and perhaps that’s the issue here. MCMC with NUTS matches just with the result from Least-Squares.

If I don’t set sigma as a distribution (i.e. fix the modelled noise on the data), MCMC fails completely to find a good solution.

def model(x,y,guess):
    ωμ = torch.tensor(guess[0])
    ωσ = torch.tensor(0.2)
    ω = pyro.sample("ω", dist.Normal(ωμ, ωσ))
    
    ϕμ = torch.tensor(guess[1])
    ϕσ = torch.tensor(1.)
    ϕ = pyro.sample("ϕ", dist.Normal(ϕμ,ϕσ))

    bμ = torch.tensor(guess[2])
    bσ = torch.tensor(0.2)
    b = pyro.sample("b", dist.Normal(bμ, bσ))
    
    values = torch.sin(ω*x+ϕ)+b
    
    sigma_μ = torch.tensor(1.)
    sigma_σ = torch.tensor(0.1)
    sigma = pyro.sample("sigma", dist.Normal(sigma_μ, sigma_σ))
    
    with pyro.plate("data", len(y)):
        pyro.sample("obs", dist.Normal(0, sigma), obs = values-y)