 # Pyro curve fitting parameter std estimate

I was trying out curve fitting using Pyro (instead of using Least Squares methods with χ2 loss functions). I’ve included a minimum working example fitting a sine curve to simulated noise, which works fairly well.

The std estimates on the parameter however are smaller than those of `curve_fit` or other methods, and I’m not sure why. The mean values are comparable. I might be doing something wrong in my definition of the `model` and `guide`.

Additionally I’ve noticed by inspecting the parameter store that the `σ` parameters are negative, even though I put positive constraints on them. I’m very new to pyro so it’s very possible that I’m missing something here.

Finally I set `sigma`, the parameter that defines the `noise` std on top of the data as a pyro parameter. Is there some way to extract what optimal values SVI found for this?

``````import math

from tqdm import tqdm

import torch
import pyro
import pyro.optim as optim
import pyro.distributions as dist
from torch.distributions import constraints
from pyro.infer import SVI, Trace_ELBO, Predictive

import numpy as np
import pandas as pd
from scipy.signal import savgol_filter
from scipy.optimize import curve_fit

np.random.seed(1)

def summary(samples):
site_stats = {}
for site_name, values in samples.items():
marginal_site = pd.DataFrame(values)
describe = marginal_site.describe(
percentiles=[.05, 0.25, 0.5, 0.75, 0.95]
).transpose()
site_stats[site_name] = describe[
["mean", "std", "5%", "25%", "50%", "75%", "95%"]
]
return site_stats

def fn(x, ω, ϕ, b):
return np.sin(ω*x + ϕ) + b

σ_noise = 1/7
ω = 2*np.pi*2.2
ϕ = np.random.rand()*2*np.pi
b = np.random.randn()
x = np.linspace(0,2,101)
y = fn(x, ω, ϕ, b)
y += np.random.randn(y.size) * σ_noise

dat = savgol_filter(y-np.mean(y), window_length = 11, polyorder = 3)
ω_guess = (
2*math.pi * len(np.where(np.sign(dat)[1:] != np.sign(dat)[:-1]))
/ (2 * (x.max() - x.min()))
)
b_guess = np.mean(y)

bounds = ([0.8*ω_guess,0,-10], [1.2*ω_guess, 2*math.pi, 10])
popt, pcov = curve_fit(
fn,x,y,p0 = [ω_guess, 1., b_guess],
bounds = bounds, sigma = np.ones(len(x))*σ_noise
)

def model(x,y,guess):
ωμ = pyro.param("ωμ", torch.tensor(guess), constraint = constraints.positive)
ωσ = pyro.param("ωσ", torch.tensor(0.2), constraint = constraints.positive)
ω = pyro.sample("ω", dist.Normal(ωμ, ωσ))

ϕμ = pyro.param(
"ϕμ", torch.tensor(guess),
constraint = constraints.interval(0, 2*math.pi)
)
ϕσ = pyro.param("ϕσ", torch.tensor(1.), constraint = constraints.positive)
ϕ = pyro.sample("ϕ", dist.Normal(ϕμ,ϕσ))

bμ = pyro.param(
"bμ", torch.tensor(guess),
constraint = constraints.interval(-10, 10)
)
bσ = pyro.param("bσ", torch.tensor(0.2), constraint = constraints.positive)
b = pyro.sample("b", dist.Normal(bμ, bσ))

values = torch.sin(ω*x+ϕ)+b

sigma = pyro.param(
"sigma", torch.tensor(1.),
constraint = constraints.positive
)

with pyro.plate("data", len(y)):
pyro.sample("obs", dist.Normal(0, sigma), obs = values-y)

def guide(x,y,guess):
ωμ = pyro.param("ωμ", torch.tensor(guess), constraint = constraints.positive)
ωσ = pyro.param("ωσ", torch.tensor(0.2), constraint = constraints.positive)
ω = pyro.sample("ω", dist.Normal(ωμ, ωσ))

ϕμ = pyro.param(
"ϕμ", torch.tensor(guess),
constraint = constraints.interval(0, 2*math.pi)
)
ϕσ = pyro.param("ϕσ", torch.tensor(1.), constraint = constraints.positive)
ϕ = pyro.sample("ϕ", dist.Normal(ϕμ,ϕσ))

bμ = pyro.param(
"bμ", torch.tensor(guess),
constraint = constraints.interval(-10, 10)
)
bσ = pyro.param("bσ", torch.tensor(0.2), constraint = constraints.positive)
b = pyro.sample("b", dist.Normal(bμ,bσ))

sigma = pyro.param(
"sigma", torch.tensor(1.0),
constraint = constraints.positive
)

svi = SVI(model,
guide,
loss=Trace_ELBO()
)

pyro.clear_param_store()
num_iters = 5_000

for i in tqdm(range(num_iters)):
elbo = svi.step(torch.tensor(x), torch.tensor(y), popt)
if i % 500 == 0:
print("Elbo loss: {}".format(elbo))

pstore = pyro.get_param_store()
for s,v in pstore.named_parameters():
print(s,v)

from pyro.infer import Predictive

num_samples = 1_000
predictive = Predictive(model, guide=guide, num_samples=num_samples)
svi_samples = {k: v.reshape(num_samples).detach().cpu().numpy()
for k, v in predictive(
torch.tensor(x), torch.tensor(y), popt
).items()
if k != "obs"}

for s, v in summary(svi_samples).items():
print(s, v)
``````

Hi @ogras ,

That’s most likely because in the model your ωμ, ωσ, ϕμ, ϕσ, bμ, and bσ are `pyro.param` s instead of being just tensors/fixed values. You would want your priors to be fixed in order to do Bayesian inference.

``````def model(x,y,guess):
ωμ = torch.tensor(...)  # set to whatever value you want
ωσ = torch.tensor(0.2)  # set to whatever value you want
ω = pyro.sample("ω", dist.Normal(ωμ, ωσ))
# similar for other sample sites below
``````

That’s because parameters are associated with both constrained and unconstrained values as explained here. If you just call `pyro.param(name)` that should give a constrained value. `.named_parameters()` on the other hand outputs unconstrained values.

Unlike `pyro.sample` statements where each `pyro.sample` statement in the model needs to have a matching `pyro.sample` statement in the guide (except for observed sites), `pyro.param` statements in general do not match between model and guide. `pyro.param` statements used in the guide are variational parameters and are used to approximate the posterior distributions and shouldn’t be matched in the model. `pyro.param` statements in the model are used for point estimates of variables and shouldn’t be matched in the guide. In your case, parameters mentioned in the previous reply are variational parameters in the guide and need to be replaced with fixed values in the model. `pyro.param("sigma", ...)`, on the other hand, is in the model used for a point estimate of it and therefore can be removed from the guide. Hope this helps to explain it!

1 Like

Thanks! That makes sense. I based my code on an example I found, and found it strange since indeed I didn’t see the parameter definitions in the tutorial `model` functions.

In the new `model` function I also added `sigma` as a `sample` (with corresponding parameters in the `guide` function such that it would also do Bayesian regression on those.

Then the stds on parameters using `SVI` are still smaller, but based on my current understanding `SVI` doesn’t sample the true posterior, just approximates it, and perhaps that’s the issue here. `MCMC` with `NUTS` matches just with the result from `Least-Squares`.

If I don’t set `sigma` as a distribution (i.e. fix the modelled noise on the data), `MCMC` fails completely to find a good solution.

``````def model(x,y,guess):
ωμ = torch.tensor(guess)
ωσ = torch.tensor(0.2)
ω = pyro.sample("ω", dist.Normal(ωμ, ωσ))

ϕμ = torch.tensor(guess)
ϕσ = torch.tensor(1.)
ϕ = pyro.sample("ϕ", dist.Normal(ϕμ,ϕσ))

bμ = torch.tensor(guess)
bσ = torch.tensor(0.2)
b = pyro.sample("b", dist.Normal(bμ, bσ))

values = torch.sin(ω*x+ϕ)+b

sigma_μ = torch.tensor(1.)
sigma_σ = torch.tensor(0.1)
sigma = pyro.sample("sigma", dist.Normal(sigma_μ, sigma_σ))

with pyro.plate("data", len(y)):
pyro.sample("obs", dist.Normal(0, sigma), obs = values-y)
``````