Errors-in-variables model with Pyro

Hi

I would like to build a simple “errors-in-(feature-)variables model”. This kind of model is needed when both the x and the y values are subject to errors (not only the y values, as in typical mean-squared-error minimization).

In the long run this model is supposed to be non-linear, e.g. a Bayesian neural network.

However, I start out with a simple toy test case using linear regression (which has a well-known optimal solution).


In maximum-likelihood estimation the optimal linear model is characterized by an adapted loss function, see the Wikipedia article on Deming regression.

[image: Deming regression loss function from the Wikipedia article]

Key here is that the observed (x_i, y_i) training pairs are themselves subject to errors: (x_i, y_i) = (x_i^* + \eta_i, y_i^* + \epsilon_i).
So there is an “error-free” pair (x_i^*, y_i^* = y_\theta(x_i^*)) associated with each training example. While the function y_\theta(x_i^*) is fitted by finding the most likely parameters \theta, finding all the optimal latent inputs x_i^* has to be added to the optimization problem.
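Written in this notation, and assuming Gaussian errors with known variances \sigma_\eta^2 (on x) and \sigma_\epsilon^2 (on y), the objective that Deming regression minimizes should roughly be the following (my reconstruction of the Wikipedia formula for the linear case y_\theta(x) = \beta_0 + \beta_1 x):

\min_{\beta_0, \beta_1, x_1^*, \dots, x_n^*} \sum_{i=1}^{n} \left[ \frac{\big(y_i - (\beta_0 + \beta_1 x_i^*)\big)^2}{\sigma_\epsilon^2} + \frac{\big(x_i - x_i^*\big)^2}{\sigma_\eta^2} \right]

So the latent x_i^* are optimized jointly with the regression parameters.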


I want to create a Bayesian version of Deming regression which can then be easily generalized to nonlinear models later.

Here is what I have so far:

[image: screenshot of my current model; the full code is at the end of the post]

The problem is that x_i^* = x_i - error\_bias_i, or equivalently the error_bias (which is constant but arbitrary for each sample), is not fitted correctly: it just stays at its initialization value, both with NUTS MCMC and with the SVI approach using autoguide = pyro.infer.autoguide.AutoDelta(model).
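For reference, the SVI variant I tried looks roughly like this (just a sketch; the exact optimizer settings and number of steps may have been different):

import pyro
from pyro.infer import SVI, Trace_ELBO
from pyro.infer.autoguide import AutoDelta
from pyro.optim import Adam

pyro.clear_param_store()
autoguide = AutoDelta(model)  # MAP point estimate for every latent site
svi = SVI(model, autoguide, Adam({"lr": 0.01}), loss=Trace_ELBO())
for step in range(2000):
    loss = svi.step(x, y)
    if step % 500 == 0:
        print(step, loss)
# autoguide.median() afterwards shows that error_bias has hardly moved from its initial value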

Therefore, the model (just like ordinary least squares) is not able to reproduce the optimal (Deming regression) solution, which coincides with the ground truth of the generating process:
[image: plot comparing the train data, the Bayesian MCMC test prediction, the ground truth, OLS, and the Deming fit]

Could you please help me to get this to work with Pyro? I would already be happy with solving the linear case correctly.

Later I would want to

  • optimize a non-linear model like a neural network in a later stage
  • assume that the errors in x and y may depend on x and may be heteroscedastic (i.e. we might have varying distributions of errors in both x and y); a rough sketch of what I mean follows below

If you have any further hints for the generalization, I would also be very happy!
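To make the last point a bit more concrete, I imagine replacing the constant noise scales in the model by something x-dependent, roughly like this (just a sketch; the functional form and the name noise_scales are made up):

def noise_scales(x_star):
    # hypothetical example: error scales that grow with |x| (arbitrary functional form)
    x_scale = 0.5 + 0.2 * x_star.abs()
    y_scale = 1.0 + 0.5 * x_star.abs()
    return x_scale, y_scale

# in the model below these would replace the fixed 0.1 for x and y_err_var for y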

Below you can find my code (unfortunately, I could not upload my Jupyter notebook).

import matplotlib.pyplot as plt
import torch
import copy

# (unused: this two-cluster grid is immediately overwritten by the linspace grid below)
x1 = torch.rand(50, 1) * 0.3 - 1
x2 = torch.rand(50, 1) * 0.5 + 0.5
x_no_error = torch.cat([x1, x2])

# error-free inputs on a regular grid
num = 200
x_no_error = torch.linspace(-5, 5, steps=num).unsqueeze(1)

# ground truth of the linear generating process
gt_slope = 12
gt_intercept = 0

def f(x_no_error, noise=True):
    # y = torch.cos(x_no_error * 4 + 0.8)  # (unused non-linear alternative)
    y = gt_slope * x_no_error + gt_intercept
    x = copy.deepcopy(x_no_error)
    if noise:
        y += 5 * torch.randn_like(x_no_error)  # additive noise on y (std 5)
        x += 3 * torch.randn_like(x_no_error)  # additive noise on x (std 3)
    return x, y

x, y = f(x_no_error)

x_test_no_error = torch.linspace(-10, 10, 401).unsqueeze(-1)
x_test, y_test = f(x_test_no_error)
plt.plot(x, y, ".")  # quick look at the noisy training data

import pyro
pyro.enable_validation(True)
import pyro.distributions as dist

def prediction(offset, slope, x_mu_obs):
    y_pred = offset + slope * x_mu_obs
    return y_pred

def model(x, y):
    # per-sample additive error on the observed x
    with pyro.plate("x_data", len(x)):
        error_bias = pyro.sample("error_bias", dist.Normal(0, 2))  # only initial guess???
    # "error-free" x reconstructed from the observation and the per-sample error
    x_no_error_estimated = x - error_bias
    x_mu_obs = pyro.sample('x_mu_obs', dist.Normal(x_no_error_estimated, 0.1), obs=x)

    # linear model on the estimated error-free x
    offset = pyro.sample('offset', dist.Normal(0, 10))
    slope = pyro.sample('slope', dist.Normal(0, 10))
    y_pred = prediction(offset, slope, x_no_error_estimated)
    y_err_var = pyro.sample('y_err_var', dist.HalfNormal(5.))  # used as the noise scale on y
    y_obs = pyro.sample('y_mu', dist.Normal(y_pred, y_err_var), obs=y)
    return x_mu_obs, y_obs

pyro.render_model(model, model_args=(x, y), render_params=True)

import torch
from pyro.infer import MCMC, NUTS

# x and y are already tensors; take detached copies to be safe
x = x.clone().detach()
y = y.clone().detach()

# Run MCMC using NUTS
nuts_kernel = NUTS(model)
mcmc = MCMC(nuts_kernel, num_samples=201, warmup_steps=200, num_chains=1)
mcmc.run(x, y) #fits the model to the observations

# Get posterior samples
samples = mcmc.get_samples()
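# (sketch of how I check whether error_bias gets updated at all;
#  in my runs it stays essentially at its initial value)
print(samples["error_bias"].shape)        # (num_samples, num_data_points)
print(samples["error_bias"].mean(0)[:5])  # posterior means of the first few per-sample biases
print(samples["error_bias"].std(0)[:5])   # posterior spread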



# Get posterior predictive samples (not used further below)
predictive = pyro.infer.Predictive(model, posterior_samples=samples)
posterior_predictive = predictive(torch.squeeze(x), torch.squeeze(y))

# point prediction on the test grid from the posterior means of offset and slope
y_test_pred = prediction(samples["offset"].mean(), samples["slope"].mean(), x_test)


# Plot for control

import numpy as np
from scipy.odr import ODR, Model, Data

# Define the model
def line_model(p, x):
    a, b = p
    return a*x + b

# Create the ODR object (without explicit weights this is orthogonal regression,
# i.e. Deming regression with equal error variances in x and y)
model_deming = Model(line_model)
data = Data(x[:, 0], y[:, 0])
odr = ODR(data, model_deming, beta0=[1, 1])

# Perform the orthogonal regression
result = odr.run()

# Print the result
print("Deming Intercept: ", result.beta[1])
print("Deming Slope: ", result.beta[0])
print("Standard deviation of the estimates: ", result.sd_beta)

from sklearn.linear_model import LinearRegression
ols_linreg_model = LinearRegression()
ols_linreg_model.fit(x, y[:,0]);

plt.plot(x, y, ".", label="train data")
plt.plot(x_test, y_test_pred, label="Bayes MC test pred", color="green")  # XXX not learning Deming
plt.plot(*f(x_no_error, noise=False), label="ground truth", color="black")
plt.plot(x, ols_linreg_model.predict(x), label="OLS")
plt.plot(x_test, result.beta[1] + result.beta[0] * x_test, label="Deming", color="purple")
plt.legend()

PS: I also had a look at what others did, e.g. the “Errors-in-variables model in PyMC3” thread on the PyMC Discourse. However, I have to note that their model fails to reproduce the Deming regression solution as well!