Hello - I’m quite new to pyro/numpyro and have read through most of the numpyro port of Statistical Rethinking (GitHub - fehiepsi/rethinking-numpyro: Statistical Rethinking (2nd ed.) with NumPyro), and read through all the introductory tutorials for Pyro (more than once for the SVI tutorials) and numpyro.
I’m trying to redo a simple linear regression model implemented by a colleague of mine using numpyro, to see if I can recover coefficients similar to what they got using statsmodels.
I define a linear model as:
def linear_regression(X, y=None):
    a = numpyro.sample("a", numpyro.distributions.Normal(0., 100.))
    n_scaled_features = X.shape[1] - 1
    with numpyro.plate("bs", size=n_scaled_features):
        b = numpyro.sample("b", numpyro.distributions.Normal(0., 800.))
    b2 = numpyro.sample("b2", numpyro.distributions.Normal(0., 1.))
    mu = a + b2 * X[:, -1] + jnp.sum(X[:, :-1] * b, axis=1)
    sigma = numpyro.sample("sigma", numpyro.distributions.HalfNormal(5.))
    with numpyro.plate("data", size=X.shape[0]):
        numpyro.sample("obs", numpyro.distributions.Normal(mu, sigma), obs=y)
When I run this model using:
num_warmup, num_samples = 1000, 1000
mcmc = MCMC(NUTS(model=linear_regression), num_warmup=num_warmup, num_samples=num_samples)
mcmc.run(random.PRNGKey(2), X, y)
I get essentially the same coefficients as those returned by the statsmodels linear regression. However, when I try to fit the model using SVI, first by following the format shown in the tutorial (Bayesian Regression - Inference Algorithms (Part 2) — Pyro Tutorials 1.8.4 documentation):
def manual_guide(X, y=None):
    n_scaled_features = X.shape[1] - 1
    a_loc = numpyro.param("a_loc", jnp.zeros(1))
    a_scale = numpyro.param("a_scale", 100. * jnp.ones(1),
                            constraint=numpyro.distributions.constraints.positive)
    a = numpyro.sample("a", numpyro.distributions.Normal(loc=a_loc, scale=a_scale))
    b_loc = numpyro.param("b_loc", jnp.zeros(n_scaled_features))
    b_scale = numpyro.param("b_scale", 1000. * jnp.ones(n_scaled_features),
                            constraint=numpyro.distributions.constraints.positive)
    with numpyro.plate("bs", size=n_scaled_features):
        b = numpyro.sample("b", numpyro.distributions.Normal(loc=b_loc, scale=b_scale))
    b2_loc = numpyro.param("b2_loc", jnp.zeros(1))
    b2_scale = numpyro.param("b2_scale", 1. * jnp.ones(1),
                             constraint=numpyro.distributions.constraints.positive)
    b2 = numpyro.sample("b2", numpyro.distributions.Normal(loc=b2_loc, scale=b2_scale))
    sigma_scale = numpyro.param("sigma_scale", 5. * jnp.ones(1),
                                constraint=numpyro.distributions.constraints.positive)
    sigma = numpyro.sample("sigma", numpyro.distributions.HalfNormal(sigma_scale))
I get totally different results, and when I plot the loss it doesn’t seem to go down. I also find that when I try SVI with the AutoNormal autoguide, the loss goes down but the coefficients are totally different.
I am wondering whether this has to do with the data itself, whose features are on very different scales, or with how I am initializing the guide params?
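To be concrete about the scale question: one thing I could try is standardizing the feature columns before fitting, along these lines (`standardize` here is just an illustrative helper, not something from my actual code):

```python
import numpy as np

def standardize(X):
    # Center each column and scale to unit variance so all features
    # end up on a comparable scale before fitting.
    mu = X.mean(axis=0)
    sd = X.std(axis=0)
    return (X - mu) / sd

# Example: two columns on wildly different scales.
X_raw = np.array([[1., 1000.],
                  [2., 2000.],
                  [3., 1500.]])
X_std = standardize(X_raw)
```

After this, each column of `X_std` has mean 0 and standard deviation 1, so the priors and guide initializations no longer need to account for the raw feature scales.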
A full worked example of this can be found here:
Thanks for the cool framework, and any hints about what I’m doing wrong would be greatly appreciated!